Skip to content
Snippets Groups Projects

Draft: Use a container with flexible key type as cache for quadrature rules

Open Simon Praetorius requested to merge feature/quadrature-cache into master
1 unresolved thread

Summary

For a thread-safe cache of quadrature rules, currently a vector of vectors of vectors with several std::call_once flags is used. This makes reading the code very complicated and does not necessarily lead to any better performance than the proposed solution in this MR. Replace the nested vector with a simple std::map where the key allows to identify all quadrature rules and additionally makes it possible to extend quadrature rules properties in the future.

Discussion

  1. Instead of a std::map container, any other associative container could be used. I have chose a simple map since comparison of three integers by < is very simple to implement. The alternative std::unordered_map would require a hash function. While this is possible, it requires more code. I do not expect a measurable performance difference.

  2. The cache is implemented as thread_local. This allows to omit more complicated locking mechanisms and should be thread-safe by definition. It is, however, unclear to me, whether this has negative performance consequences it threads are closed and reopened multiple times, i.e., if this makes it necessary to re-"compute"/re-fill the cache or whether this preserves the previously cached values. An alternative implementation uses the static keyword:

// Container to store the cached values
static std::map<QuadratureKey, QuadratureRule> quadCache;

// mutex used to access the data in the container, necessary since
// access emplace is read-write.
using mutex_type = std::shared_timed_mutex;
static mutex_type access_mutex;

// define the quadrature key to be used to access the quadrature rule
const QuadratureKey key{t.id(),p,qt};

// first try to lock for read-only, if a quadratur rule for key is found, return it,
// if not, obtain a unique_lock to insert a new rule.
std::shared_lock<mutex_type> read_lock(access_mutex);
auto it = quadCache.find(key);
if (it != quadCache.end())
  return it->second;
else {
  read_lock.unlock();
  QuadratureRule rule = QuadratureRuleFactory<ctype,dim>::rule(t,p,qt);
  std::unique_lock<mutex_type> write_lock(access_mutex);
  auto new_it = quadCache.emplace(key, std::move(rule));
  return new_it.first->second;
}
Edited by Simon Praetorius

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Simon Praetorius mentioned in merge request !173 (closed)

    mentioned in merge request !173 (closed)

  • Simon Praetorius changed the description

    changed the description

    • This makes reading the code very complicated

      This does not hold anymore. !197 (merged) cleaned up the code quite a lot and now it is very easy to follow.

      The alternative std::unordered_map would require a hash function.

      Nope. Because rule(...) returns a reference, the container needs to be stable under insertion. std::map is indeed the correct choice.

      This allows to omit more complicated locking mechanisms and should be thread-safe by definition.

      But that's the whole purpose of std::call_once as far as I understand. Performance wise, this MR is changing three compare and swap (CAS) operations \mathcal{O}(1) to \mathcal{O}(log(n)) comparisons in an sparse container. Also, the current implementation is more cache friendly because multiple threads will share the same non-mutable quadratures in the hot-path (assume many non-mutable calls of this function) whereas in this MR each thread will have its own copy of the same thing in (probably) the same cache. My guess is that this would perform worse, but as always, one would have to measure.


      Side note: Another form of doing a call once is when using the result of an static lambda. In theory that should produce the exact same binary as std::call_once, but I don't understand why the std::call_once is longer and more complicated both in gcc and clang :thinking: (click to expand):

      void init() {}
      
      int main() {
          [[maybe_unused]] static bool t = [](){
          return init(), true;
         }();
      }
      init():                               # @init()
              ret
      main:                                   # @main
              push    rax
              movzx   eax, byte ptr [rip + guard variable for main::t]
              test    al, al
              je      .LBB1_1
      .LBB1_3:
              xor     eax, eax
              pop     rcx
              ret
      .LBB1_1:
              lea     rdi, [rip + guard variable for main::t]
              call    __cxa_guard_acquire@PLT
              test    eax, eax
              je      .LBB1_3
              lea     rdi, [rip + guard variable for main::t]
              call    __cxa_guard_release@PLT
              xor     eax, eax
              pop     rcx
              ret
      #include <mutex>
      std::once_flag flag;
      void init() {}
      
      int main() {
          std::call_once(flag, init);
      }
      init():                               # @init()
              ret
      main:                                   # @main
              push    r14
              push    rbx
              push    rax
              lea     rax, [rip + init()]
              mov     qword ptr [rsp], rax
              mov     r14, qword ptr [rip + std::__once_callable@GOTTPOFF]
              mov     rax, rsp
              mov     qword ptr fs:[r14], rax
              lea     rax, [rip + std::once_flag::_Prepare_execution::_Prepare_execution<std::call_once<void (&)()>(std::once_flag&, void (&)())::{lambda()#1}>(void (&)())::{lambda()#1}::__invoke()]
              mov     rbx, qword ptr [rip + std::__once_call@GOTTPOFF]
              mov     qword ptr fs:[rbx], rax
              cmp     qword ptr [rip + __pthread_key_create@GOTPCREL], 0
              je      .LBB1_1
              lea     rdi, [rip + flag]
              mov     rsi, qword ptr [rip + __once_proxy@GOTPCREL]
              call    pthread_once@PLT
              test    eax, eax
              jne     .LBB1_4
              mov     qword ptr fs:[r14], 0
              mov     qword ptr fs:[rbx], 0
              xor     eax, eax
              add     rsp, 8
              pop     rbx
              pop     r14
              ret
      .LBB1_1:
              mov     eax, -1
      .LBB1_4:
              mov     edi, eax
              call    std::__throw_system_error(int)@PLT
              mov     qword ptr fs:[r14], 0
              mov     qword ptr fs:[rbx], 0
              mov     rdi, rax
              call    _Unwind_Resume@PLT
      flag:
              .zero   4
      
      DW.ref.__gxx_personality_v0:
              .quad   __gxx_personality_v0
    • Note, this MR is super old and not up-to-date anymore.

      The goal was to choose quadrature rules based on more criteria than currently possible. We could add a longer chain of call_once functions, but the code is already quiet unreadable and really difficult to understand.

    • Please register or sign in to reply
  • Simon Praetorius marked this merge request as draft

    marked this merge request as draft

Please register or sign in to reply
Loading