Draft: Use a container with flexible key type as cache for quadrature rules
Summary
For a thread-safe cache of quadrature rules, a vector of vectors of vectors guarded by several `std::call_once` flags is currently used. This makes reading the code very complicated and does not necessarily lead to any better performance than the solution proposed in this MR. Replace the nested vectors with a simple `std::map` whose key identifies every quadrature rule and additionally makes it possible to extend the quadrature rule properties in the future.
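As a rough illustration only (member names and exact types are my assumptions, not necessarily what this MR implements), such a key could look like this:

```cpp
#include <tuple>

// Illustrative sketch of a cache key -- not necessarily the exact type of this MR.
// It identifies a quadrature rule by geometry type, order and quadrature type;
// further properties can simply be appended as additional members later.
struct QuadratureKey
{
  unsigned int geometryId;  // e.g. GeometryType::id()
  int order;                // requested polynomial order
  int quadratureType;       // e.g. QuadratureType::Enum

  // lexicographic comparison of the three integers, enough for std::map
  friend bool operator<(const QuadratureKey& a, const QuadratureKey& b)
  {
    return std::tie(a.geometryId, a.order, a.quadratureType)
         < std::tie(b.geometryId, b.order, b.quadratureType);
  }
};
```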
Discussion
- Instead of a `std::map` container, any other associative container could be used. I have chosen a simple map since a comparison of three integers by `operator<` is very simple to implement. The alternative `std::unordered_map` would require a hash function. While this is possible, it requires more code, and I do not expect a measurable performance difference.
- The cache is implemented as `thread_local`. This allows omitting more complicated locking mechanisms and should be thread-safe by definition. It is, however, unclear to me whether this has negative performance consequences if threads are closed and reopened multiple times, i.e., whether the cache then has to be re-computed/re-filled or whether the previously cached values are preserved. An alternative implementation uses the `static` keyword together with a shared mutex (a sketch of the `thread_local` variant itself follows after this code block):
```cpp
// Container to store the cached values
static std::map<QuadratureKey, QuadratureRule> quadCache;

// Mutex used to guard access to the container, necessary since
// inserting via emplace is a read-write operation.
using mutex_type = std::shared_timed_mutex;
static mutex_type access_mutex;

// Define the quadrature key used to look up the quadrature rule
const QuadratureKey key{t.id(), p, qt};

// First try to lock for read-only access; if a quadrature rule for key is found,
// return it. Otherwise, obtain a unique_lock to insert a new rule.
std::shared_lock<mutex_type> read_lock(access_mutex);
auto it = quadCache.find(key);
if (it != quadCache.end())
  return it->second;
else {
  read_lock.unlock();
  QuadratureRule rule = QuadratureRuleFactory<ctype,dim>::rule(t, p, qt);
  std::unique_lock<mutex_type> write_lock(access_mutex);
  auto new_it = quadCache.emplace(key, std::move(rule));
  return new_it.first->second;
}
```
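For comparison, a minimal sketch of the `thread_local` variant described in the bullet above (again with illustrative names; the actual code in this MR may differ):

```cpp
#include <map>

// Sketch of the thread_local cache -- illustrative only. QuadratureKey is the
// key type sketched in the summary; the remaining types come from dune-geometry.
template<class ctype, int dim>
const QuadratureRule<ctype,dim>&
cachedRule(const GeometryType& t, int p, QuadratureType::Enum qt)
{
  // One cache per thread, so no locking is required at all.
  thread_local std::map<QuadratureKey, QuadratureRule<ctype,dim>> quadCache;

  const QuadratureKey key{t.id(), p, static_cast<int>(qt)};

  // Only construct a new rule if no rule for this key has been cached yet.
  auto it = quadCache.find(key);
  if (it == quadCache.end())
    it = quadCache.emplace(key, QuadratureRuleFactory<ctype,dim>::rule(t, p, qt)).first;

  // std::map never invalidates references on insertion, so the returned
  // reference stays valid for the lifetime of the thread-local cache.
  return it->second;
}
```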
Activity
mentioned in merge request !173 (closed)
> This makes reading the code very complicated

This does not hold anymore. !197 (merged) cleaned up the code quite a lot, and now it is very easy to follow.
> The alternative `std::unordered_map` would require a hash function.

Nope. Because `rule(...)` returns a reference, the container needs to be stable under insertion. `std::map` is indeed the correct choice.

> This allows omitting more complicated locking mechanisms and should be thread-safe by definition.
But that's the whole purpose of `std::call_once` as far as I understand. Performance-wise, this MR is changing three compare-and-swap (CAS) operations ($\mathcal{O}(1)$) into $\mathcal{O}(\log(n))$ comparisons in a sparse container. Also, the current implementation is more cache friendly because multiple threads will share the same non-mutable quadratures in the hot path (assuming many non-mutable calls of this function), whereas in this MR each thread will have its own copy of the same thing in (probably) the same cache. My guess is that this would perform worse, but as always, one would have to measure.
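For reference, a minimal illustration of the `std::call_once` fast path this refers to (my reading of the pattern, not the actual dune-geometry code):

```cpp
#include <mutex>

// Minimal illustration (assumed structure, not the actual dune-geometry code)
// of the cost referred to above: after the first call, std::call_once costs
// essentially one atomic "already initialized?" check. With one such guard per
// nesting level of the current vector-of-vectors-of-vectors cache, a lookup is
// O(1), while the std::map of this MR needs O(log n) comparisons.
template<class ctype, int dim>
struct CachedRule
{
  std::once_flag once;
  QuadratureRule<ctype,dim> rule;   // assumed default-constructible

  const QuadratureRule<ctype,dim>& get(const GeometryType& t, int p, QuadratureType::Enum qt)
  {
    // Fast path after initialization: only the atomic check inside call_once.
    std::call_once(once, [&] {
      rule = QuadratureRuleFactory<ctype,dim>::rule(t, p, qt);
    });
    return rule;
  }
};
```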
Side note: another way of doing a call-once is to use the result of a static lambda. In theory that should produce the exact same binary as `std::call_once`, but I don't understand why the `std::call_once` version is longer and more complicated both in gcc and clang:

```cpp
void init() {}

int main() {
  [[maybe_unused]] static bool t = [](){ return init(), true; }();
}
```
```asm
init():                                # @init()
        ret
main:                                  # @main
        push    rax
        movzx   eax, byte ptr [rip + guard variable for main::t]
        test    al, al
        je      .LBB1_1
.LBB1_3:
        xor     eax, eax
        pop     rcx
        ret
.LBB1_1:
        lea     rdi, [rip + guard variable for main::t]
        call    __cxa_guard_acquire@PLT
        test    eax, eax
        je      .LBB1_3
        lea     rdi, [rip + guard variable for main::t]
        call    __cxa_guard_release@PLT
        xor     eax, eax
        pop     rcx
        ret
```
```cpp
#include <mutex>

std::once_flag flag;

void init() {}

int main() {
  std::call_once(flag, init);
}
```
```asm
init():                                # @init()
        ret
main:                                  # @main
        push    r14
        push    rbx
        push    rax
        lea     rax, [rip + init()]
        mov     qword ptr [rsp], rax
        mov     r14, qword ptr [rip + std::__once_callable@GOTTPOFF]
        mov     rax, rsp
        mov     qword ptr fs:[r14], rax
        lea     rax, [rip + std::once_flag::_Prepare_execution::_Prepare_execution<std::call_once<void (&)()>(std::once_flag&, void (&)())::{lambda()#1}>(void (&)())::{lambda()#1}::__invoke()]
        mov     rbx, qword ptr [rip + std::__once_call@GOTTPOFF]
        mov     qword ptr fs:[rbx], rax
        cmp     qword ptr [rip + __pthread_key_create@GOTPCREL], 0
        je      .LBB1_1
        lea     rdi, [rip + flag]
        mov     rsi, qword ptr [rip + __once_proxy@GOTPCREL]
        call    pthread_once@PLT
        test    eax, eax
        jne     .LBB1_4
        mov     qword ptr fs:[r14], 0
        mov     qword ptr fs:[rbx], 0
        xor     eax, eax
        add     rsp, 8
        pop     rbx
        pop     r14
        ret
.LBB1_1:
        mov     eax, -1
.LBB1_4:
        mov     edi, eax
        call    std::__throw_system_error(int)@PLT
        mov     qword ptr fs:[r14], 0
        mov     qword ptr fs:[rbx], 0
        mov     rdi, rax
        call    _Unwind_Resume@PLT
flag:
        .zero   4
DW.ref.__gxx_personality_v0:
        .quad   __gxx_personality_v0
```
mentioned in merge request dune-common!1274 (closed)