Draft: Use a container with flexible key type as cache for quadrature rules

mentioned in merge request !173 (closed)

changed the description

This makes reading the code very complicated

This does not hold anymore. !197 (merged) cleaned up the code quite a lot and now it is very easy to follow.

The alternative std::unordered_map would require a hash function.

Nope. Because rule(...) returns a reference, the container needs to be stable under insertion. std::map is indeed the correct choice.

This allows to omit more complicated locking mechanisms and should be thread-safe by definition.

But that's the whole purpose of std::call_once as far as I understand. Performance wise, this MR is changing three compare and swap (CAS) operations \mathcal{O}(1) to \mathcal{O}(log(n)) comparisons in an sparse container. Also, the current implementation is more cache friendly because multiple threads will share the same non-mutable quadratures in the hot-path (assume many non-mutable calls of this function) whereas in this MR each thread will have its own copy of the same thing in (probably) the same cache. My guess is that this would perform worse, but as always, one would have to measure.

Side note: Another form of doing a call once is when using the result of an static lambda. In theory that should produce the exact same binary as std::call_once, but I don't understand why the std::call_once is longer and more complicated both in gcc and clang (click to expand):

void init() {}

int main() {
    [[maybe_unused]] static bool t = [](){
    return init(), true;
   }();
}

init():                               # @init()
        ret
main:                                   # @main
        push    rax
        movzx   eax, byte ptr [rip + guard variable for main::t]
        test    al, al
        je      .LBB1_1
.LBB1_3:
        xor     eax, eax
        pop     rcx
        ret
.LBB1_1:
        lea     rdi, [rip + guard variable for main::t]
        call    __cxa_guard_acquire@PLT
        test    eax, eax
        je      .LBB1_3
        lea     rdi, [rip + guard variable for main::t]
        call    __cxa_guard_release@PLT
        xor     eax, eax
        pop     rcx
        ret

#include <mutex>
std::once_flag flag;
void init() {}

int main() {
    std::call_once(flag, init);
}

init():                               # @init()
        ret
main:                                   # @main
        push    r14
        push    rbx
        push    rax
        lea     rax, [rip + init()]
        mov     qword ptr [rsp], rax
        mov     r14, qword ptr [rip + std::__once_callable@GOTTPOFF]
        mov     rax, rsp
        mov     qword ptr fs:[r14], rax
        lea     rax, [rip + std::once_flag::_Prepare_execution::_Prepare_execution<std::call_once<void (&)()>(std::once_flag&, void (&)())::{lambda()#1}>(void (&)())::{lambda()#1}::__invoke()]
        mov     rbx, qword ptr [rip + std::__once_call@GOTTPOFF]
        mov     qword ptr fs:[rbx], rax
        cmp     qword ptr [rip + __pthread_key_create@GOTPCREL], 0
        je      .LBB1_1
        lea     rdi, [rip + flag]
        mov     rsi, qword ptr [rip + __once_proxy@GOTPCREL]
        call    pthread_once@PLT
        test    eax, eax
        jne     .LBB1_4
        mov     qword ptr fs:[r14], 0
        mov     qword ptr fs:[rbx], 0
        xor     eax, eax
        add     rsp, 8
        pop     rbx
        pop     r14
        ret
.LBB1_1:
        mov     eax, -1
.LBB1_4:
        mov     edi, eax
        call    std::__throw_system_error(int)@PLT
        mov     qword ptr fs:[r14], 0
        mov     qword ptr fs:[rbx], 0
        mov     rdi, rax
        call    _Unwind_Resume@PLT
flag:
        .zero   4

DW.ref.__gxx_personality_v0:
        .quad   __gxx_personality_v0

Note, this MR is super old and not up-to-date anymore.

The goal was to choose quadrature rules based on more criteria than currently possible. We could add a longer chain of call_once functions, but the code is already quiet unreadable and really difficult to understand.

marked this merge request as draft

mentioned in merge request dune-common!1274 (closed)

Draft: Use a container with flexible key type as cache for quadrature rules

Summary

Discussion

Activity

Draft: Use a container with flexible key type as cache for quadrature rules

Summary

Discussion

Merge request reports

Activity