[parallel] Add experimental support for thread parallel assembly
This adds experimental support for thread parallel assembly.
As discussed, this is an opt-in feature. It is implemented
by an additional template parameter and constructor argument of Assembler
for passing an Executor. If none is specified, a SequentialExecutor
is used that mimics the old behavior.
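
To illustrate the opt-in design, here is a minimal, self-contained sketch. It is not the code of this MR: ToyAssembler, the forEach method, and the index-based element loop are hypothetical stand-ins. It only shows how an executor that defaults to a sequential one can receive the element loop.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sequential executor: runs the loop body for each index
// on the calling thread, i.e. it behaves like a plain for loop.
struct SequentialExecutor {
  template <class Body>
  void forEach(std::size_t count, Body body) const {
    for (std::size_t i = 0; i < count; ++i)
      body(i);
  }
};

// Hypothetical assembler: the executor is an extra template parameter and
// constructor argument that defaults to the sequential executor.
template <class Executor = SequentialExecutor>
class ToyAssembler {
public:
  explicit ToyAssembler(Executor executor = Executor{})
    : executor_(std::move(executor)) {}

  // Computes one value per element; the element loop is forwarded to the
  // executor. The local assembler is captured by value, so the loop body
  // owns its own copy.
  template <class LocalAssembler>
  std::vector<double> assemble(std::size_t elementCount,
                               LocalAssembler localAssembler) const {
    std::vector<double> result(elementCount, 0.0);
    executor_.forEach(elementCount,
      [&result, localAssembler](std::size_t e) mutable {
        result[e] = localAssembler(e);
      });
    return result;
  }

private:
  Executor executor_;
};
```

With this shape, code that writes `ToyAssembler<> assembler;` keeps the sequential behavior, and a parallel executor only has to provide the same forEach interface.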
For parallel execution, this adds the following experimental components (originating in dune-fufem)
to Dune::Assembler::Experimental:
- A simple advancing front graph coloring algorithm. This is purely algebraic and independent of any grid structure.
- Utilities for computing the adjacency of the element graph in a grid view, coloring the element graph, creating a colored version of the element range.
- A simplified implementation of std::barrier from C++20, since gcc-10 does not provide it.
- A ColoredRangeExecutor that provides thread-parallel execution of algorithms using a colored range, based on std::thread.
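
The interplay of these pieces can be sketched as follows. This is a simplified illustration, not the implementation in this MR: all function names are hypothetical, elements are modeled as lists of DOF indices instead of grid entities, the coloring is a plain greedy coloring in breadth-first order (which may differ from the advancing front algorithm used here), and threads are joined after each color instead of being synchronized with a barrier.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <thread>
#include <vector>

using Graph = std::vector<std::vector<std::size_t>>; // adjacency lists

// Two elements are adjacent if they share at least one DOF.
Graph elementAdjacency(const std::vector<std::vector<std::size_t>>& elementDofs,
                       std::size_t dofCount) {
  std::vector<std::vector<std::size_t>> dofToElements(dofCount);
  for (std::size_t e = 0; e < elementDofs.size(); ++e)
    for (auto dof : elementDofs[e])
      dofToElements[dof].push_back(e);
  Graph graph(elementDofs.size());
  for (const auto& sharing : dofToElements)
    for (auto e : sharing)
      for (auto f : sharing)
        if (e != f) graph[e].push_back(f);
  return graph;
}

// Greedy coloring: visit vertices in breadth-first order and give each one
// the smallest color not used by an already colored neighbor.
std::vector<std::size_t> greedyColoring(const Graph& graph) {
  const std::size_t n = graph.size();
  const std::size_t uncolored = n; // valid colors are always < n
  std::vector<std::size_t> color(n, uncolored);
  std::vector<bool> queued(n, false);
  std::vector<bool> used;
  for (std::size_t start = 0; start < n; ++start) {
    if (queued[start]) continue;
    std::queue<std::size_t> front;
    front.push(start);
    queued[start] = true;
    while (!front.empty()) {
      const std::size_t v = front.front();
      front.pop();
      used.assign(graph[v].size() + 1, false);
      for (auto w : graph[v]) {
        if (color[w] != uncolored && color[w] < used.size()) used[color[w]] = true;
        if (!queued[w]) { queued[w] = true; front.push(w); }
      }
      color[v] = static_cast<std::size_t>(
        std::find(used.begin(), used.end(), false) - used.begin());
    }
  }
  return color;
}

// Collect the vertices of each color into one group.
std::vector<std::vector<std::size_t>> groupByColor(const std::vector<std::size_t>& color) {
  std::vector<std::vector<std::size_t>> groups;
  for (std::size_t v = 0; v < color.size(); ++v) {
    if (color[v] >= groups.size()) groups.resize(color[v] + 1);
    groups[color[v]].push_back(v);
  }
  return groups;
}

// Process one color after the other; within a color, the entries are split
// across threads. Each thread works on its own copy of the body.
template <class Body>
void forEachColored(const std::vector<std::vector<std::size_t>>& groups,
                    std::size_t threadCount, Body body) {
  for (const auto& group : groups) {
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < threadCount; ++t)
      threads.emplace_back([&group, body, t, threadCount]() mutable {
        for (std::size_t i = t; i < group.size(); i += threadCount)
          body(group[i]);
      });
    for (auto& th : threads) th.join(); // all threads finish before the next color
  }
}
```

Because elements of the same color share no DOF in this model, a body that only writes to the DOFs of its own element can run concurrently within a color without locking.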
Furthermore, the poisson-pq2 and poisson-pq2-eigen
examples demonstrate the usage.
Since this changes the indentation in the Assembler methods,
the diff looks quite large. The actual change is just a few
lines that forward each method's body and element loop to the executor. At least
locally, this can be seen using git diff -w.
There is, however, one significant change: In order to guarantee thread-locality of data, the executor captures the local assembler by value and thus makes a copy. In principle we agreed that this behavior is intended. One may argue that the copy should be avoided in the sequential case, but this would make the executor usage more invasive.
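
The reason for the copy can be illustrated with a small sketch. ScratchLocalAssembler and runWithCopies are hypothetical names and not part of this MR; the sketch only assumes that a local assembler may own scratch storage that it overwrites on every call.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical local assembler with scratch storage that is overwritten on
// every call. Sharing a single instance between threads would be a data race.
struct ScratchLocalAssembler {
  std::vector<double> scratch = std::vector<double>(4);

  double operator()(std::size_t element) {
    for (std::size_t i = 0; i < scratch.size(); ++i)
      scratch[i] = static_cast<double>(element + i);
    double sum = 0;
    for (double s : scratch) sum += s;
    return sum;
  }
};

// Each thread lambda captures `local` by value, so every thread gets its own
// copy including its own scratch storage. Threads write to distinct entries
// of `result`, so no further synchronization is needed.
std::vector<double> runWithCopies(ScratchLocalAssembler local,
                                  std::size_t elementCount,
                                  std::size_t threadCount) {
  std::vector<double> result(elementCount, 0.0);
  std::vector<std::thread> threads;
  for (std::size_t t = 0; t < threadCount; ++t)
    threads.emplace_back([local, &result, t, threadCount]() mutable {
      for (std::size_t e = t; e < result.size(); e += threadCount)
        result[e] = local(e);
    });
  for (auto& th : threads) th.join();
  return result;
}
```

Capturing by reference instead would let all threads write to the same scratch vector. The price of the by-value capture is one copy per thread, which is also paid when only a single thread is used.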