[parallel] Add experimental support for thread parallel assembly

This adds experimental support for thread-parallel assembly. As discussed, this is an opt-in feature, implemented through an additional template parameter and constructor argument of Assembler for passing an Executor. If no executor is specified, a SequentialExecutor is used that mimics the old behavior.
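To illustrate the opt-in pattern, here is a minimal sketch of an executor passed as a template parameter and constructor argument with a sequential default. ToyAssembler, the forEach method and the assemble signature are hypothetical names chosen for this illustration and are not the actual Assembler interface.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the sequential default: visits the range in
// order on the calling thread.
struct SequentialExecutor {
  template <class Range, class Callback>
  void forEach(const Range& range, Callback callback) const {
    for (const auto& element : range)
      callback(element);
  }
};

// Hypothetical miniature assembler: the executor type is an extra template
// parameter with a sequential default and is passed via the constructor.
template <class Executor = SequentialExecutor>
class ToyAssembler {
public:
  explicit ToyAssembler(Executor executor = Executor{})
    : executor_(std::move(executor)) {}

  // Adds weights[e] to result[e] for every listed element index e.
  // The element loop itself is forwarded to the executor.
  std::vector<double> assemble(const std::vector<std::size_t>& elements,
                               const std::vector<double>& weights) const {
    std::vector<double> result(weights.size(), 0.0);
    executor_.forEach(elements,
                      [&](std::size_t e) { result[e] += weights[e]; });
    return result;
  }

private:
  Executor executor_;
};
```

With this shape, existing code that writes ToyAssembler<> keeps the sequential behavior, and only callers that pass a different executor type opt in to something else.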

For parallel execution, this adds the following experimental utilities (originating in dune-fufem) to Dune::Assembler::Experimental:

  • A simple advancing front graph coloring algorithm. This is purely algebraic and independent of any grid structure.
  • Utilities for computing the adjacency of the element graph in a grid view, coloring the element graph, and creating a colored version of the element range.
  • A simplified implementation of std::barrier from C++20 since gcc-10 does not provide it.
  • A ColoredRangeExecutor, based on std::thread, that provides thread-parallel execution of algorithms over a colored range.
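The interplay of these pieces can be sketched roughly as follows. This is not the code from this merge request: it uses a plain greedy coloring instead of the advancing front variant, and all names (Graph, greedyColoring, groupByColor, SimpleBarrier, forEachColored) are hypothetical. The idea is that elements of the same color share no degrees of freedom, so threads can process one color concurrently and synchronize at a barrier before moving on to the next color.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

using Graph = std::vector<std::vector<std::size_t>>;  // adjacency lists

// Plain greedy coloring (assumption: not the advancing front variant).
// Each vertex gets the smallest color not used by a colored neighbor.
std::vector<std::size_t> greedyColoring(const Graph& adjacency) {
  const std::size_t n = adjacency.size();
  const std::size_t uncolored = n;  // sentinel value for "no color yet"
  std::vector<std::size_t> color(n, uncolored);
  std::vector<bool> used;
  for (std::size_t v = 0; v < n; ++v) {
    // A vertex of degree d always finds a free color in 0..d.
    used.assign(adjacency[v].size() + 1, false);
    for (std::size_t w : adjacency[v])
      if (color[w] != uncolored && color[w] < used.size())
        used[color[w]] = true;
    color[v] = static_cast<std::size_t>(
        std::find(used.begin(), used.end(), false) - used.begin());
  }
  return color;
}

// Collects the vertex indices of each color into one group per color.
std::vector<std::vector<std::size_t>> groupByColor(
    const std::vector<std::size_t>& color) {
  std::size_t colorCount = 0;
  for (std::size_t c : color) colorCount = std::max(colorCount, c + 1);
  std::vector<std::vector<std::size_t>> groups(colorCount);
  for (std::size_t v = 0; v < color.size(); ++v)
    groups[color[v]].push_back(v);
  return groups;
}

// Reusable barrier built from a mutex and a condition variable.
class SimpleBarrier {
public:
  explicit SimpleBarrier(std::size_t count) : count_(count) {}
  void arriveAndWait() {
    std::unique_lock<std::mutex> lock(mutex_);
    const std::size_t generation = generation_;
    if (++waiting_ == count_) {  // last arrival releases all waiting threads
      waiting_ = 0;
      ++generation_;
      condition_.notify_all();
    } else {
      condition_.wait(lock, [&] { return generation_ != generation; });
    }
  }
private:
  std::mutex mutex_;
  std::condition_variable condition_;
  std::size_t count_;
  std::size_t waiting_ = 0;
  std::size_t generation_ = 0;
};

// Processes the groups color by color. Within a color the entries are
// distributed round-robin over the threads; the barrier separates colors.
void forEachColored(const std::vector<std::vector<std::size_t>>& groups,
                    std::size_t threadCount,
                    const std::function<void(std::size_t)>& callback) {
  threadCount = std::max<std::size_t>(threadCount, 1);
  SimpleBarrier barrier(threadCount);
  std::vector<std::thread> threads;
  for (std::size_t t = 0; t < threadCount; ++t)
    threads.emplace_back([&, t] {
      for (const auto& group : groups) {
        for (std::size_t i = t; i < group.size(); i += threadCount)
          callback(group[i]);
        barrier.arriveAndWait();
      }
    });
  for (auto& thread : threads) thread.join();
}
```

For example, on a 1D mesh where element e touches nodes e and e+1, the element graph is a path, and a callback that adds to both nodal entries is free of data races as long as the element ranges are traversed via forEachColored(groupByColor(greedyColoring(adjacency)), ...).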

Furthermore, this demonstrates the usage in the poisson-pq2 and poisson-pq2-eigen examples.

Since this changes the indentation in the Assembler methods, the diff looks quite large. The actual change is just a few lines that forward each method's body and element loop to the executor. At least locally, this can be seen using git diff -w.

There is, however, one significant change: in order to guarantee thread-locality of data, the executor captures the local assembler by value and thus makes a copy. In principle we agreed that this behavior is intended. One may argue that the copy should be avoided in the sequential case, but this would make the executor usage more invasive.
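The effect of capturing by value can be sketched as follows. ToyLocalAssembler and runWithThreadLocalCopies are hypothetical names for this illustration only; the point is that each thread's closure owns its own copy of the local assembler, so mutable scratch data is never shared between threads.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical local assembler with mutable scratch storage that is
// overwritten on every call and hence must not be shared across threads.
struct ToyLocalAssembler {
  std::vector<double> scratch;

  double operator()(std::size_t element) {
    scratch.assign(3, static_cast<double>(element));
    return std::accumulate(scratch.begin(), scratch.end(), 0.0);
  }
};

// Each thread's closure holds its own copy of the local assembler
// (init-capture by value), while the result vector is shared by reference.
// Distinct threads write to distinct entries of the result.
std::vector<double> runWithThreadLocalCopies(
    const ToyLocalAssembler& localAssembler,
    std::size_t elementCount,
    std::size_t threadCount) {
  threadCount = std::max<std::size_t>(threadCount, 1);
  std::vector<double> result(elementCount, 0.0);
  std::vector<std::thread> threads;
  for (std::size_t t = 0; t < threadCount; ++t)
    threads.emplace_back(
        [local = localAssembler, &result, t, threadCount]() mutable {
          for (std::size_t i = t; i < result.size(); i += threadCount)
            result[i] = local(i);  // touches only this thread's scratch
        });
  for (auto& thread : threads) thread.join();
  return result;
}
```

The caller's object is left untouched, which is the trade-off mentioned above: the copy costs something even with a single thread, but the calling code does not need to know which executor is in use.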
