Carsten Gräser (68fe8c75) at 26 Mar 14:40
Carsten Gräser (5da0de3f) at 26 Mar 14:40
Merge branch 'feature/activate-python-bindings' into 'master'
... and 13 more commits
This activates the Python bindings for dune-fufem. While, strictly speaking, there are no bindings so far, this makes it possible to use dune-fufem in dune.generator.algorithm code generation. To demonstrate this, the MR also adds two Python examples, poisson-pq2.py and linear-elasticity.py, which implement the same problems as the corresponding C++ examples. The local assemblers are not written manually but generated using Dune::Fufem::Forms.
Instead of having to write the C++ code for Dune::Fufem::Forms manually, this also provides a rudimentary form of Python bindings for Dune::Fufem::Forms. These essentially provide lookalike functions of the C++ analogues that generate the necessary C++ expressions as strings, which are passed to the dune.generator.algorithm just-in-time compiler utilities. While this works surprisingly well for just under 300 lines of code, it is not an official part of dune-fufem but is only contained in the examples to demonstrate how such bindings could potentially be implemented.
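To make the mechanism concrete, the following is a rough sketch of the kind of C++ fragment such a lookalike could emit as a string and hand to the just-in-time compiler. It is only an illustration: the function name assembleLaplace and the Dune::Fufem::Forms identifiers trialFunction, testFunction, integrate, grad, and dot are assumptions and are not taken from the MR.

```cpp
// Hypothetical sketch only: the kind of C++ code a Python lookalike could
// generate as a string for dune.generator.algorithm. The Dune::Fufem::Forms
// names used below are assumptions, not necessarily the actual API.
template<class Matrix, class Basis>
void assembleLaplace(Matrix& matrix, const Basis& basis)
{
  using namespace Dune::Fufem::Forms;
  auto u = trialFunction(basis);              // assumed trial function factory
  auto v = testFunction(basis);               // assumed test function factory
  auto a = integrate(dot(grad(u), grad(v)));  // bilinear form of the Poisson problem
  // An assembler would then evaluate the form 'a' into 'matrix'; this last
  // step is omitted because the exact interface is not shown in the MR.
}
```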
Carsten Gräser (68fe8c75) at 26 Mar 13:11
[python] Simplify and document forms bindings
I still had some cleanup on my list; both items are implemented now.
Carsten Gräser (e599c7b2) at 26 Mar 12:27
[python] Simplify and document forms bindings
Also notice that without this patch the BCRSMatrix Python test in dune-istl fails; otherwise I would not have noticed this.
So far this is proposed for discussion. If clang does not support this, it should be disabled there. If the standard provides all that we need, I don't see a reason to use TBB, OpenMP, pthreads, etc.
Notice that this discussion proposal is only about the very simple cases where one can use straightforward parallel loops. For more complicated task-based parallelism things may be different.
Alternatively, one could support passing an execution policy to some O(n) methods of BCRSMatrix.
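To sketch what that alternative could look like: an O(n) operation might accept a standard execution policy and forward it to a parallel algorithm. The free function below is purely illustrative; BCRSMatrix offers no such overload today, and the iteration pattern is only schematic.

```cpp
#include <algorithm>
#include <execution>

// Illustration only: an O(n) operation over a BCRS-like matrix that takes a
// standard execution policy. This is not an existing dune-istl interface.
template<class ExecutionPolicy, class Matrix, class Field>
void scaleEntries(ExecutionPolicy policy, Matrix& matrix, Field factor)
{
  // Process the rows in parallel; each row scales its stored blocks.
  std::for_each(policy, matrix.begin(), matrix.end(), [factor](auto& row) {
    for (auto& entry : row)
      entry *= factor;
  });
}

// Usage (hypothetical): scaleEntries(std::execution::par, matrix, 2.0);
```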
Carsten Gräser (11ca314b) at 22 Mar 14:27
[threading] Use parallel loops/allocation in BCRSMatrix
This patch uses parallel loops when initializing the large arrays of the CRS storage. When using multiple threads in assembly, the serial construction of the matrix takes a significant fraction of the time and does not scale.
It turns out that most of this time is due to allocation, which seems to be serial. However, this is not entirely true. When doing a plain allocation, the OS only provides the address space to the application but does not actually associate memory pages. The latter happens when the allocated memory is first accessed. Indeed, detailed measurements reveal that the actual allocator call is very fast, while initialization of the storage is costly. In particular, it is significantly more costly than doing exactly the same access a second time.
The current patch uses a simple std::for_each(execution::par, ...) for the first initialization. Measurements indicate that this seems to effectively parallelize the expensive part of allocation. E.g. when assembling the matrix for a lowest-order Taylor-Hood discretization of the Stokes problem with 4 threads, this reduces the total assembly time (assemble pattern + create BCRSMatrix + assemble values) by about 20%, because it reduces the serial part.
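The underlying technique can be sketched in a few lines of standard C++17, independent of dune-istl: allocate the raw storage, then initialize it with a parallel loop so that the page faults of the first touch are spread over the threads. This is a minimal standalone sketch, not the actual patch.

```cpp
#include <algorithm>
#include <cstddef>
#include <execution>
#include <memory>

int main()
{
  // A large value array, as it appears in CRS storage.
  std::size_t n = std::size_t(1) << 28;

  // The allocation itself is cheap: the OS typically hands out address
  // space only; physical pages are associated on first access.
  std::unique_ptr<double[]> values(new double[n]);

  // Parallel first touch: initializing in parallel distributes the page
  // faults (and, with a first-touch NUMA policy, the pages themselves)
  // over the participating threads.
  std::for_each(std::execution::par, values.get(), values.get() + n,
                [](double& x) { x = 0.0; });
}
```

Whether the standard library actually runs this in parallel depends on the toolchain (GCC's libstdc++, for example, backs the parallel algorithms with TBB), which is why the clang caveat above matters.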
This is marked as draft because:
Carsten Gräser (c8bf2eea) at 22 Mar 14:14
[threading] Use parallel loops/allocation in BCRSMatrix
But this change is unrelated to dune-istl!560. Hence it should rather be done separately.
This is needed for dune-istl!560.
It turns out that dune-common has a downstream dependency on these implementation details in dune-istl. Hence we also need the patch dune-common!1361.
This piece of code is nasty for several reasons. But maybe this is unavoidable for technical reasons here.
Carsten Gräser (1814335a) at 22 Mar 12:32
[python] Adjust to changed block_vector_unmanaged interface
Remove the allocator template parameter from the classes base_array_unmanaged, compressed_base_array_unmanaged, block_vector_unmanaged, compressed_block_vector_unmanaged, and CompressedBlockVectorWindow. None of these classes manages memory on its own; the provided allocator was only used to deduce size_type, which is now provided explicitly as a template parameter.
Notice that this (seemingly breaking) change is safe, because all these classes are clearly marked as implementation details that should not be used outside of dune-istl.
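Schematically, the interface change follows the pattern below. The class and parameter names are made up for illustration and are not the literal dune-istl declarations.

```cpp
#include <cstddef>
#include <memory>

// Before (schematic): an allocator parameter that is never used to allocate
// anything, only to deduce the size type.
template<class B, class A = std::allocator<B>>
class unmanaged_array_before
{
public:
  using size_type = typename std::allocator_traits<A>::size_type;
  // ... only views externally managed storage, never allocates ...
};

// After (schematic): the size type is passed explicitly, no allocator needed.
template<class B, class ST = std::size_t>
class unmanaged_array_after
{
public:
  using size_type = ST;
  // ... only views externally managed storage, never allocates ...
};
```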
Carsten Gräser (45aae438) at 22 Mar 12:14
[cleanup] Remove unused allocators
Carsten Gräser (f1dd0990) at 21 Mar 18:49