Add MatrixIndexSet class
This is a drop in replacement for Dune::MatrixIndexSet
from dune-istl. But it uses a sorted std::vector
instead
of a std::set
internally for better performance.
If you expect very dense rows, Dune::MatrixIndexSet
may still perform better.
Using this in the operator assembler can improve performance measurably.
Merge request reports
Activity
I also consider
Dune::MatrixIndexSet
a hack that only exists because the 'interface' for building patterns inBCRSMatrix
is broken by design. And I'm not sure if we should cultivate this hack by adding another implementation. A clean solution for dune-istl would be to add a flexible interface for building patterns toBCRSMatrix
first and then implement what this MR proposes on top.But in view of the obfuscated
BCRSMatrix
code I can't even estimate the effort it takes to do this without breaking existing code.Is there a way to make the data structure smart enough to switch between the two implementations automatically?
Sure. I tried and benchmarked several versions before proposing this one. I still have 5 different implementations of the
MatrixIndexSet
laying around. Unfortunately all hybrid versions are measurable slower and introduce parameters that are hard to select.Interestingly for patterns induced by FE bases, the fastest solution is also the simplest: Just allocated one large vector with a fixed amount of entries per row. If this is exceeded by any row reallocate the whole vector and double its size. Unfortunately this does not work with multi-threaded assembly.
I did some more testing with my hybrid implementation that switches from sorted vector to
std::set
if rows exceedmaxVectorSize
. After fixing a corner case, it's very comparable to the pure vector case if the switch does not become active. To determine a good value formaxVectorSize
I benchmarked assembly of matrix pattern for Qk-discretizations onYaspGrid
. I seems that it's the fastest to never switch to thestd::set
fallback here. At least until order 30 in 2d, where one has 3721 nnz per row, it's always faster to makemaxVectorSize
large enough to never get active.Background: The worst case complexity O(n^2) for inserting n entries with the sorted vector implementation (in contrast to O(n log(n)) for
std::set
) happens for insertion in reverse order which seems to never not be the case in this example.With synthetic benchmarks of several
set<uint32_t>
-like implementations, the sorted-vector version outperformsstd::set
even in the worst case (reverse order insertion) up to about 1000 entries. Hence something likemaxVectorSize=1000
seems to be reasonable. In the synthetic benchmark this is worse up to 50% once the switch get's active (notice that we need to copy fromvector
toset
once if we pass the threshold). However, in the FE-pattern test (see above) we still get small improvements by increasingmaxVectorSize
.Hence we could in fact open a new MR for the hybrid version in dune-istl. Notice that there's another small compatibility issue: At least with the vector based implementation you gain performance by using
uint32_t
. Hence I'd like to switch this, too, inDune::MatrixIndexSet
.Cf. core/dune-istl!527 (merged) for the hybrid version with
std::set
fallback and carefully selected threshold size 2048.