Improve performance of assembleGlobalBasisTransferMatrix
The old approach used a std::map and compared coordinates in the innermost loop to avoid duplicate evaluations of LocalBasis. Instead, the new approach determines the interpolation points first and caches the values in advance, based on the evaluation order. This improved the performance significantly.
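The two-pass idea described above can be illustrated with a minimal, self-contained sketch. It is not the dune-fufem implementation: `interpolate`, `evaluateCoarseBasis`, and `localTransferMatrix` are hypothetical stand-ins, and the sketch assumes that the interpolation always queries its points in the same order.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a local interpolation: it samples f at fixed
// nodes, always in the same order, and returns the sampled coefficients.
std::vector<double> interpolate(const std::function<double(double)>& f)
{
  static const double nodes[] = {0.0, 0.5, 1.0};
  std::vector<double> coefficients;
  for (double x : nodes)
    coefficients.push_back(f(x));
  return coefficients;
}

// Hypothetical stand-in for evaluating all coarse shape functions at once
// (here: the two 1d P1 hat functions on [0,1]).
std::vector<double> evaluateCoarseBasis(double x)
{
  return {1.0 - x, x};
}

// Returns matrix[i][j] = i-th fine coefficient of the j-th coarse function.
std::vector<std::vector<double>> localTransferMatrix()
{
  // Pass 1: record the interpolation points in evaluation order.
  std::vector<double> points;
  interpolate([&](double x) { points.push_back(x); return 0.0; });

  // Evaluate the coarse basis once per point and cache the values.
  std::vector<std::vector<double>> cache;
  for (double x : points)
    cache.push_back(evaluateCoarseBasis(x));

  // Pass 2: replay cached values by position instead of looking them up
  // by coordinate.
  const std::size_t coarseSize = cache.empty() ? 0 : cache[0].size();
  std::vector<std::vector<double>> matrix(
      points.size(), std::vector<double>(coarseSize));
  for (std::size_t j = 0; j < coarseSize; ++j)
  {
    std::size_t k = 0;
    auto coefficients = interpolate([&](double) { return cache[k++][j]; });
    for (std::size_t i = 0; i < coefficients.size(); ++i)
      matrix[i][j] = coefficients[i];
  }
  return matrix;
}
```

In this sketch the coarse basis is evaluated once per interpolation point, and the inner loop needs neither a map lookup nor a coordinate comparison. The replay in pass 2 is only valid under the stated assumption of a fixed evaluation order.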
This also removed tracking of already processed indices in a std::unordered_set, since it turned out that in all tested combinations this is slower than recomputing them. Once we have a utility to generically create a suitable nested bit-vector type for a basis, we can reintroduce this optimization, because this would indeed improve the performance.
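As a rough illustration of bit-vector tracking, the following sketch marks processed rows in a std::vector<bool>. It assumes flat indices in [0, size), which is a simplification: a basis with nested multi-indices would need the nested bit-vector utility mentioned above. The function name `countProcessedRows` and the input layout are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Counts how many rows are actually processed when each element lists the
// global indices it touches. Assumes flat indices in [0, size).
std::size_t countProcessedRows(
    const std::vector<std::vector<std::size_t>>& elementIndices,
    std::size_t size)
{
  std::vector<bool> processed(size, false);
  std::size_t count = 0;
  for (const auto& indices : elementIndices)
  {
    for (std::size_t i : indices)
    {
      if (processed[i])
        continue;          // row was already handled via another element
      processed[i] = true;
      ++count;             // placeholder for the actual row assembly
    }
  }
  return count;
}
```

The check is a single indexed bit access, with no hashing and no allocation per inserted index.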
Furthermore, this cleans up the includes and removes some no longer used wrappers. Despite being implementation details, the function and geometry wrappers have not been in an Impl:: namespace. Hence there's a small possibility that someone used them elsewhere outside of dune-fufem.
Activity
added 1 commit
- 27404f93 - Remove tracking of processed entries in assembleGlobalBasisTransferMatrix
With the implemented improvements the total runtime is reduced significantly. For example, computing the P1->P2 interpolation is now more than three times faster. Since computing this interpolation accounts for a significant part of the total runtime for multigrid with p-coarsening, this also provides a significant improvement for the multigrid performance.
Notice that there is potential for more improvements:
- Don't reallocate containers used in the inner loops.
- Cache evaluation of coarse basis functions.
- Cache interpolation points of fine basis functions.
- Cache local interpolation matrices. E.g. for P1->P2 they are the same among all elements with the same GeometryType.
- Parallelize the element loops.
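The suggestion to cache local interpolation matrices could look roughly like the sketch below. It is an illustration only: `LocalMatrixCache` is a hypothetical helper, a plain integer id stands in for the geometry type, and the cache is valid only when the local matrix really depends on the element type alone, as stated for P1->P2.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical cache: one local interpolation matrix per element type.
// The key is a plain integer id standing in for a geometry type.
class LocalMatrixCache
{
public:
  explicit LocalMatrixCache(std::function<Matrix(int)> compute)
    : compute_(std::move(compute))
  {}

  // Returns the cached matrix, computing it on first use of this type.
  const Matrix& get(int geometryTypeId)
  {
    auto it = cache_.find(geometryTypeId);
    if (it == cache_.end())
    {
      it = cache_.emplace(geometryTypeId, compute_(geometryTypeId)).first;
      ++computations_;
    }
    return it->second;
  }

  // Number of times the matrix had to be computed (for illustration).
  std::size_t computations() const { return computations_; }

private:
  std::function<Matrix(int)> compute_;
  std::map<int, Matrix> cache_;
  std::size_t computations_ = 0;
};
```

With such a cache, the element loop would compute the local matrix once per element type and only scatter the cached entries into the global matrix for every further element of that type.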
Edited by Carsten Gräser
mentioned in commit fd02899e