Skip to content
Snippets Groups Projects

Add MatrixIndexSet class

Closed Carsten Gräser requested to merge feature/vector-based-matrixindexset into master
1 unresolved thread

This is a drop in replacement for Dune::MatrixIndexSet from dune-istl. But it uses a sorted std::vector instead of a std::set internally for better performance. If you expect very dense rows, Dune::MatrixIndexSet may still perform better.

Using this in the operator assembler can improve performance measurably.

Merge request reports

Pipeline #61227 passed

Pipeline passed for 30efff53 on feature/vector-based-matrixindexset

Closed by Carsten GräserCarsten Gräser 1 year ago (Apr 20, 2023 12:58pm UTC)

Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
    • I think this should not be in dune-fufem. If it really is a drop-in replacement for MatrixIndexSet, why not merge it with the current implementation, and have users select with a compile-time flag which implementation they want?

    • Because this changes the user interface. If you add a template parameter to Dune::MatrixIndexSet you at least have to write Dune::MatrixIndexSet<> if there's a default parameter. The only possibility I see is a separate class.

    • Hmm, true. Is there a way to make the data structure smart enough to switch between the two implementations automatically?

    • I also consider Dune::MatrixIndexSet a hack that only exists because the 'interface' for building patterns in BCRSMatrix is broken by design. And I'm not sure if we should cultivate this hack by adding another implementation. A clean solution for dune-istl would be to add a flexible interface for building patterns to BCRSMatrix first and then implement what this MR proposes on top.

      But in view of the obfuscated BCRSMatrix code I can't even estimate the effort it takes to do this without breaking existing code.

    • Is there a way to make the data structure smart enough to switch between the two implementations automatically?

      Sure. I tried and benchmarked several versions before proposing this one. I still have 5 different implementations of the MatrixIndexSet laying around. Unfortunately all hybrid versions are measurable slower and introduce parameters that are hard to select.

      Interestingly for patterns induced by FE bases, the fastest solution is also the simplest: Just allocated one large vector with a fixed amount of entries per row. If this is exceeded by any row reallocate the whole vector and double its size. Unfortunately this does not work with multi-threaded assembly.

    • I did some more testing with my hybrid implementation that switches from sorted vector to std::set if rows exceed maxVectorSize. After fixing a corner case, it's very comparable to the pure vector case if the switch does not become active. To determine a good value for maxVectorSize I benchmarked assembly of matrix pattern for Qk-discretizations on YaspGrid. I seems that it's the fastest to never switch to the std::set fallback here. At least until order 30 in 2d, where one has 3721 nnz per row, it's always faster to make maxVectorSize large enough to never get active.

      Background: The worst case complexity O(n^2) for inserting n entries with the sorted vector implementation (in contrast to O(n log(n)) for std::set) happens for insertion in reverse order which seems to never not be the case in this example.

    • Maybe that maxVectorSize variable can become a user-controllable value (with a good default)?

    • It already is. I just want to find out the good default. For the test case described above it seems to be infinity.

    • With synthetic benchmarks of several set<uint32_t>-like implementations, the sorted-vector version outperforms std::set even in the worst case (reverse order insertion) up to about 1000 entries. Hence something like maxVectorSize=1000 seems to be reasonable. In the synthetic benchmark this is worse up to 50% once the switch get's active (notice that we need to copy from vector to set once if we pass the threshold). However, in the FE-pattern test (see above) we still get small improvements by increasing maxVectorSize.

      Hence we could in fact open a new MR for the hybrid version in dune-istl. Notice that there's another small compatibility issue: At least with the vector based implementation you gain performance by using uint32_t. Hence I'd like to switch this, too, in Dune::MatrixIndexSet.

    • Cf. core/dune-istl!527 (merged) for the hybrid version with std::set fallback and carefully selected threshold size 2048.

    • Please register or sign in to reply
  • Closing since I updated the implementation in dune-istl instead.

Please register or sign in to reply
Loading