Improve performance of MatrixIndexSet
This improves performance of MatrixIndexSet
significantly by two changes:
-
Replace the
std::set
used to store the column indices for each rows by astd::vector
based implementation. Thestd::vector
is kept sorted when inserting such that we can avoid duplicates. In the worst case (reverse order insertion) this may lead to O(n^2) complexity when inserting n entries in a row compared to O(n log(n)) forstd::set
. However, the sortedstd::vector
implementation still wins for relatively large n. To avoid the worst case complexity when using very dense rows, the implementation automatically switches tostd::set
if the size exceeds the threshold valuemaxVectorSize
. The defaultmaxVectorSize=2048
was selected based on benchmark results. -
Change the stored index type from
std::size_t
touint32_t
to reduce memory consumption and improve performance. Notice that this excludes exceptionally large matrices with more than 2^32 columns that worked before. To give an estimate on the influence of this: If you want to build a squareBCRSMatrix
with 2^32 rows and two entries per row using the oldMatrixIndexSet
implementation this alone would need more than 0.5 TB memory. One could make the index type a template parameter, but this would be a breaking interface change. -
Additionally this MR adds documentation of the thread safety of
MatrixIndexSet
. While this does not change any code, it is strictly speaking an interface change, because it introduces public guarantees that have been implementation details before. Hence you can no longer relax these guarantees (e.g. to change the implementation from row- to column-major) without breaking compatibility. However, it's IMO important to make these guarantees, because without them you cannot do multi-threaded assembly. (Notice that the 'implicit' build mode ofBCRSMatrix
is e.g. not thread safe.)
Merge request reports
Activity
mentioned in merge request fufem/dune-fufem!163 (closed)
assigned to @carsten.graeser
unassigned @carsten.graeser
assigned to @oliver.sander
- Resolved by Carsten Gräser
- Resolved by Carsten Gräser
- Resolved by Carsten Gräser
- Resolved by Carsten Gräser
- Resolved by Carsten Gräser
- Resolved by Carsten Gräser
- Resolved by Carsten Gräser
- Resolved by Carsten Gräser
- Resolved by Carsten Gräser
Be warned: There's no simple answer to this. The sorted
vector
implementation is mainly problematic if you insert in (almost) reverse order. The default threshold was selected based on tests with reverse order to balance two concurrent goals: a) Use the fastvector
implementation as long as possible. b) Reduce the additional complexity when switching toset
as much as possible (because you essentially sort twice).For forward or random insertion order the
vector
based implementation wins for even denser rows. But I consider more than 2048 entries per row already exceptionally large and trying to optimize the default further is probably pointless. In exceptional cases you can still change the threshold according to your specific needs.
- Resolved by Carsten Gräser
- Resolved by Carsten Gräser
- Resolved by Carsten Gräser
- Resolved by Simon Praetorius
I am not yet 100% happy about the name
maxVectorSize
. When reading the code it gets clear that this is kind of a switching point. It is not related to the maximal number of entries to store in the matrix, also not the expected number of nonzeros. If we use a vector, we could clearly make also a reservation about the expected number of nonzeros per row.The internal storage should be more like an implementation detail. We use the vector as long as it is the fastest, then switch to something else. How to choose the
maxVectorSize
? Do we need to first perform some experiments and benchmarks and then can make an educated guess? But why do we switch to astd::set
then and not to astd::unordered_set
? Why do we store the row nonzeros in a vector and not in aReservedVector
if we know that there are just a very few entries per row?As I said, I like this improvement of the (default) MatrixIndexSet. But maybe it would be nice to just have an alternative or a much more generic MatrixIndexSet, where it even could be possible to configure the internal storage.
added 1 commit
- aa9408f1 - [doc] Document thread safety guarantees of MatrixIndexSet
mentioned in commit a0db70c0
mentioned in merge request dune-common!1251 (merged)
@carsten.graeser do you happen to still have some benchmark code to test changes on
MatrixIndexSet
?