Add BCRSMatrix::setIndicesNoSort() and speed up MatrixIndexSet::exportIdx()
So far, MatrixIndexSet::exportIdx() called BCRSMatrix::addindex() for each
individual column index. This was slow because each insertion does a
binary search, although the inserted indices are already sorted.
Bulk-inserting whole rows with setIndices() improves on this
significantly but still performs an unnecessary sort. The new
BCRSMatrix::setIndicesNoSort() method avoids that sort.
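The three insertion strategies described above can be compared on a toy
row type. This is only a rough sketch, not the Dune implementation: Row,
toyAddIndex(), toySetIndices() and toySetIndicesNoSort() are hypothetical
stand-ins, and it assumes the column indices of a row arrive sorted and
unique (for example from a std::set).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for the column indices of one matrix row.
// This is NOT Dune's BCRSMatrix; it only illustrates the cost model.
using Row = std::vector<std::size_t>;

// Per-index insertion: one binary search per inserted index, which is
// wasted work if the caller already feeds the indices in sorted order.
void toyAddIndex(Row& row, std::size_t col)
{
  auto it = std::lower_bound(row.begin(), row.end(), col);
  if (it == row.end() || *it != col)
    row.insert(it, col);
}

// Bulk insertion with a defensive sort: one copy plus a sort that is
// redundant whenever the input range is already sorted.
template <class It>
void toySetIndices(Row& row, It begin, It end)
{
  row.assign(begin, end);
  std::sort(row.begin(), row.end());
}

// Bulk insertion without sorting: the caller promises a sorted range,
// so a plain copy suffices. The assert only checks this in debug builds.
template <class It>
void toySetIndicesNoSort(Row& row, It begin, It end)
{
  row.assign(begin, end);
  assert(std::is_sorted(row.begin(), row.end()));
}
```

Filling a row from a std::set<std::size_t> with any of the three functions
gives the same result. They differ only in how much searching or sorting
is done on top of the copy.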
This may speed up exportIdx() significantly (e.g. by a factor of 2 for Poisson with PQ2).
In general, exportIdx() is already cheap compared to building the pattern
in typical matrix assembly. But since exportIdx() is serial, its share of
the runtime grows as the rest of the assembly is parallelized, so the
improvement becomes significant in multi-threaded assembly.
According to my measurements, exportIdx() is dominated by the
allocation after this patch, so there is not much potential
for further improvements.