This MR is not actually intended to be merged, but serves as a test bed to experiment with different strategies of how to perform the local computations of a parallel scalar product (#108).

We want to check different options about how to evaluate weighted/parallel scalar products.

The main thing is, that the local contributions to the SP are only evaluated on a subset of DOFs.

# Approaches

We see different algorithmic approaches:

## (A) (nested) bool vector

Given a nested bool vector that matches the structure of the ISTL vector, we simply mask the scalar product with the bool entries.

## (B) skip list

We define a list `std::vector<MultiIndex>`

that lists which entries
should not be considered in the scalar product.

## (C) "reverse" skip list

We use the same skip list as before.

- compute full scalar product
- evaluate a "sparse" scalar product only on the entries of the skip list
- subtract (2) from (1)

## weighted scalar product with sparse diagonal matrix

Conceptually the nicest approach is to write these as weighted inner product. This allows a slightly more general setting. We could define a very special diagonal matrix format that is by default 1 and lists only those entries that differ. This is more like an addon, where we would try to implement a particular kind of sparse diagonal matrix and a specialization for an A-inner-product, using one of the approaches above.

# Result

in my current experiments the algorithms **(C)** is surprisingly the fastest (reproducibly for a range of nested vectors). What I didn't check yet is performance for `VariableBlockVector`

.