Conceptually ISTL uses multi-indices, but there is no direct support for them in dune-istl.
On the other hand, most discretization modules use multi-index objects internally to handle nested vectors and matrices. In all of these modules we find, in some form, additional helper functions that allow accessing or modifying ISTL containers based on multi-indices or partial multi-indices.
I discussed this with @simon.praetorius and tried to collect some functions that might be helpful. Including such functions directly in dune-istl should significantly simplify the implementation of backends, e.g. in PDELab, dune-functions or Amdis. FEM currently has very limited support for nested containers; on the other hand, if the burden of writing a backend vanishes, this might perhaps change?!
This issue should act as a collection of ideas for such a multi-index interface. I start with ideas for the vectors; matrices are a bit more complex and should follow later:
we need a proper definition of a multi-index concept.
for convenience we might want to add an operator[](MultiIndex) or some free-standing access function like accessVectorEntry(Vector,MultiIndex) (see the sketch after this list).
functions to perform operations on a set of MultiIndex objects. This allows, for example, to
gather/scatter entries into a communication buffer
compute a parallel scalar product only on entries owned by the current rank
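A minimal sketch of what such a free-standing access function could look like for the simple case of uniformly nested containers (the name accessVectorEntry is taken from the point above; the recursion shown here is an assumption, not existing dune-istl code). Multi-type containers would additionally need a dynamic-to-static dispatch, see the discussion further down:

    #include <cstddef>

    // Sketch only: resolve a dynamic multi-index on a uniformly nested vector type
    // by plain recursion; stop as soon as the block cannot be indexed any further
    // (i.e. a scalar entry is reached). Assumes the multi-index length matches the
    // nesting depth of the container.
    template <class Vector, class MultiIndex>
    decltype(auto) accessVectorEntry(Vector& vector, const MultiIndex& mi, std::size_t pos = 0)
    {
      if constexpr (requires { vector[std::size_t(0)]; })
        return accessVectorEntry(vector[mi[pos]], mi, pos + 1);
      else
        return vector;  // scalar leaf reached
    }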
For the record: In dune-fufem we also have utilities for accessing nested istl-matrices using non-uniform multi-indices similar to the one for vectors in dune-functions. Most of the problems originate from using multi-type containers and corresponding dynamic non-uniform multi-indices:
Accessing multi-type containers with dynamic indices requires some dynamic to static dispatch (which comes at a runtime cost).
Any method/utility providing entry access by returning references to entries requires that the return type (i.e. the fully resolved entry type) is uniform (no mixed real/complex entries).
Access with non-uniform multi-indices cannot generically be mapped to istl-matrices because the latter always require a row and a column index. Hence one needs to provide some additional information to do this. In dune-fufem this is solved by tagging matrices using SingleRowMatrix and SingleColumnMatrix wrappers.
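To illustrate the tagging idea (a strongly simplified sketch, not the actual dune-fufem classes): the wrapper encodes whether a lone index addresses the column of a single-row matrix or the row of a single-column matrix, so the index resolution knows how to descend.

    #include <cstddef>

    // Simplified sketch of the tagging idea (not the real dune-fufem wrappers):
    // a single remaining index selects the column of the one and only row ...
    template <class Matrix>
    struct SingleRowMatrix
    {
      Matrix matrix;
      decltype(auto) operator[](std::size_t col) { return matrix[0][col]; }
    };

    // ... or the row of the one and only column.
    template <class Matrix>
    struct SingleColumnMatrix
    {
      Matrix matrix;
      decltype(auto) operator[](std::size_t row) { return matrix[row][0]; }
    };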
These are all important issues. Thanks for bringing these up here.
Having some specific implementations, e.g. of multi-index generation, in the downstream discretization modules is still needed and cannot be extracted or generalized. We discussed that there is still code duplication, for which we might find the common part that could be useful in the core module dune-istl.
One aspect is the multi-index access. It has intrinsic issues as you mentioned and it is hard to reimplement this in your own module. Maybe this is independent enough to be extracted into dune-istl.
Due to the issue of the required common element type, we thought that it would be better in some situations to have a "for-each" or "apply" function that does not return the entry but passes the container block to a generic functor handed to the algorithm. Maybe this is easier to implement (?). Still, we need to guard against invalid accesses and provide a proper recursion stop condition for the hierarchic traversal, but there are several ways to do that.
Just a sketch:
    template <class Indices, class Container, class F>
    void forEachIndex(Indices const& indices, Container&& container, F&& f)
    {
      for (auto const& index : indices)
        applyAtIndex(index, container, f);
    }

    template <class MultiIndex, class Block, class F, std::size_t i = 0>
    void applyAtIndex(MultiIndex const& mi, Block&& block, F&& f, index_constant<i> = {})
    {
      if constexpr (i < mi.max_size()) {
        if (i < mi.size()) {
          if constexpr (hasDynamicIndexAccess<Block>) {
            // access block directly
            applyAtIndex(mi, block[mi[i]], f, index_constant<i+1>{});
          }
          else if constexpr (hasStaticIndexAccess<Block>) {
            // access block by searching for the static index equal to mi[i]
            using Size = decltype(Hybrid::size(block));
            return Hybrid::switchCases(std::make_index_sequence<Size::value>{}, mi[i],
              [&](auto ii) { applyAtIndex(mi, block[ii], f, index_constant<i+1>{}); });
          }
          else {
            // leaf entry in the container
            f(mi, block);
          }
        }
        else {
          // end of multi-index
          f(mi, block);
        }
      }
      else {
        // end of multi-index (static condition)
        f(mi, block);
      }
    }
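A hypothetical usage of this sketch, assuming it is available in scope (the container and index types are just examples):

    #include <array>
    #include <cstddef>
    #include <vector>
    #include <dune/common/fvector.hh>
    #include <dune/istl/bvector.hh>

    // Visit three selected entries of a blocked vector and add 1 to each of them.
    void example()
    {
      Dune::BlockVector<Dune::FieldVector<double,2>> x(10);
      x = 0.0;
      std::vector<std::array<std::size_t,2>> indices = {{0,0}, {3,1}, {7,0}};

      forEachIndex(indices, x, [](auto const& /*multiIndex*/, auto& entry) {
        entry += 1.0;
      });
    }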
The matrix part is difficult, probably not possible in a fully generic way without anything extra on the matrix side.
Comparing those approaches, @simon.praetorius's seems to be very similar to what we do in dune-fufem, with a few differences. The most important one to me is that dune-fufem allows customizing the termination criterion of the recursion, which is needed to treat some special cases:
Historically it was very common to use FieldMatrix<K,1,1> as a scalar (because for a long time you could not use BCRSMatrix<K> directly), which mostly works. Unfortunately, there is no cast to non-const K&, such that the "is K or casts to K" termination criterion does not work.
Similarly, there's no cast of ScaledIdentityMatrix<K,1> to K& and const K&.
When assembling into something like OuterMatrix<... ScaledIdentityMatrix<K,n> ...> it's hard to decide from a dynamic multi-index whether entries exist. To avoid this, one can implement some addToEntry(...) utility that ignores non-existing entries. This again requires parametrizing the index resolution with a visitor and some custom termination criterion. A prototype (that doesn't fully convince me so far) is implemented here; a rough sketch of the idea is given below.
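A rough sketch of what such a customizable termination criterion could look like (names are made up here; the actual dune-fufem prototype works differently):

    #include <type_traits>
    #include <dune/common/fmatrix.hh>

    // The hierarchic traversal stops descending whenever IsScalarBlock<Block> is
    // true (Block passed without cv/ref qualifiers), instead of relying on
    // "is K or casts to K".
    template <class Block>
    struct IsScalarBlock : std::is_arithmetic<Block> {};

    // Treat 1x1 field matrices as scalars, even though they do not cast to K&.
    template <class K>
    struct IsScalarBlock<Dune::FieldMatrix<K,1,1>> : std::true_type {};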
In contrast, @santiago.ospina's approach seems to not provide access with dynamic multi-indices so far.
In contrast, @santiago.ospina's approach seems to not provide access with dynamic multi-indices so far.
Not supporting dynamic multi-indices with multi-type containers is a deliberate decision: the code itself is trivial, but I regard it as an anti-pattern, especially for index accessors. To avoid the run-time penalty, I provide several utility functions that traverse the container tree with hybrid multi-indices. When this is not sufficient, it forces users to hard-code the dynamic dispatch themselves and makes the run-time penalty obvious.
I think all these remarks back my original statement ... it would be helpful to include this directly in dune-istl. I further think we should try to identify the necessary use cases in the different modules.
@santiago.ospina Interestingly, I'm of the opposite opinion with respect to most of what you said:
In contrast to access by appropriate static/hybrid multi-indices, I consider access by dynamic ones to be the non-trivial case. Especially for index accessors.
For me the dynamic to static dispatch at runtime is as much an anti-pattern as using std::variant with std::visit.
In contrast to FE ansatz trees, traversal of matrix/vector containers with multi-indices is something that I have never needed so far.
To me it's pointless to provide a generic functionality if it only covers the trivial case that every user can implement herself but leaves out the complicated case that most will struggle to get right.
(edit: Removed second part of my comment, because it's a reply to another post)
@carsten.graeser Yes, thinking about it more, you are probably right with respect to the difficulty on the implementation side. If I understand it correctly, it is mostly because it requires inside knowledge of the containers and their contained types. So I retract my statement about it.
Perhaps it's worth rewinding and seeing why I have a position that is diametrically opposed to yours: I created dune-assembler, which is essentially a re-implementation of PDELab. From the very beginning, the design envisioned nested and heterogeneous containers. In particular, I tried to be very careful that most of the components allow unit safety, which is a more restrictive condition than heterogeneous containers. Thus, virtually everything has been handled with hybrid multi-indices from day zero, and as a result, I never had to fall back to dynamic multi-index container access, neither for DOFs nor for communication operations.
So, despite having a functional, complex finite element module that handles heterogeneous containers, I don't find dynamic multi-index access difficult because I don't have it in the first place. On the other hand, since I consider hybrid multi-indices a necessary condition for using heterogeneous containers, what I find difficult is the generation of such hybrid multi-indices. And this is where most of our disagreement comes from: I do not think that dynamic multi-index access is the right approach for our applications, whereas you treat it as a fundamental part of your machinery.
So, to wrap up, I think that your arguments make sense as long as one can show the need for the dynamic multi-index access. That is your case, and is totally opposite to mine.
I do see the value of having dynamic dispatch for code like @simon.praetorius's and @carsten.graeser's. It makes sense. But, as a hard-core hybrid multi-index believer, I find it very difficult to parse functions that may unintentionally trigger the dynamic dispatch (i.e. MultiTypeContainer::operator[](std::size_t)). It is simply too easy to forget and incur a run-time penalty when it is not needed. And the problem with this is that it will compile and go unnoticed for a very long time (e.g. staging/dune-typetree!106 (merged)).
So, in a generic module like dune-istl, I would like to propose to have different function names/operators for the two different approaches.
Why just dune-ISTL? I think this belongs in dune-common, as many matrix types are already there. Further, adding ISTL as a dependency to dune-typetree just for getting multi-indices doesn't sound right, does it?
So, to wrap up, I think that your arguments make sense as long as one can show the need for the dynamic multi-index access. That is your case, and is totally opposite to mine.
If this is really needed, I can repeat the obvious: the reason why dynamic multi-index access is necessary is that many people use it. Dune-functions, and thus also dune-fufem and Amdis, rely on it. As far as I remember, PDELab did, too.
Why dune-istl? Because dune-common doesn't know about nested vector/matrix types; consequently it doesn't use multi-indices.
dune-typetree... how did you get the impression that we want to introduce an additional dependency on dune-typetree? First, multi-indices are not defined in dune-typetree, and second, we don't want to enforce a particular multi-index implementation but build upon a documented interface.
But at least the TypeTree::HybridTreePath might possibly be a candidate for multi-indices. It is not implemented like this within dune-functions, but in another toolbox this might be the case. If I understand @santiago.ospina correctly, he uses the hybrid container to implement multi-indices. Nevertheless, if we can agree on a minimal interface for multi-indices, maybe we can adapt the tree-path implementation correspondingly. No need to add a dependency to dune-typetree, though.
Yes, since staging/dune-typetree!107 (merged) was merged, TypeTree::HybridTreePath is a full multi-index with respect to the MultiIndex concept given above. Dune::ReservedVector is not. This is a (sub-optimal) implementation of what would be missing.
while experimenting, I observed that we need methods to traverse different containers at once, e.g. to calculate a scalar product (see the sketch below).
This holds for the suggested forEachIndex, but also for other functions, like flatVectorForEach etc. One problem with the latter is that it's difficult to implement this in a backwards-compatible way.
The other general question is what to do when the nesting depths of the different containers differ.
One could, for example, think of a masked evaluation where the vector is BlockVector<FieldVector<double,k>>, but the bool vector has lower depth, e.g. std::vector<bool>. The reason is that one will often be in a situation where we have to consider the whole FieldVector or skip it completely. Opinions?
For different containers to work in a hierarchic traversal, we still need a static break condition. It could, for example, be the case that the two containers only share part of the hierarchy and one of them cannot be descended further; this should stop the iteration. However, my inner-product implementation linked above does not do this.
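A minimal sketch of such a pairwise traversal with a static break condition (the name forEachPair is an assumption, uniform containers only, sizes assumed to match): descend as long as both containers can still be indexed, and hand both blocks to the callback as soon as one of them cannot. With x = BlockVector<FieldVector<double,k>> and mask = std::vector<bool>, this gives exactly the masked evaluation per FieldVector block mentioned above.

    #include <cstddef>

    // Descend the common part of the two hierarchies; stop (statically) as soon
    // as one of the containers has no operator[] anymore and pass both blocks
    // to the callback. A scalar product or a masked reduction can be built on top.
    template <class X, class Y, class F>
    void forEachPair(const X& x, const Y& y, F&& f)
    {
      constexpr bool xNested = requires { x[std::size_t(0)]; };
      constexpr bool yNested = requires { y[std::size_t(0)]; };
      if constexpr (xNested && yNested) {
        for (std::size_t i = 0; i < x.size(); ++i)  // assumes matching sizes
          forEachPair(x[i], y[i], f);
      } else {
        f(x, y);  // at least one of the containers is a leaf here
      }
    }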
The questions brought up by @christi are indeed important. And we should IMHO first clarify 1. and 2.
Should include the question of whether we want to support dynamic non-uniform indices (for me the answer is yes).
... could ignore the question of a member function/operator for now. Having an external utility would be fine as a first step. However, the question of entry visitation in addition to plain access seems important to me (see above).
Before considering the question of member functions and 3. one should IMO try to generalize the concepts/utilities to matrices and think about the Single(Row|Column)Matrix approach that @simon.praetorius and I came up with in parallel.
in the communication case, I think we have to support non-uniform indices.
also in many other cases non-uniform indices make life easier, as this functionality would otherwise need to be provided by the discretization module or by the user.
I agree, a member function can easily be added later.
and finally yes, we also have to think about matrices, but I started with the vectors, as these are the ones where I currently have a better picture of what is needed.
I think it's important that the interfaces provided for matrices and vectors are consistent. Hence we should have the matrix case in mind from the beginning to avoid having to change the vector case later. To give an example: the vector-backend in dune-functions was invented with only vectors in mind and thus provides operator[] for multi-index access. In contrast, the later-developed corresponding matrix-backend in dune-fufem has to use operator() (at least until C++23, where operator[] may take several arguments).
What about adding one of our implementations of Single[Row|Col]Matrix to dune-istl? Additionally, an array-like container with static size and istl container interface would be nice (but this is not so important for now)
This may sound trivial, but there has to be an agreement on the direction of the multi-index. When reading indices left-to-right: is it Inner container index to Outer container index (IO), or Outer container index to Inner container index (OI)? This is important because:
Algorithms result in either a push or a pop, which for dynamic multi-indices may be more or less efficient depending on the direction:
Container access consumes multi-indices and is more efficient with the IO direction.
Container traversal produces multi-indices and is more efficient with the OI direction.
For dynamic multi-indices, a reversal of the indices may be very costly if performed very often.
If this is not addressed, code may seem to work until someone actually uses nested containers and finds out that there is a mismatch in the direction somewhere (flat containers are direction-agnostic, so most code will seem to work); a small illustration follows below.
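To illustrate the two conventions with a toy example (plain nested std::vector standing in for a blocked ISTL vector; not dune-istl code):

    #include <array>
    #include <cstddef>
    #include <vector>

    using Nested = std::vector<std::vector<double>>;
    using Index2 = std::array<std::size_t,2>;

    // OI (outer-to-inner): the first entry addresses the outer container.
    double getOI(const Nested& v, const Index2& mi) { return v[mi[0]][mi[1]]; }

    // IO (inner-to-outer): the outer index is the last entry, i.e. the
    // multi-index is consumed from the back.
    double getIO(const Nested& v, const Index2& mi) { return v[mi[1]][mi[0]]; }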
Different discretization modules have different conventions when they produce and consume their multi-indices:
| Module    | Direction | Type    |
|-----------|-----------|---------|
| Amdis     | OI        | Dynamic |
| Assembler | OI        | Hybrid  |
| DuMuX     | OI        | ?       |
| FuFem     | OI        | Dynamic |
| Functions | OI        | Dynamic |
| PDELab    | IO        | Dynamic |
Note: Feel free to modify this comment to add/modify entries to the table
This may sound trivial, but there has to be an agreement on the direction of the multi-index.
Anything other than outer-to-inner does not make sense to me. And I cannot imagine any reason why this should be less efficient (unless one would really remove consumed indices using a pop_front operation, but why would anyone want to do this).
BTW: Dune-fufem also uses outer first. The same is probably true for Amdis since it also relies on dune-functions. And my last information is that there's the plan that PDELab slowly moves into the direction of dune-functions.
I also think that OI is more intuitive and should be implementable without a performance penalty. As a first step we would have to decide on a multi-index interface anyway, and depending on the actual implementation a thin wrapper could then map to this interface. This does not only concern OI/IO, but also the question of pop vs. shift, which are mostly equivalent and can be handled via a small wrapper.
Discretisation modules tend to produce indices in a bottom-up approach, or in other words, they first produce inner container indices, then modify them, and finally proceed with the next outer container. This is also not very optimal for OI dynamic multi-indices as this needs a push_front (e.g. dune-functions).
This very much depends on the application. In many cases this just uses push_back(). Furthermore one could implement the other cases without push_front() within the existing dune-functions interface. However, I'd not try to do this unless someone shows that this really provides a measurable performance improvement.
[...] And my last information is that there's the plan that PDELab slowly moves into the direction of dune-functions.
That's partially one of the things I tried to do in dune-assembler: A wrapper of PDELab grid function spaces that satisfies the dune-functions interface, but the reversal of multi-indices in the localView.index(i) ended up being super costly! Now it's solved with another approach, but that's in principle why I am bringing up the multi-index direction question below. It has real performance implications.
I am very skeptical that PDELab will ever change the direction of the multi-indices because it propagates to all sorts of places in the module, but who knows...
@robert.kloefkorn, @andreas.dedner, @samuel.burbulla, @tkoch, @markus.blatt I want to fill this table regarding multi-index production on blocked containers for different frameworks. Do you have container blocking in your modules? If so, what direction of multi-index do you use?
We use container blocking in dumux, but currently in a very limited/fixed scope (one level of static blocking (multi-type vector/matrix) plus one level of dynamic blocking (blockvector/bcrs)). We thought about multi-indices a lot to simplify vector/matrix traversal and generalize assembly (support different blocking strategies). We mostly did not progress because it is hard to get a generic implementation right. So I really appreciate this initiative! For the multi-index direction, outer->inner seems like the natural choice to me; I haven't thought about the other direction.
When talking to @christi, we discussed that while we do not necessarily need a concrete MultiIndex type in dune-istl, we at least need a MultiIndex concept: how do we want to use the multi-indices? In the comments above, a few possibilities for traversing the multi-index components were already mentioned - by a pop() function, or by shifted indexing. A suggestion (up for discussion):
    template <class M>
    concept MultiIndex = requires(const M mi, std::size_t i)
    {
      requires std::integral<typename M::value_type>;
      { mi[i] }         -> std::convertible_to<typename M::value_type>;
      { mi.empty() }    -> std::convertible_to<bool>;
      { mi.size() }     -> std::convertible_to<std::size_t>;
      { mi.max_size() } -> std::convertible_to<std::size_t>;
      // must be constexpr
      typename std::integral_constant<std::size_t, mi.max_size()>;
    };
In case you have a balanced hierarchic container (the length of the multi-index is the same for all blocks), we can use a std::array-like container to represent the MultiIndex. Then max_size == size and we can implement a static recursion break condition. In the case of dynamic (unbalanced) hierarchic containers, max_size is either equal to the capacity and thus a final break condition, or it is something like std::numeric_limits<size_type>::max() and thus hardly ever used to break the recursion. See also the example above.
All std containers should have a constexpr max_size() method. Also ReservedVector does. HybridTreePath does not, but could easily be added.
Maybe we should instead ask for a conditionally constexpr size() method. But I hate to always check whether size() is constexpr and then call it in an if constexpr context, otherwise just using if; this makes the implementation harder. BTW, the new C++ standard proposal mdspan implements both a static and a dynamic size/extent function, where the static one returns std::dynamic_extent in case no static size information is available for the corresponding dimension.
Maybe max_size (or a static size) is not needed at all. Then we should agree on another way to implement the recursion break condition, e.g., stopping when the container cannot be accessed using operator[] anymore. However, the proposed hierarchic size container TypeTree::SizeTree has a "leaf" container EmptySizeTree that implements an operator[] redirecting to itself. For this, the access-based break condition does not work. Maybe this is an exotic case, though.
The issue with the different directions of the multi-index processing makes the generalization a bit more difficult. We currently find both conventions in modules, so we cannot completely ignore it. While I hear arguments for OI, there might be other modules besides PDELab implementing IO. We now have two options:
Enforce one of the directions, probably OI, since it is used by more modules.
Require a more involved MultiIndex interface that is invariant under the direction, e.g., it could provide functions pop() or get() and shift() that return either the leftmost or the rightmost index, subject to a corresponding index shift.
Maybe we could also provide a customization point so that downstream modules can simply implement their own strategy of MultiIndex traversal, e.g., something like the ShiftedMultiIndex class in dune-functions.
Option 2 and also the specialization idea seem more complicated to document. On the other hand, they provide more flexibility.
I would advocate deciding on one direction in the interface. If a module uses the reversed order, providing a wrapper reversing the order (on the fly) should be simple. Technically, both directions should work equally well (except maybe for (*) below). However, I can see many conceptual reasons for the OI order, but I have not seen a single one for the reverse order.
(*) When using the index to access a container, you first have to provide the outer index. For OI this means simply using the first one, which requires dereferencing an address; later indices are obtained by successive increments of this address. When using IO with a dynamic index, one additionally has to add the (runtime) length of the multi-index to the address before dereferencing, which may come at an extra cost. While one can argue that for some (less frequently used) index schemes the same applies when building an OI index, you only do this once, while the index is often used several times to access a container.
This issue is only problematic for container access with dynamic multi-indices. In container traversal, you control both multi-index creation and container access. And reversal is free for hybrid multi-indices.
Maybe we could also provide a customization point so that downstream modules can simply implement their own strategy of MultiIndex traversal, e.g., something like the ShiftedMultiIndex class in dune-functions.
In my opinion, this is the way. That's how I overcame the performance penalty of the reversal of the multi-indices from IO format in PDELab to OI format in dune-assembler (Customization point vs Customized point).
Introducing such customization points complicates the interface significantly. This would e.g. mean introducing a new interface to get the first (in whatever user-defined sense) entry of a container, like e.g. Dune::front(multiIndex), where in general you don't have Dune::front(multiIndex) == multiIndex.front(). I'm really looking forward to explaining this to users who struggle to see the intrinsic beauty.
Also notice that this forbids different modules from using the same index implementation: if you specialize Dune::front(MI) for the multi-index type MI as IO for module A, you cannot use it as OI with module B or for any other custom permutation in module C.
I consider it much more reasonable to teach users to use vectorBackend(reverseIndex(multiIndex)) because a certain module uses the reverse order in contrast to the general convention. And since I still have not seen any convincing reason for a different convention, I would object if this convention were anything other than OI.
I agree with you: customization points on individual multi-index operations (e.g. front(multiIndex)) would be a horrible idea. I am talking about the overall accessVectorEntry(Vector,MultiIndex), not about the individual multi-index operations.
I would not consider this a customization point, because again you cannot specialize for Vector or MultiIndex if these should be used in different modules. I guess what you have in mind is implementing:
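    // sketch of the presumably intended call, using the helper named below
    accessVectorEntry(vector, reverseMultiIndexView(multiIndex));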
where reverseMultiIndexView() is either implemented in dune-common or dune-pdelab.
If - in contrast - accessVectorEntry() should really be customized (in the sense of implemented separately) in any downstream module, we're exactly where we are now, except that we agree on a common name for this function.
I mean the second. The point of customization points is to be able to use code that you have not seen yet. In other words, dune-istl and other modules may invoke accessVectorEntry() independent of the direction of the multi-index and without seeing how it will be implemented later on, yet having a default implementation with OI direction. Yes, we agree on the name, but it is also much more than that. So I don't agree that that's "where we are now".
In other words, dune-istl and other modules may invoke accessVectorEntry() independent of the direction of the multi-index and without seeing how it will be implemented later on
That's exactly the opposite of the point brought up by @christi. He suggests (and I strongly agree) that it would be helpful to include functions providing multi-index access in dune-istl, such that they can be used by downstream modules. You suggest that dune-istl uses functions implemented downstream.
I don't think reimplementing these different algorithms in downstream modules is a viable idea. We want to reduce the complexity and not increase it.
I still don't see what the problem with an additional view-wrapper that maps IO to OI is (see the sketch below).
For example, in the case of an algebraic (i.e. not grid-based) communication, we would collect a list of multi-indices that need to be sent to another rank. If PDELab uses different semantics for the multi-indices, we would map these once and store the mapped ones.
If we want to replace major parts of the backend, we would either have very few indices, which could be wrapped, or we operate on all entries and would just use the flat-view helpers from ISTL.
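A sketch of what such a view-wrapper could look like (assumed name, not existing code): it presents an IO multi-index as an OI one without copying, by reading it back to front.

    #include <cstddef>

    // Non-owning view that reverses the apparent order of a multi-index.
    template <class MultiIndex>
    struct ReversedMultiIndexView
    {
      const MultiIndex& mi;

      using value_type = typename MultiIndex::value_type;

      value_type operator[](std::size_t i) const { return mi[mi.size() - 1 - i]; }
      auto size() const { return mi.size(); }
      auto max_size() const { return mi.max_size(); }
      bool empty() const { return mi.empty(); }
    };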
[the naming of the branch is a bit strange in order to fit that of PDELab]
First I have to say that it works, with some minor problems, but I also found some further functions that would be very helpful, e.g. given a set of indices, computing the size of this subset. One can do this with the existing functionality, but in many cases one could significantly speed things up. Also, forEachIndex and flatVectorForEach would be helpful not only on a single vector, but on a set of vectors iterated in parallel (a sketch follows below).
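A sketch of how the "set of vectors" variant could look (assumed signature, building on the single-vector access function discussed above; the argument order is only for illustration and would need care to not clash with the single-vector overload):

    #include <cstddef>

    // Apply f to the blocks of all given vectors addressed by each multi-index,
    // e.g. to accumulate a scalar product restricted to owned entries.
    // accessVectorEntry() is the (hypothetical) single-vector access from above.
    template <class Indices, class F, class... Vectors>
    void forEachIndex(const Indices& indices, F&& f, Vectors&&... vectors)
    {
      for (const auto& index : indices)
        f(index, accessVectorEntry(vectors, index)...);
    }

A restricted scalar product would then pass x and y together and accumulate the entry products in the functor.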