`LeafNode::index` instead of `LocalView::index`?
dune-assembler
Context: During the last months, I have been re-implementing the PDELab function spaces against the dune-functions
interface in a module called dune-assembler
. The reasons to do so are various and not the point of this issue. The topic here is that during re-implementation I faced some performance problems related to the interface.
The dune-assembler
module is largely unfinished and undocumented, so do not dig too much into its gory details.
LocalView::index
Problem with Now, the thing is that during benchmarking of factorized and vectorized local operators for DG convection-diffusion generated by dune-codegen
(the fastest thing that we have around), I was faced with a major bottleneck on both the interface and the implementation of dune-functions
. It could be that I am doing something stupid, but this is what I understand so far and why I claim this:
- The interface in
dune-functions
requires the local view to provide a container index at the root node of the space. SeeLocalView
concept check. - Such interface, forces the implementation to gather and store indices from all nodes regardless of the nature of the index. See
Functions::DefaultLocalView
implementation.- For most function spaces, this is not a problem because the global pattern is irregular and storing them is the natural way to provide them back after
bind()
. - For DG spaces, the situation is different. The index is trivial to calculate because it is always an offset of the first index. This even holds after a for blocking and interleaving because our transformations happen from leaf-to-root (See for example post-child transformation in
Functions::CompositePreBasis
,PDELab::map_dof_indices_to_container_indices
, orAssembler::EntityOrderingNode::firstContainerIndex
). But such a trivial calculation is hard to realize from theLocalView
without extra knowledge, so you just have to store the value.
- For most function spaces, this is not a problem because the global pattern is irregular and storing them is the natural way to provide them back after
- In
PDELab
this is solved at bind-time, where the root node visitor takes only the first index of each leaf node. SeePDELab::LeafLocalFunctionSpaceNode::bind
. The problem with this approach is that the user needs to be aware of this and use a completely different assembler to ensure that the indexation does not lead to a segfault because other indices exist but are garbage. This is hard to see in PDELab, but is essentially the difference between the engines ofPDELab::GridOperator
andPDELab::FastDGGridOperator
. - Thus, for the implementation of
dune-assember
I decided to deviate from thedune-functions
interface and provide the index at theLeafNode
, where each node is able to perform an on-the-fly computation or return a stored index depending of its knowledge about the space at that node. See implementation at leaf nodeAssembler::LeafNode::LocalView::index
.
Benchmark
Now, my initial results seem to show that this reasoning is not far from reality. Functions
and regular PDELab
spaces are too slow to keep up with the speed of vectorized local operators and take much more time than the assembly time itself. The following is a figure that measures the time to assemble a residual for a vectorized local operator using different combinations of function spaces and assemblers with respect to the dune-functions
interface.
Notes:
-
DA::FunctionsSpace
: Function spaces fromdune-assembler
with leaf node index approach. -
PDELab::FunctionsSpace
: Function spaces fromdune-pdelab
with fast-dg optimization. -
Functions::FunctionsSpace
: Function spaces fromdune-functions
with root node index approach. -
DA::Assembler
: Assembler ofdune-assembler
with usingdune-pdelab
local operator interface. -
PDELab::Assembler
: Assembler ofdune-pdelab
with fast-dg optimization. - Regular PDElab function spaces + regular PDELab assembler are not shown but they are even slower than
Functions::FunctionsSpace
by an order of magnitude because it's doing many unnecessary things for DG. - Problem sizes are scaled to take approximately the same amount of DOF per problem so do not compare between different polynomial degrees.
- I am not completely sure why the differences in time are so dramatic, but it certainly has to do with extra unnecessary memory pressure on the system and more time spend on the
bind
visitor. - It could be that I am doing something very silly with
dune-functions
, but my experience with PDELab and Assembler also points out that indexation at the root node (i.e.LocalView::index
) is the underlying issue. - Compilers sometimes are able to see through the tree traversal and figure out the whole purpose of it, but in my experience, it happens so seldom that is better to actively act on this problem.
LeafNode::index
or LocalView::index(path, node_dof)
Proposal: I do like the interface, but the LocalView::index
is a major complication for me to attain to it. So I would like to propose an interface change that allows implementations to leverage the structure of DG layout:
- (preferred) One form to do this is to move the
index()
function fromLocalView
to theLeafNode
. I think this one is the most intrusive one but the one that gave me the best results indune-assembler
. Indeed, together with aLeafNode::path
function, this was also the one that allowed me to seamlessly re-use PDELab local operators (like the ones generated bydune-codegen
) withdune-functions
function spaces. - Another way is to also pass the path to the desired leaf node:
LocalView::index(path, node_dof)
. This seems to be easiest to implement indune-functions
but interface-wise looks more clumsy becausenode_dof
may be easily confused with thelocalIndex
of the root node. In this case, compatibility with PDELab local operators get more complicated. - Do you have any other ideas?