All 'generic' finite element implementations, i.e., those building on the infrastructure in dune/localfunctions/utility/, make heavy use of reinterpret_cast, breaking strict aliasing rules.
As a consequence, evaluating the basis functions leads to undefined behaviour.
For those not familiar with strict aliasing errors: a typical manifestation is that optimized code silently computes wrong numbers in certain cases, because the compiler makes incorrect assumptions about when register values have to be synchronized with memory.
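For illustration only (this is not code from dune-localfunctions), here is a minimal example of the kind of strict-aliasing violation that reinterpret_cast can introduce, and why the optimizer may then "forget" to reload a value from memory:

```cpp
#include <cstdio>

// Classic strict-aliasing violation: the compiler is allowed to assume that a
// float* and an int* never point to the same object, so under -O2 it may keep
// *f in a register and ignore the write through i.
float overwriteThroughAlias(float* f, int* i)
{
  *f = 1.0f;
  *i = 0;        // if i actually aliases f, this write may appear "lost"
  return *f;     // the optimizer may assume *f is still 1.0f
}

int main()
{
  float x = 0.0f;
  // Undefined behaviour: the int* really points at a float object.
  float r = overwriteThroughAlias(&x, reinterpret_cast<int*>(&x));
  std::printf("%f\n", r);  // may print 1.000000 instead of the expected 0.000000
}
```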
Sure, that's preferable. But someone has to do this. While I did some minor cleanups to this code recently, it still looks scary to me, and in my impression the offending reinterpret_casts are deeply tied to the whole code structure. Probably @andreas.dedner is the only one who could do this without some serious code archaeology.
For the record: at least for his hp-DG implementation, @lasse.hinrichsen observed that implementing a runtime order using type erasure is significantly faster than an implementation with a truly dynamic order. Note that the price of type erasure is paid here anyway, because it is already used to switch between topology types.
Using type erasure is of course only relevant when using a hybrid grid, which in combination with DG will not be the typical use case? Then again, once caching is employed....
No, he's using type erasure with respect to the order. In fact it's based on LocalFiniteElementVariant, which has the drawback that you need to fix the maximal order. On the other hand, there are many reports of std::visit being potentially faster than a virtual call.
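As a rough sketch of the "type erasure with respect to the order" idea (the types below are hypothetical stand-ins, not the actual LocalFiniteElementVariant interface): a std::variant over a set of statically compiled orders, dispatched with std::visit.

```cpp
#include <cstddef>
#include <stdexcept>
#include <variant>

// Hypothetical stand-in for a local finite element with compile-time order k.
template<int k>
struct FixedOrderElement
{
  static constexpr int order() { return k; }
  std::size_t size() const { return k + 1; }  // placeholder for the number of basis functions
};

// The maximal order must be fixed at compile time -- the drawback mentioned above.
using RuntimeOrderElement =
  std::variant<FixedOrderElement<1>, FixedOrderElement<2>,
               FixedOrderElement<3>, FixedOrderElement<4>>;

RuntimeOrderElement makeElement(int order)
{
  switch (order) {
    case 1: return FixedOrderElement<1>{};
    case 2: return FixedOrderElement<2>{};
    case 3: return FixedOrderElement<3>{};
    case 4: return FixedOrderElement<4>{};
    default: throw std::invalid_argument("order not compiled in");
  }
}

std::size_t basisSize(const RuntimeOrderElement& fe)
{
  // std::visit dispatches to the statically compiled implementation; this is
  // the call that is sometimes reported to be cheaper than a virtual function.
  return std::visit([](const auto& e) { return e.size(); }, fe);
}
```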
I had a look at the code, and all evil seems to originate from tensor.hh. The tensor implementation currently stores its data in FieldVectors held as members, and the casts are used to map between these types.
I think the best approach would be a proper interface definition of tensor and then a clean new implementation.
I also think that the ominous block() method should not be necessary, as it gives access to the internal data and does the reinterpretation.
My idea would be to introduce a lightweight wrapper for the tensors with entry points to the actual data. The sizes are template parameters, so these classes would really only consist of a pointer. The cost would be the same as for the reference we are currently handing out. If I understood everything correctly, the problem arises when going through the derivatives. As these need the initial coefficients anyway, we should be able to store those externally and then just work on this external storage.
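A minimal sketch of such a non-owning view, under the assumption that this is roughly what is meant (the name TensorView and its interface are made up for illustration, not an existing DUNE type): the extents are template parameters and the class holds nothing but a pointer into externally owned storage.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical non-owning tensor view: the extents are compile-time template
// parameters, the only data member is a pointer to external storage.
template<class F, std::size_t... extents>
class TensorView
{
  F* data_;

public:
  explicit TensorView(F* data) : data_(data) {}

  static constexpr std::size_t rank() { return sizeof...(extents); }
  static constexpr std::size_t size() { return (std::size_t(1) * ... * extents); }

  // Flat element access; multi-index access could be layered on top.
  F& operator[](std::size_t i) { return data_[i]; }
  const F& operator[](std::size_t i) const { return data_[i]; }
};

// Usage sketch: the coefficients live in one externally owned buffer, and
// views of it are created without any reinterpret_cast.
void example()
{
  std::vector<double> storage(3 * 4);  // e.g. 3 basis functions, 4 entries each
  TensorView<double, 3, 4> values(storage.data());
  values[0] = 1.0;
}
```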
@andreas.dedner I just tried out the type erasure approach for dynamic order k and tested it with the poisson-pq2 example in dune-functions. It turns out that incorporating the dynamic k into a LocalFiniteElementVariant-based cache did not change the runtime compared to a static k (as expected), while using the runtime order leads to an increase of ~10% in runtime when assembling a stiffness matrix for k=2.
Some more measurements: For order k=1 the overhead of the runtime version is about 20%. For k=3 it's almost equally fast and for k=4 the runtime version is even faster.
@christi Short answer: no. Long answer: I locally created a LocalFiniteElementVariant-based cache to implement dynamic k with underlying static-k implementations. I can provide it, but it's quite hacky and maybe not what you're looking for.
But if you're only interested in the (surprisingly) better behaviour of the fully dynamic implementation, this does not need the cache. You can simply see this by changing the order in poisson-pq2 to 4 and switching between the static and the fully dynamic implementation.
As far as I can see, the monomial-based dynamic implementation should be linear in the order, while I suspect that the Pk/Qk implementations are maybe not.
Just a remark (or perhaps I'm misunderstanding Carsten's test): the dynamic order is more of a by-product of the approach that Martin and I took; it was more about implementing the functionals and then having an automatic mechanism to construct the correct basis functions. Speed was never that much of an issue because we assumed the use of a quadrature cache anyway. Compile time improvement is of course another issue :-)
It is also not so easy to solve. I made one attempt but then stopped due to time limitations. I would leave this issue open as a reminder. Maybe with a proper tensor implementation plus slicing, spans and subspans, this can all be implemented very nicely. But that needs time.
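To make the "spans and subspans" remark a bit more concrete, here is a small sketch using C++20 std::span (the struct and its members are illustrative, not part of dune-localfunctions) of how value and derivative blocks could be sliced out of one flat coefficient buffer instead of reinterpreting its storage:

```cpp
#include <cstddef>
#include <span>
#include <vector>

// Illustration only: one flat buffer holds value and Jacobian coefficients for
// all basis functions; subspan() hands out views of the individual blocks
// instead of reinterpret_casting the storage to a different type.
struct EvaluationBuffer
{
  std::size_t numBasis, valueSize, jacobianSize;
  std::vector<double> data;

  EvaluationBuffer(std::size_t n, std::size_t v, std::size_t j)
    : numBasis(n), valueSize(v), jacobianSize(j), data(n * (v + j))
  {}

  std::span<double> value(std::size_t i)
  {
    return std::span(data).subspan(i * (valueSize + jacobianSize), valueSize);
  }

  std::span<double> jacobian(std::size_t i)
  {
    return std::span(data).subspan(i * (valueSize + jacobianSize) + valueSize,
                                   jacobianSize);
  }
};
```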
We could just add a comment in the code (as is done in dune-functions for the global bases that use these local functions, for example), but such comments are mostly forgotten. (There was recent news about a Python package for tar archives with a bug in utility functions that was never fixed; only a comment was added saying you should be careful. Now, 15 years later, about 350,000 Python packages have a security bug.)