Explore using libxsmm from dune-perftool
libxsmm is a library for high-performance matrix-matrix multiplication:
https://github.com/hfp/libxsmm
My guess for the quickest way to success:
- Introduce a function that implements matrix-matrix multiplication (just like in sumfact.py)
- Make it backendized.
- Implement a backend based on libxsmm
Other questions to clarify:
- How do we bridge the build systems (libxsmm has several operating modes)?
- Can libxsmm handle all the custom strides that we need?
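
One way to make the stride question concrete: BLAS-style GEMM interfaces describe each matrix by a single leading dimension, with unit stride along the other axis. Assuming libxsmm has the same restriction (not verified here), a helper like the hypothetical `blas_layout` below could classify which of our access patterns fit that model and which would need a copy or a different kernel.

```python
def blas_layout(rows, cols, row_stride, col_stride):
    """Classify a strided rows x cols view (strides counted in elements).

    row_stride is the distance between vertically adjacent entries,
    col_stride the distance between horizontally adjacent entries.
    Returns (layout, leading_dimension) if the view can be described by a
    single leading dimension, otherwise None.
    """
    # Column-major: entry (i, j) lives at offset i + j * ld, with ld >= rows.
    if row_stride == 1 and col_stride >= max(1, rows):
        return ("col-major", col_stride)
    # Row-major: entry (i, j) lives at offset i * ld + j, with ld >= cols.
    if col_stride == 1 and row_stride >= max(1, cols):
        return ("row-major", row_stride)
    # Neither axis has unit stride (or the rows/columns would overlap).
    return None
```

For example, `blas_layout(4, 3, 1, 8)` describes a 4 x 3 block inside a column-major array with 8 rows, while a view with strides `(2, 8)` has no unit-stride axis and returns `None`.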