Just-in-time-compiled LocalOperator
I am currently exploring the use of JIT compilation as an alternative to our CMake-based code generation workflow. The idea is to use an embedded Python interpreter that reads the UFL file, creates a loopy kernel from it, and then just-in-time compiles it into assembly that can be executed directly.
My current idea is to integrate this toolchain into a class JITCompiledLocalOperator that fulfills the PDELab LocalOperator interface. This operator can then be used from any PDELab application.
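To make the generate-then-compile idea concrete, here is a minimal sketch that uses only the Python standard library. It is a stand-in, not the actual toolchain: a list of coefficients plays the role of the UFL form, a string template plays the role of the loopy code generator, and `compile()`/`exec()` play the role of the JIT backend. The names `generate_kernel_source` and `jit_compile` are hypothetical and do not exist in dune-codegen or loopy.

```python
# Stand-in for the real pipeline (UFL -> loopy kernel -> JIT-compiled code).
# Everything here is illustrative; none of these names come from dune-codegen.

def generate_kernel_source(coefficients):
    """Emit Python source for a kernel that adds sum_k c_k * u[i]**k to r[i]."""
    terms = " + ".join(f"{c!r} * u[i] ** {k}" for k, c in enumerate(coefficients))
    return (
        "def kernel(u, r):\n"
        "    for i in range(len(u)):\n"
        f"        r[i] += {terms}\n"
    )


# Cache keyed by the generated source, so each kernel is compiled only once.
_cache = {}


def jit_compile(source):
    """Compile the generated source at runtime and return the kernel callable."""
    if source not in _cache:
        namespace = {}
        exec(compile(source, "<generated kernel>", "exec"), namespace)
        _cache[source] = namespace["kernel"]
    return _cache[source]


source = generate_kernel_source([1.0, 0.0, 2.0])  # represents 1 + 2*u^2
kernel = jit_compile(source)

u = [0.0, 1.0, 2.0]
r = [0.0, 0.0, 0.0]
kernel(u, r)  # accumulates into r in place
print(r)
```

Caching on the generated source is one possible design choice: an operator that is instantiated repeatedly for the same form would then pay the compilation cost only once.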
For the technical realization of JIT compilation there are two alternatives:
- Using Numba.
  - Pro: No installation issue: numba is pip-installable and links against libllvm -> No manual compiler handling involved
  - Pro: The execution part is maintained completely elsewhere
  - Pro/Contra (You choose): This will only work with vanilla loopy kernels without any Duneisms inside
  - Contra: The NumbaTarget needs to be adjusted for use from C++, see https://github.com/inducer/loopy/issues/118
- Using the parts of loopy for execution of C code
  - Pro: The loopy kernels may involve more C constructs
  - Contra: More moving parts
  - Contra: More work (I guess)
How these would perform, I cannot judge yet. My preliminary tests with the current NumbaTarget show results that are on par with PDELab's local operator (convection-diffusion being the example) as soon as the kernel workload is sufficiently high. For small workloads, the NumbaTarget suffers from an extremely high function call overhead because every call takes a detour through Python; a remedy is proposed in the GitHub issue linked above.
In my Numba-based implementation, I added an additional backend (aka a set of mixins) to dune-codegen that generates a pure loopy kernel without any Dune-specific stuff. Basis function evaluations, quadrature rules, degree of freedom data and accumulation objects are arguments of this kernel. The assembly methods of the JIT operator extract that data from Dune data structures and pass it into the JIT-compiled kernel. For the most part, expensive copies can be avoided.
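The calling convention described above can be sketched as follows. This is a pure-Python illustration under stated assumptions, not the generated code: the kernel computes a simple reaction-type residual, r_i += sum_q w_q |J| u(x_q) phi_i(x_q), on a 1D P1 element, and the class `JITOperatorSketch` only borrows the method name `alpha_volume` from PDELab with a heavily simplified, hypothetical signature. `array` and `memoryview` stand in for the Dune containers whose storage would be passed on without copying.

```python
from array import array
from math import sqrt


def residual_kernel(phi, weights, detjac, coeffs, residual):
    """Vanilla kernel: everything it needs arrives as a flat argument.

    phi      -- basis evaluations, row-major, phi[q * n_b + i] = phi_i(x_q)
    weights  -- quadrature weights
    detjac   -- determinant of the geometry Jacobian (constant here)
    coeffs   -- local degrees of freedom
    residual -- accumulation buffer, updated in place
    """
    n_b = len(coeffs)
    for q in range(len(weights)):
        # Evaluate the discrete function u at quadrature point q.
        u_q = 0.0
        for j in range(n_b):
            u_q += coeffs[j] * phi[q * n_b + j]
        factor = weights[q] * detjac * u_q
        for i in range(n_b):
            residual[i] += factor * phi[q * n_b + i]


class JITOperatorSketch:
    """Hypothetical operator: extracts data and forwards it to the kernel."""

    def __init__(self, kernel, phi, weights):
        self.kernel = kernel
        self.phi = phi
        self.weights = weights

    def alpha_volume(self, detjac, x, r):
        # memoryview exposes the underlying buffers without copying them,
        # so the kernel writes straight into the caller's residual storage.
        self.kernel(self.phi, self.weights, detjac, memoryview(x), memoryview(r))


# Two-point Gauss rule on [0, 1] and P1 basis phi_0 = 1 - x, phi_1 = x.
points = [0.5 - 0.5 / sqrt(3.0), 0.5 + 0.5 / sqrt(3.0)]
weights = array("d", [0.5, 0.5])
phi = array("d", [v for p in points for v in (1.0 - p, p)])

op = JITOperatorSketch(residual_kernel, phi, weights)
x = array("d", [1.0, 2.0])  # local coefficients, u(x) = 1 + x
r = array("d", [0.0, 0.0])  # residual container owned by the caller
op.alpha_volume(1.0, x, r)
print(list(r))
```

For this input the exact residual is (2/3, 5/6), which the two-point Gauss rule reproduces up to rounding. Since the kernel sees only flat buffers and scalars, the same signature could in principle be served by a Numba- or C-compiled kernel.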
I will later move my experimental code from its repository to this one.
I am opening this issue to let everybody know that I am experimenting with this. If you have any feedback, let me know. I am on leave until mid-August though, so don't expect me to answer before then.