Think about multiple loopy kernels
At the moment it is not possible to generate multiple loopy kernels at the same time. We need to add this capability if we want to use loopy for filling the theta matrices used for sum factorization.
Alternatively, we could generate those matrices before doing anything else in the code. However, this could lead to subtle bugs if generating them calls a function that, as a side effect, creates content intended for the main loopy kernel.