Bugfix/vectorization strategy
Issue
Assume AVX2 is available. The vectorization strategy algorithm would sometimes produce strategies such as [1,2,2,0]. Here 1 and 2 denote different inout_keys and 0 is padding at the end. Such a strategy asks to merge three sum factorization kernels and pad the last SIMD lane. Unfortunately, we cannot realize this strategy: the input for the first half of the SIMD lanes cannot be produced with a broadcast, and padding is only supported at the end.
Workaround
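By reordering the kernels we get the strategy [2,2,1,0], which can be realized without any problems: the two kernels sharing an inout_key now fill the first half of the SIMD lanes, and the padding stays at the end.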
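To make the reordering concrete, here is a minimal sketch in Python. It is not the code from this merge request: the helper name `reorder_strategy`, the `PADDING` constant, and the rule "most frequent inout_key first" are assumptions chosen only to reproduce the example above.

```python
from collections import Counter

PADDING = 0  # assumption: 0 marks a padded SIMD lane, as in the example above


def reorder_strategy(strategy):
    """Hypothetical helper: group lanes by key, most frequent key first, padding last."""
    keys = [k for k in strategy if k != PADDING]
    counts = Counter(keys)

    # Remember where each key first appears so that ties are broken deterministically.
    first_seen = {}
    for position, key in enumerate(keys):
        first_seen.setdefault(key, position)

    # Keys with more kernels come first; equal counts keep their original order.
    ordered = sorted(counts, key=lambda k: (-counts[k], first_seen[k]))

    # Emit each key as often as it occurred, then fill the remaining lanes with padding.
    lanes = [k for k in ordered for _ in range(counts[k])]
    return lanes + [PADDING] * (len(strategy) - len(lanes))


print(reorder_strategy([1, 2, 2, 0]))  # [2, 2, 1, 0]
```

A strategy that is already in a realizable order, such as [2,2,1,0], is returned unchanged by this sketch.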
Notes
- The reordering happens within get_vectorization_dict and doesn't affect the overall vectorization strategy algorithm. It only changes the values of the entries of the resulting vectorization dictionary.
- I checked that it produces the same vectorization strategies for a Poisson and a Stokes problem.