Enabling vectorization for an arbitrary number of blocks
This enables vectorization for an arbitrary number of blocks by handling the tail of the vectorized loop manually. The tail can also be vectorized if a smaller vector length still fits into the remaining blocks.
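A minimal sketch of the tail-handling idea, not the code from this change. It assumes a hypothetical per-block operation (scaling one `double` per block) and a main vector width of 8 blocks. After the main loop at most 7 blocks remain, so the tail is covered by one chunk each of width 4, 2 and 1, taken only when it fits. The fixed trip count in `process_chunk` is what gives the compiler a chance to map each chunk to SIMD instructions; whether it actually does depends on the compiler and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-block operation: scales `Width` consecutive blocks.
// The compile-time trip count lets the compiler vectorize the loop.
template <std::size_t Width>
void process_chunk(const double* in, double* out, double alpha)
{
    for (std::size_t lane = 0; lane < Width; ++lane) {
        out[lane] = alpha * in[lane];
    }
}

// Main loop with width 8, then a tail using widths 4, 2 and 1,
// each applied only if it still fits into the remaining blocks.
void process_blocks(const double* in, double* out, std::size_t num_blocks,
                    double alpha)
{
    std::size_t i = 0;
    for (; i + 8 <= num_blocks; i += 8) {
        process_chunk<8>(in + i, out + i, alpha);
    }
    // At most 7 blocks are left here; 4 + 2 + 1 covers every case.
    if (i + 4 <= num_blocks) {
        process_chunk<4>(in + i, out + i, alpha);
        i += 4;
    }
    if (i + 2 <= num_blocks) {
        process_chunk<2>(in + i, out + i, alpha);
        i += 2;
    }
    if (i < num_blocks) {
        process_chunk<1>(in + i, out + i, alpha);  // scalar remainder
    }
}

// Convenience overload for std::vector inputs.
void process_blocks(const std::vector<double>& in, std::vector<double>& out,
                    double alpha)
{
    assert(in.size() == out.size());
    process_blocks(in.data(), out.data(), in.size(), alpha);
}
```

Cascading through halving widths keeps the number of tail branches logarithmic in the main width, instead of falling back to a scalar loop over up to 7 blocks.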
Another approach could be to use padding, but I'm currently not sure how to do that.
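For comparison, one possible shape of the padding approach, sketched under loud assumptions: the number of blocks is rounded up to a multiple of a hypothetical `vector_width` of 8, the padded blocks are filled with zeros (assumed harmless for the operation), and the result is truncated back. Here the input is copied into a padded buffer only to keep the example self-contained; in practice the storage would presumably be allocated padded from the start, so that no copy is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vector width, in blocks.
constexpr std::size_t vector_width = 8;

// Rounds n up to the next multiple of vector_width.
std::size_t padded_size(std::size_t n)
{
    return (n + vector_width - 1) / vector_width * vector_width;
}

// Scales every block; the zero padding means the main loop never needs a tail.
std::vector<double> process_blocks_padded(const std::vector<double>& in,
                                          double alpha)
{
    std::vector<double> buf(padded_size(in.size()), 0.0);
    std::copy(in.begin(), in.end(), buf.begin());
    assert(buf.size() % vector_width == 0);
    for (std::size_t i = 0; i < buf.size(); i += vector_width) {
        for (std::size_t lane = 0; lane < vector_width; ++lane) {
            buf[i + lane] *= alpha;
        }
    }
    buf.resize(in.size());  // drop the padded blocks again
    return buf;
}
```

The trade-off is extra memory and work on the padded blocks in exchange for a branch-free loop.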
Edited by Marcel Koch