New vectorization features
So far, only horizontal vectorization opportunities with an identical input coefficient were considered. This merge request aims at more general vectorization opportunities. It involves:
- More involved SIMD transpose implementations
- Extension of the vectorization generator
- Generation of lower/upper half loads (like `Vec8d(Vec4d, Vec4d)`)
- Introduce abstraction of kernel output (just like input)
- Handle `horizontal_add`s on SIMD halves
- Trim down variability of vectorization opportunities through heuristics
- Reimplement `fromlist` strategy on top of the new style
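To illustrate the lower/upper half loads and the `horizontal_add`s on SIMD halves from the list above, here is a minimal sketch. It does not use a real SIMD library: `Vec4d` and `Vec8d` are plain-array stand-ins, and `get_low`, `get_high`, `HalfSums`, and `horizontal_add_halves` are hypothetical names chosen for this example. It only shows the intended data flow, assuming two different input coefficients occupy the two halves of one vector.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Plain-array stand-in for a 4-lane double vector (NOT a real SIMD type).
struct Vec4d {
    std::array<double, 4> lane;
};

// Plain-array stand-in for an 8-lane double vector (NOT a real SIMD type).
struct Vec8d {
    std::array<double, 8> lane;

    // Mirrors the Vec8d(Vec4d, Vec4d) form: lower half first, upper half second.
    Vec8d(const Vec4d& low, const Vec4d& high) {
        for (std::size_t i = 0; i < 4; ++i) {
            lane[i] = low.lane[i];
            lane[i + 4] = high.lane[i];
        }
    }
};

// Hypothetical helper: extract lanes 0..3.
Vec4d get_low(const Vec8d& v) {
    Vec4d r;
    for (std::size_t i = 0; i < 4; ++i) r.lane[i] = v.lane[i];
    return r;
}

// Hypothetical helper: extract lanes 4..7.
Vec4d get_high(const Vec8d& v) {
    Vec4d r;
    for (std::size_t i = 0; i < 4; ++i) r.lane[i] = v.lane[i + 4];
    return r;
}

// Sum of all four lanes of one half.
double horizontal_add(const Vec4d& v) {
    double sum = 0.0;
    for (double x : v.lane) sum += x;
    return sum;
}

// Lane-wise product of two 8-lane vectors.
Vec8d operator*(const Vec8d& a, const Vec8d& b) {
    Vec8d r = a;
    for (std::size_t i = 0; i < 8; ++i) r.lane[i] *= b.lane[i];
    return r;
}

// One result per half, since the halves belong to different inputs.
struct HalfSums {
    double low;
    double high;
};

// Hypothetical helper: reduce each half separately instead of the whole vector.
HalfSums horizontal_add_halves(const Vec8d& v) {
    assert(v.lane.size() == 8);
    return {horizontal_add(get_low(v)), horizontal_add(get_high(v))};
}
```

A caller would load one coefficient into each half, for example `Vec8d coeffs(u, w)`, multiply with a vector that repeats the same basis values in both halves, and then call `horizontal_add_halves` to obtain one accumulated value per coefficient.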
For the time being, I will restrict myself to the case where the lower and upper halves of the input SIMD vector differ, as I expect this particular case to be very useful for vectorizing skeleton integrals. Allowing too many opportunities will probably blow up the complexity of the opportunity generator, but we will see.
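A rough way to see the complexity concern is to count input-to-slot assignments. The sketch below uses a hypothetical counting model, not the generator's actual search space: it assumes each independently loadable slot of the vector (the whole vector, each half, or each lane) may take any of `num_inputs` input coefficients, which gives `num_inputs` to the power `num_slots` candidates. The function name `count_slot_assignments` is an assumption made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical counting model: number of ways to assign one of `num_inputs`
// input coefficients to each of `num_slots` independently loadable slots.
//   num_slots == 1: identical input across the whole vector
//   num_slots == 2: lower and upper half may differ
//   num_slots == 8: every lane of an 8-lane vector may differ
std::uint64_t count_slot_assignments(std::uint64_t num_inputs, unsigned num_slots) {
    std::uint64_t count = 1;
    for (unsigned i = 0; i < num_slots; ++i) {
        // Guard against 64-bit overflow before multiplying.
        assert(num_inputs == 0 ||
               count <= std::numeric_limits<std::uint64_t>::max() / num_inputs);
        count *= num_inputs;
    }
    return count;
}
```

Under this model, four candidate inputs give 4 assignments for a uniform vector, 16 when the halves may differ, and 65536 when all eight lanes may differ, which is why restricting to halves keeps the enumeration manageable.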
Edited by Dominic Kempf