Vectorization with a cost model
There are many more things to be done here:
- The flop cost model from permutation must be flop-exact
- The permutation must be taken into account when determining the flop cost of a kernel
- Hardware parameters (peak performance, memory bandwidth) must be read from an ini file
- Collect ideas for an ILP heuristic
- Do not store the quadrature point size with the tabulations, as it may be altered globally
- Write a generator for all vectorization opportunities
- Rewrite the old vectorization strategies in terms of a cost function
- Do meaningful stringification of vectorization strategy dictionaries
- Rethink the options interface for vectorization
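
Two of the items above (hardware parameters read from an ini file, and strategies ranked by a cost function) could be sketched roughly as follows. This is only an illustration under assumptions: the section name `hardware`, the keys `peak_performance` and `memory_bandwidth`, the units (GFlop/s, GByte/s) and all function names are hypothetical, and the roofline-style estimate is a stand-in for whatever cost model is finally chosen.

```python
import configparser
from dataclasses import dataclass


@dataclass(frozen=True)
class Hardware:
    peak_gflops: float    # assumed unit: GFlop/s
    bandwidth_gbs: float  # assumed unit: GByte/s


def read_hardware(ini_text, section="hardware"):
    """Parse hardware parameters from ini-formatted text (hypothetical keys)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    sec = parser[section]
    return Hardware(
        peak_gflops=sec.getfloat("peak_performance"),
        bandwidth_gbs=sec.getfloat("memory_bandwidth"),
    )


def roofline_cost(flops, bytes_moved, hw):
    """Estimated runtime in seconds: the slower of compute and memory traffic."""
    compute_time = flops / (hw.peak_gflops * 1e9)
    memory_time = bytes_moved / (hw.bandwidth_gbs * 1e9)
    return max(compute_time, memory_time)


def cheapest(strategies, hw):
    """Pick the strategy name with the lowest estimated cost.

    `strategies` maps a name to a (flops, bytes_moved) tuple.
    """
    return min(strategies, key=lambda name: roofline_cost(*strategies[name], hw))
```

With such a function in place, an old hand-written strategy would become one candidate among several, and the selection reduces to a `min` over the generated candidates.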
Edited by Dominic Kempf