Performance transformation: Loop reordering in sumfact kernel

René Heß requested to merge feature/sumfact-loop-reordering into master

This is a performance transformation that reorders the loop nest of the sumfact kernel. There are two ways to reorder the loops in a tensor contraction:

  1. Set the output variable to zero, then accumulate directly in it
  2. Accumulate in a sufficiently large temporary
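
To illustrate the two variants, here is a minimal sketch in plain Python. It is not the generated kernel code. It assumes a single contraction `out[i][k] = sum_j A[i][j] * X[j][k]`, and all function and variable names are hypothetical. In the baseline the reduction loop over `j` is innermost, so a scalar accumulator suffices. Once `j` is moved outward, the partial sums have to live either in the output itself or in a temporary of the same size.

```python
def contract_reference(A, X):
    """Baseline loop order: reduction index j innermost, scalar accumulator."""
    n, m, p = len(A), len(X), len(X[0])
    out = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(p):
            acc = 0.0
            for j in range(m):
                acc += A[i][j] * X[j][k]
            out[i][k] = acc
    return out


def contract_accumulate_in_output(A, X, out):
    """Variant 1: zero the output, then accumulate directly in it (j outermost)."""
    n, m, p = len(A), len(X), len(X[0])
    # The output must be zeroed first because it now holds the partial sums.
    for i in range(n):
        for k in range(p):
            out[i][k] = 0.0
    for j in range(m):
        for i in range(n):
            for k in range(p):
                out[i][k] += A[i][j] * X[j][k]
    return out


def contract_accumulate_in_temporary(A, X, out):
    """Variant 2: accumulate in a large enough temporary, then copy to the output."""
    n, m, p = len(A), len(X), len(X[0])
    # The temporary needs one entry per output entry (n * p values).
    tmp = [[0.0] * p for _ in range(n)]
    for j in range(m):
        for i in range(n):
            for k in range(p):
                tmp[i][k] += A[i][j] * X[j][k]
    # The output is written only once, after the reduction has finished.
    for i in range(n):
        for k in range(p):
            out[i][k] = tmp[i][k]
    return out


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6]]
    X = [[1, 0, 2], [0, 1, 3]]
    out1 = [[-1.0] * 3 for _ in range(3)]  # stale values on purpose
    out2 = [[-1.0] * 3 for _ in range(3)]
    print(contract_reference(A, X))
    print(contract_accumulate_in_output(A, X, out1))
    print(contract_accumulate_in_temporary(A, X, out2))
```

All three functions compute the same result. They differ only in loop order and in where the partial sums are stored, which is the trade-off the two variants expose.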

This merge request implements both ways of loop reordering and adds the ability to create an autotune target directly from the loopy kernel.

Edited by René Heß