Add OpenMP pragmas to LoopSIMD
This MR adds OpenMP pragmas to the class LoopSIMD
to enforce SIMD optimization. In particular, this helps the compiler use FMA instructions in expressions like
a += b*c
See: #223 (closed) and https://stackoverflow.com/questions/64682270/more-aggresive-optimization-for-fma-operations
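A minimal sketch of the idea, assuming a simplified stand-in type: `LoopVec`, `lane` and `multiplyAdd` are hypothetical names chosen for illustration and are not the actual LoopSIMD code. Each lane-wise loop carries `#pragma omp simd`, so that after inlining the compiler is free to vectorize `a += b*c` and, where the target supports it, contract the multiply and add into FMA instructions.

```cpp
#include <cstddef>

// Hypothetical stand-in for a loop-based SIMD type with S lanes.
// This is an illustration only, not the real Dune::LoopSIMD class.
template <class T, std::size_t S>
struct LoopVec {
  T lane[S];

  // Lane-wise addition; the pragma asks the compiler to vectorize the loop.
  LoopVec& operator+=(const LoopVec& other) {
    #pragma omp simd
    for (std::size_t i = 0; i < S; ++i)
      lane[i] += other.lane[i];
    return *this;
  }
};

// Lane-wise product, annotated in the same way.
template <class T, std::size_t S>
LoopVec<T, S> operator*(const LoopVec<T, S>& x, const LoopVec<T, S>& y) {
  LoopVec<T, S> out{};
  #pragma omp simd
  for (std::size_t i = 0; i < S; ++i)
    out.lane[i] = x.lane[i] * y.lane[i];
  return out;
}

// The expression from the description: a += b*c.
template <class T, std::size_t S>
void multiplyAdd(LoopVec<T, S>& a, const LoopVec<T, S>& b, const LoopVec<T, S>& c) {
  a += b * c;
}
```

The pragma only takes effect when OpenMP support is switched on, for example with the -fopenmp flag used in the benchmark of this MR. Otherwise the compiler ignores it and the loops behave as plain scalar loops.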
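On our Skylake machine this yields a speedup of roughly 2 for matrix-vector multiplication in most of the benchmarked configurations (up to about 3 for BM_MatApply<16>, none for BM_MatApply<1>):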
g++-9 -O3 -march=native -fopenmp -mprefer-vector-width=512
without pragmas:
2021-09-24 09:19:03
Running ./simdmatapply
Run on (80 X 2401 MHz CPU s)
CPU Caches:
L1 Data 32K (x40)
L1 Instruction 32K (x40)
L2 Unified 1024K (x40)
L3 Unified 28160K (x2)
Load Average: 0.48, 16.85, 17.52
-----------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------
BM_MatApply<1> 56782 ns 56782 ns 12333
BM_MatApply<8> 212008 ns 212012 ns 3298
BM_MatApply<16> 402430 ns 402427 ns 1737
BM_MatApply<32> 666676 ns 666689 ns 1014
BM_MatApply<48> 936289 ns 936301 ns 722
BM_MatApply<64> 1180108 ns 1180123 ns 572
BM_MatApply<96> 1739868 ns 1739876 ns 389
BM_MatApply<128> 2319212 ns 2319225 ns 293
BM_MatApply<192> 4261231 ns 4261318 ns 120
with pragmas:
2021-09-24 09:18:20
Running ./simdmatapply
Run on (80 X 2401 MHz CPU s)
CPU Caches:
L1 Data 32K (x40)
L1 Instruction 32K (x40)
L2 Unified 1024K (x40)
L3 Unified 28160K (x2)
Load Average: 0.69, 19.55, 18.38
-----------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------
BM_MatApply<1> 56521 ns 56521 ns 12370
BM_MatApply<8> 88028 ns 88030 ns 7921
BM_MatApply<16> 134909 ns 134910 ns 5146
BM_MatApply<32> 283983 ns 283984 ns 2449
BM_MatApply<48> 496748 ns 496753 ns 1304
BM_MatApply<64> 665739 ns 665752 ns 981
BM_MatApply<96> 894130 ns 894148 ns 727
BM_MatApply<128> 1170710 ns 1170611 ns 555
BM_MatApply<192> 3690765 ns 3690801 ns 188
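To read the speedup off the two listings, one can divide the "Time" columns row by row. The following is a small sketch doing exactly that. The timings are copied from the listings above, while the helper names `speedup` and `bestRow` are made up for this example.

```cpp
#include <cstddef>

// Wall-clock times in ns, copied from the "Time" column of the two listings.
// Rows correspond to BM_MatApply<1>, <8>, <16>, <32>, <48>, <64>, <96>, <128>, <192>.
const std::size_t numRows = 9;
const double withoutPragmas[numRows] = {56782,   212008,  402430,  666676, 936289,
                                        1180108, 1739868, 2319212, 4261231};
const double withPragmas[numRows] = {56521,  88028,  134909,  283983, 496748,
                                     665739, 894130, 1170710, 3690765};

// Speedup of the build with pragmas over the build without, for row i.
inline double speedup(std::size_t i) {
  return withoutPragmas[i] / withPragmas[i];
}

// Index of the row with the largest speedup.
inline std::size_t bestRow() {
  std::size_t best = 0;
  for (std::size_t i = 1; i < numRows; ++i)
    if (speedup(i) > speedup(best))
      best = i;
  return best;
}
```

With these numbers the largest ratio is in the BM_MatApply<16> row (just under 3), BM_MatApply<1> is essentially unchanged, and BM_MatApply<192> gains only about 15 percent.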
- Resolved by Christoph Grüninger
Can you elaborate on how this is related to std::simd / VcDevel? Is it a replacement or an alternative? Which one do we trust more to be future-safe? I don't think we should support both, at least not in the long run.
mentioned in issue #260 (closed)
mentioned in commit b692bb87
This leads to tons of warnings downstream. Maybe we can disable these warnings through compiler directives in the code?
/duneci/modules/dune-common/dune/common/simd/loop.hh:189: warning: ignoring #pragma omp simd [-Wunknown-pragmas]
/duneci/modules/dune-common/dune/common/simd/loop.hh:189: warning: ignoring #pragma omp simd [-Wunknown-pragmas]
...
It is the very concept of pragmas that the compiler is allowed to ignore them.
Apparently you have explicitly enabled -Wunknown-pragmas. Do we really have to work around this, or should the downstream module adjust its flags? I assume you are using -Wall? In that case I'd suggest adding -Wno-unknown-pragmas after the -Wall.
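As an alternative to adjusting flags downstream, the pragma could be hidden behind a macro that is only non-empty when OpenMP is active. This is a sketch of one possible approach, not necessarily what was done in the referenced commits. The macro name `EXAMPLE_PRAGMA_OMP_SIMD` and the function `scaledAdd` are hypothetical.

```cpp
#include <cstddef>

// Hypothetical macro: expands to the pragma only if the compiler defines
// _OPENMP, i.e. OpenMP support is switched on. Otherwise it expands to
// nothing, so the compiler never sees an unknown pragma.
#if defined(_OPENMP)
#  define EXAMPLE_PRAGMA_OMP_SIMD _Pragma("omp simd")
#else
#  define EXAMPLE_PRAGMA_OMP_SIMD
#endif

// Illustrative loop using the guarded pragma: a[i] += b[i] * c[i].
inline void scaledAdd(double* a, const double* b, const double* c, std::size_t n) {
  EXAMPLE_PRAGMA_OMP_SIMD
  for (std::size_t i = 0; i < n; ++i)
    a[i] += b[i] * c[i];
}
```

The trade-off is that the vectorization hint is dropped entirely in every build where _OPENMP is not defined, whereas a plain `#pragma omp simd` leaves that decision (and the warning) to the compiler flags.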
mentioned in merge request !1035 (merged)
mentioned in issue #223 (closed)