
add openmp pragmas to loopsimd

Merged Nils-Arne Dreier requested to merge loopsimd_add_openmp_pragmas into master

This MR adds OpenMP pragmas (#pragma omp simd) to the loops in the class LoopSIMD to request SIMD vectorization. In particular, this helps the compiler use FMA instructions in expressions like

a += b*c

See: #223 (closed) and https://stackoverflow.com/questions/64682270/more-aggresive-optimization-for-fma-operations

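On our Skylake machine this yields a speedup of up to about 3x for matrix-vector multiplication (BM_MatApply<16>), and roughly 1.8x to 2.4x for the other sizes from 8 to 128. Compiler and flags: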

g++-9 -O3 -march=native -fopenmp -mprefer-vector-width=512

without pragmas:

2021-09-24 09:19:03
Running ./simdmatapply
Run on (80 X 2401 MHz CPU s)
CPU Caches:
  L1 Data 32K (x40)
  L1 Instruction 32K (x40)
  L2 Unified 1024K (x40)
  L3 Unified 28160K (x2)
Load Average: 0.48, 16.85, 17.52
-----------------------------------------------------------
Benchmark                 Time             CPU   Iterations
-----------------------------------------------------------
BM_MatApply<1>        56782 ns        56782 ns        12333
BM_MatApply<8>       212008 ns       212012 ns         3298
BM_MatApply<16>      402430 ns       402427 ns         1737
BM_MatApply<32>      666676 ns       666689 ns         1014
BM_MatApply<48>      936289 ns       936301 ns          722
BM_MatApply<64>     1180108 ns      1180123 ns          572
BM_MatApply<96>     1739868 ns      1739876 ns          389
BM_MatApply<128>    2319212 ns      2319225 ns          293
BM_MatApply<192>    4261231 ns      4261318 ns          120


with pragmas:

2021-09-24 09:18:20
Running ./simdmatapply
Run on (80 X 2401 MHz CPU s)
CPU Caches:
  L1 Data 32K (x40)
  L1 Instruction 32K (x40)
  L2 Unified 1024K (x40)
  L3 Unified 28160K (x2)
Load Average: 0.69, 19.55, 18.38
-----------------------------------------------------------
Benchmark                 Time             CPU   Iterations
-----------------------------------------------------------
BM_MatApply<1>        56521 ns        56521 ns        12370
BM_MatApply<8>        88028 ns        88030 ns         7921
BM_MatApply<16>      134909 ns       134910 ns         5146
BM_MatApply<32>      283983 ns       283984 ns         2449
BM_MatApply<48>      496748 ns       496753 ns         1304
BM_MatApply<64>      665739 ns       665752 ns          981
BM_MatApply<96>      894130 ns       894148 ns          727
BM_MatApply<128>    1170710 ns      1170611 ns          555
BM_MatApply<192>    3690765 ns      3690801 ns          188

Activity

  • Any compiler not supporting OpenMP should simply ignore the pragmas; otherwise they give the compiler a chance to optimize better. I don't see any reason not to merge.

  • Nils-Arne Dreier resolved all threads

  • mentioned in issue #260 (closed)

  • Until we decide to drop certain implementations, we should keep the current ones in their best shape, so I'll merge this MR. The changes are minimal: they do no harm and don't increase the maintenance burden.

  • mentioned in commit b692bb87

    • This leads to tons of warnings downstream. Maybe we can disable these warnings through compiler directives in the code?

      /duneci/modules/dune-common/dune/common/simd/loop.hh:189: warning: ignoring #pragma omp simd [-Wunknown-pragmas]
      /duneci/modules/dune-common/dune/common/simd/loop.hh:189: warning: ignoring #pragma omp simd [-Wunknown-pragmas]
      ...

      By design, a compiler is allowed to ignore pragmas it does not understand.

      Apparently you have explicitly enabled -Wunknown-pragmas. Do we really have to work around this, or should the downstream module adjust its flags?

      I assume you are using -Wall? In that case I'd suggest adding -Wno-unknown-pragmas after -Wall.

    • I'm torn. I don't like cases where -Wall becomes useless because Dune code lets compiler warnings spill over. The workaround is easy enough, though. What I probably dislike is that -Wunknown-pragmas is part of -Wall.

  • Simon Praetorius mentioned in merge request !1035 (merged)

  • mentioned in issue #223 (closed)
