[!221] [TLIME] Relax convergence criterion.

Jö Fahlke requested to merge cherry-pick-9da2de98 into releases/2.6

Merge branch 'fix-tlime-residual-limit' into 'master'

The previous convergence limit of 1e-11 was determined experimentally. It worked for many BLAS implementations and architectures. With OpenBLAS on Skylake, however, the residual norm apparently would not drop below ~1e-10, so convergence was never achieved. In fact, even on non-Skylake architectures the residual norm would rise above 1e-11 again after briefly dipping below it when iterating further.

We believe this is because OpenBLAS selects, at runtime, a Skylake-specific algorithm that orders operations differently, which in turn changes the numerical cancellation. However, we have not verified this conclusively, nor have we identified precisely which BLAS algorithm is responsible.
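To illustrate the general effect suspected here (not the actual OpenBLAS code path, which has not been identified), the following minimal sketch sums the same three values in two different orders. The helper names `sum_forward` and `sum_backward` are hypothetical and exist only for this illustration.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helpers for illustration only; they are not part of the
// patched code or of any BLAS implementation.

// Sum the entries from first to last.
inline double sum_forward(const std::vector<double>& v)
{
  return std::accumulate(v.begin(), v.end(), 0.0);
}

// Sum the same entries from last to first.
inline double sum_backward(const std::vector<double>& v)
{
  return std::accumulate(v.rbegin(), v.rend(), 0.0);
}

// Example input: the small entry is absorbed when it is added to the large
// one first, but survives when the two large entries cancel first.
inline std::vector<double> cancellation_example()
{
  return {1.0, 1e30, -1e30};
}
```

With IEEE double arithmetic, `sum_forward(cancellation_example())` loses the `1.0` because it is absorbed into `1e30` before the cancellation, while `sum_backward(cancellation_example())` keeps it. A reordering of this kind inside a BLAS kernel could plausibly shift the achievable residual norm, which is the hypothesis stated above.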

This patch raises the convergence limit to sqrt(numeric_limits<field_type>::epsilon()), which is about 1.5e-8 if field_type is double. This limit has no theoretical justification: it was selected because it usually works as a convergence limit for other (completely unrelated) algorithms, and because it works for both Skylake and other architectures (AMD Epyc) in this particular case.
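As a rough sketch of what the new limit amounts to, the snippet below computes it for a given field type and compares a residual norm against it. The names `relaxed_convergence_limit` and `has_converged` are assumptions made for this example and are not taken from the patched code; only the expression sqrt(numeric_limits<field_type>::epsilon()) comes from the patch description.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: the relaxed limit described above, evaluated for
// the given floating-point field type.
template <class field_type>
field_type relaxed_convergence_limit()
{
  return std::sqrt(std::numeric_limits<field_type>::epsilon());
}

// Hypothetical convergence check: the iteration counts as converged once
// the residual norm is at or below the given limit.
template <class field_type>
bool has_converged(field_type residual_norm, field_type limit)
{
  return residual_norm <= limit;
}
```

For double, a residual norm stuck near 1e-10 satisfies `has_converged(1e-10, relaxed_convergence_limit<double>())` but fails against the old limit of 1e-11. Note that the limit scales with the precision of the field type, so for float it is considerably looser (on the order of 1e-4).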

Developed together with Sebastian Westerheide.

Fixes: #48.

See merge request !221 (merged)

(cherry picked from commit 9da2de98)

0bb1e69d [TLIME] Relax convergence criterion.

Closes: #61 (closed) (on 2.6).

Edited by Jö Fahlke