Eigenvalue tests are funky


I talked briefly to @sebastian.westerheide about this. He's currently occupied otherwise, so he can't take care of this at the moment, but he was able to give the following comments:

The tests were mostly derived from code Sebastian uses elsewhere, and were meant as an example of how the eigenvalue code can be used. For use as unit tests they don't have to operate on 60x60 or 40x40 matrices as they do now; 10x10 should be sufficient.
The unit tests just compute the condition numbers, but do not check them.
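
For illustration, a check could be as simple as comparing the computed value against a known reference with a relative tolerance. This is only a minimal sketch, not code from dune-istl; the helper names `approxEqual` and `cond2` and the choice of tolerance are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: true if `computed` matches `expected`
// up to the relative tolerance `relTol`.
inline bool approxEqual(double computed, double expected, double relTol)
{
  assert(relTol >= 0.0);
  const double scale = std::max(std::abs(computed), std::abs(expected));
  return std::abs(computed - expected) <= relTol * scale;
}

// 2-norm condition number of a symmetric matrix, computed from its
// largest and smallest magnitude eigenvalues.
inline double cond2(double lambdaMax, double lambdaMin)
{
  assert(lambdaMin != 0.0);
  return std::abs(lambdaMax) / std::abs(lambdaMin);
}
```

For a matrix with known spectrum, e.g. diag(4, 0.5), a test would then assert `approxEqual(cond2(4.0, 0.5), 8.0, 1e-12)` instead of only printing the result.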

Here is one failure:

-*- mode: compilation; default-directory: "~/Projekte/dune-deprecations/dune-istl/build-cmake/dune/istl/eigenvalue/test/" -*-
Compilation started at Thu Apr 19 12:11:13

./arpackppsuperlutest 
testing for N = 60, BS = 1
    MatrixInfo: Computing 2-norm condition number (assuming that matrix is symmetric).
    ArPackPlusPlus_Algorithms: Computing an approximation of the dominant eigenvalue of a matrix which is assumed to be symmetric.
                               Obtained eigenvalues of A by solving A*x = λ*x using the ARPACK++ class ARSymStdEig:
                                      converged eigenvalues of A: 1 / 1
                                        dominant eigenvalue of A: 7.9947
                               Result (#iterations = 35, ║residual║_2 = 2.81231e-14): λ = 7.9947
    PowerIteration_Algorithms: Performing TLIME iteration for estimated eigenvalue in the interval (0,0).
ERROR: Dune::ISTLError [applyTLIMEIteration:/home/joe/Projekte/dune-deprecations/dune-istl/dune/istl/eigenvalue/test/../poweriteration.hh:733]: TLIME iteration did not converge in 20000 iterations (║residual║_2 = 3.58816e-10, epsilon = 1e-11).

Compilation exited abnormally with code 1 at Thu Apr 19 12:17:21

It appears to be connected to superlu, not arpack as I originally wrote; at least I wasn't able to trigger it in two runs of arpackpptest.


Here is more on the exact failure condition:

It seems to fail in the TLIME iteration, but only when using superlu and a 60x60 matrix. It does not fail without superlu, or with a 40x40 matrix.


OK, I just did not understand how including libraries in the tests worked, so that point was bogus and the tests do indeed link all the required libraries, and only the required libraries.

The combined arpackppsuperlutest uses dune_add_test directly to define the cmake target:

  dune_add_test(NAME arpackppsuperlutest
                SOURCES cond2test.cc)

This has the effect of add_dune_all_flags(), so both ArPackPP and SuperLU are available.

The other tests do something like

  add_executable(arpackpptest cond2test.cc)
  add_dune_arpackpp_flags(arpackpptest)
  target_link_libraries(arpackpptest dunecommon)
  dune_add_test(TARGET arpackpptest)

By setting up the target outside of dune_add_test(), add_dune_all_flags() isn't implied, and the test can skip linking to SuperLU (and ENABLE_SUPERLU stays unset).

mentioned in issue #47

@sebastian.westerheide can now take care of this.

assigned to @sebastian.westerheide

I now found time to look into the issue.

@joe is absolutely right in saying that the problem which he encountered is due to the TLIME algorithm, which is used to compute the smallest magnitude eigenvalue / smallest singular value of the test matrix. On his machine, the residual does not fall below the given threshold value epsilon = 1e-11 within the given maximum of 20000 iterations, when using SuperLU as a linear solver.
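
To make the role of the linear solver and of the stopping criterion concrete, here is a minimal sketch of plain inverse iteration on a small dense matrix. It is not the TLIME implementation from poweriteration.hh; it only shows the basic pattern of one linear solve per step and stopping once the residual norm drops below epsilon or the iteration limit is reached. The dense Gaussian elimination is a stand-in for SuperLU, and all names (`solve`, `inverseIteration`, `Result`) are made up for this example.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>

constexpr std::size_t N = 3;
using Vec = std::array<double, N>;
using Mat = std::array<Vec, N>;

// Solve A*x = b by Gaussian elimination with partial pivoting
// (stand-in for the sparse direct solver; A and b are copies).
Vec solve(Mat A, Vec b)
{
  for (std::size_t k = 0; k < N; ++k) {
    std::size_t p = k;
    for (std::size_t i = k + 1; i < N; ++i)
      if (std::abs(A[i][k]) > std::abs(A[p][k])) p = i;
    assert(A[p][k] != 0.0 && "matrix is singular");
    std::swap(A[k], A[p]);
    std::swap(b[k], b[p]);
    for (std::size_t i = k + 1; i < N; ++i) {
      const double f = A[i][k] / A[k][k];
      for (std::size_t j = k; j < N; ++j) A[i][j] -= f * A[k][j];
      b[i] -= f * b[k];
    }
  }
  Vec x{};
  for (std::size_t i = N; i-- > 0;) {   // back substitution
    double s = b[i];
    for (std::size_t j = i + 1; j < N; ++j) s -= A[i][j] * x[j];
    x[i] = s / A[i][i];
  }
  return x;
}

double norm2(const Vec& v)
{
  double s = 0.0;
  for (double c : v) s += c * c;
  return std::sqrt(s);
}

struct Result { double lambda; int iterations; bool converged; };

// Plain inverse iteration: approximates the eigenvalue of A with the
// smallest magnitude. Stops when ||A*x - lambda*x||_2 < epsilon or
// after maxIterations steps.
Result inverseIteration(const Mat& A, double epsilon, int maxIterations)
{
  Vec x;
  x.fill(1.0 / std::sqrt(static_cast<double>(N)));  // normalized start vector
  double lambda = 0.0;
  for (int it = 1; it <= maxIterations; ++it) {
    const Vec y = solve(A, x);                      // y = A^{-1} x
    const double ny = norm2(y);
    for (std::size_t i = 0; i < N; ++i) x[i] = y[i] / ny;
    Vec Ax{};
    for (std::size_t i = 0; i < N; ++i)
      for (std::size_t j = 0; j < N; ++j) Ax[i] += A[i][j] * x[j];
    lambda = 0.0;                                   // Rayleigh quotient x^T A x
    for (std::size_t i = 0; i < N; ++i) lambda += x[i] * Ax[i];
    Vec r;                                          // residual A x - lambda x
    for (std::size_t i = 0; i < N; ++i) r[i] = Ax[i] - lambda * x[i];
    if (norm2(r) < epsilon) return {lambda, it, true};
  }
  return {lambda, maxIterations, false};
}
```

In such a scheme the linear solve is the only place where the solver library enters, so a solver that returns slightly different results can change the iteration count, or prevent the residual from ever falling below epsilon.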

Unfortunately, I was not able to reproduce this behavior on my Ubuntu 14.04 LTS machine.
Using the current master branch versions of

dune-common (dune-common@88c6b64b)
dune-istl (848fc915)

together with

libsuperlu-dev (Version: 4.3+dfsg-3) and
libarpack2 (Version: 3.1.5-3)

I get:

./arpackppsuperlutest
testing for N = 60, BS = 1
    MatrixInfo: Computing 2-norm condition number (assuming that matrix is symmetric).
    ArPackPlusPlus_Algorithms: Computing an approximation of the dominant eigenvalue of a matrix which is assumed to be symmetric.
                               Obtained eigenvalues of A by solving A*x = λ*x using the ARPACK++ class ARSymStdEig:
                                      converged eigenvalues of A: 1 / 1
                                        dominant eigenvalue of A: 7.9947
                               Result (#iterations = 35, ║residual║_2 = 1.83633e-14): λ = 7.9947
    PowerIteration_Algorithms: Performing TLIME iteration for estimated eigenvalue in the interval (0,0).
                               Interval (0,0) is free of eigenvalues, approximating the closest eigenvalue.
                               Result (#iterations = 5, ║residual║_2 = 9.22821e-12): λ = 0.00530364  
    Largest magnitude eigenvalue λ_max = 7.9947
    Smallest magnitude eigenvalue λ_min = 0.00530364
    2-norm condition number cond_2 = 1507.4
computation of condition number took 1.09179 seconds
    MatrixInfo: Computing 2-norm condition number.
    ArPackPlusPlus_Algorithms: Computing an approximation of the largest singular value of a matrix which is assumed to be nonsymmetric.
                               Obtained singular values of A by solving (A^T*A)*x = σ²*x using the ARPACK++ class ARSymStdEig:
                                  converged eigenvalues of A^T*A: 1 / 1
                                     largest eigenvalue of A^T*A: 63.9152
                                  => largest singular value of A: 7.9947
                               Result (#iterations = 22, ║residual║_2 = 2.6779e-13): σ = 7.9947
    PowerIteration_Algorithms: Performing TLIME iteration for estimated eigenvalue in the interval (0,0).
                               Interval (0,0) is free of eigenvalues, approximating the closest eigenvalue.
                               Result (#iterations = 4, ║residual║_2 = 2.13295e-12): λ = 2.81286e-05
    Largest singular value σ_max = 7.9947
    Smallest singular value σ_min = 0.00530364
    2-norm condition number cond_2 = 1507.4
computation of condition number took 1.31215 seconds
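
A note on reading the second half of this output: the λ reported by the second TLIME run appears to be an eigenvalue of A^T*A, so the singular value is its square root. The following sketch (the function name `cond2FromGram` is made up here, it is not part of dune-istl) reproduces the printed condition number from the logged values.

```cpp
#include <cassert>
#include <cmath>

// 2-norm condition number from the extreme eigenvalues of A^T*A:
// sigma = sqrt(eigenvalue), cond_2 = sigma_max / sigma_min.
inline double cond2FromGram(double maxEig, double minEig)
{
  assert(maxEig >= minEig && minEig > 0.0);
  return std::sqrt(maxEig) / std::sqrt(minEig);
}
```

With the values from the log, `cond2FromGram(63.9152, 2.81286e-05)` gives about 1507.4, in agreement with the symmetric computation λ_max / λ_min = 7.9947 / 0.00530364.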

As @joe already mentioned, the unit tests do not have to operate on 60x60 or 40x40 matrices, as they do now. Therefore, one could handle the issue by just using something like a 10x10 matrix.

But since arpackppsuperlutest does not fail on my machine and takes less than 3 seconds, even though operating on the 60x60 matrix and compiling without optimizations (I compiled using -O0), it would rather be interesting to know why the SuperLU-driven TLIME algorithm does not converge on @joe's machine.

I furthermore used docker to check whether the positive outcome of the test on my machine is related to the specific library versions which I am using in Ubuntu 14.04 LTS. The answer is no.

In Debian Stretch with

libsuperlu-dev (Version: 5.2.1+dfsg1-2) and
libarpack2 (3.4.0-1+b1)

the test also succeeds and I get:

./arpackppsuperlutest
testing for N = 60, BS = 1
    MatrixInfo: Computing 2-norm condition number (assuming that matrix is symmetric).
    ArPackPlusPlus_Algorithms: Computing an approximation of the dominant eigenvalue of a matrix which is assumed to be symmetric.
                               Obtained eigenvalues of A by solving A*x = λ*x using the ARPACK++ class ARSymStdEig:
                                      converged eigenvalues of A: 1 / 1
                                        dominant eigenvalue of A: 7.9947
                               Result (#iterations = 35, ║residual║_2 = 1.83633e-14): λ = 7.9947
    PowerIteration_Algorithms: Performing TLIME iteration for estimated eigenvalue in the interval (0,0).
                               Interval (0,0) is free of eigenvalues, approximating the closest eigenvalue.
                               Result (#iterations = 6, ║residual║_2 = 4.10771e-15): λ = 0.00530364
    Largest magnitude eigenvalue λ_max = 7.9947
    Smallest magnitude eigenvalue λ_min = 0.00530364
    2-norm condition number cond_2 = 1507.4
computation of condition number took 1.11077 seconds
    MatrixInfo: Computing 2-norm condition number.
    ArPackPlusPlus_Algorithms: Computing an approximation of the largest singular value of a matrix which is assumed to be nonsymmetric.
                               Obtained singular values of A by solving (A^T*A)*x = σ²*x using the ARPACK++ class ARSymStdEig:
                                  converged eigenvalues of A^T*A: 1 / 1
                                     largest eigenvalue of A^T*A: 63.9152
                                  => largest singular value of A: 7.9947
                               Result (#iterations = 22, ║residual║_2 = 2.6779e-13): σ = 7.9947
    PowerIteration_Algorithms: Performing TLIME iteration for estimated eigenvalue in the interval (0,0).
                               Interval (0,0) is free of eigenvalues, approximating the closest eigenvalue.
                               Result (#iterations = 4, ║residual║_2 = 1.54212e-12): λ = 2.81286e-05
    Largest singular value σ_max = 7.9947
    Smallest singular value σ_min = 0.00530364
    2-norm condition number cond_2 = 1507.4
computation of condition number took 1.45375 seconds

As you can see, the newer version of libsuperlu-dev yields the same smallest magnitude eigenvalue / smallest singular value on my machine as the older version which I used in the above comment. Nevertheless, it should be noted that both SuperLU versions result in different residuals / iteration counts for the TLIME algorithm.

One could argue that @joe probably uses yet another version of libsuperlu-dev for which arpackppsuperlutest fails. Unfortunately, arpackppsuperlutest fails on @joe's machine using Debian Stretch with exactly the same versions of libsuperlu-dev (and libarpack2).

Ideas, anyone?

I do see the problems in an environment with Vc installed and -march=native, which enables all kinds of CPU features in the compiler. Also, there is one warning about use of uninitialized values, although that seems not directly related (in matrixinfo.hh). I'm gonna try to make a Dockerfile with an exact reproducer.

Here is a reproducer: prepare a directory into which you check out dune-common and dune-istl. Then put the following two files into it: Dockerfile, opts. Run docker build . in that directory.

The final command runs arpackppsuperlutest with a timeout of 10 seconds. If the test succeeds, it will complete in about 2 seconds.

It looks as if, when the package libopenblas-dev is installed, the test is linked to both

        libopenblas.so.0 => /usr/lib/libopenblas.so.0 (0x00007f2d52304000)

and

	libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f2d509b0000)

Both implement blas, but probably in a mutually binary-incompatible way. So the question is: why are both linked?

It could be our build system preferring openblas over blas, and lapack pulling in libblas directly. Anyway, that's something for another day.

assigned to @joe and unassigned @sebastian.westerheide

The new CI image with GCC 8 and -DNDEBUG also hits the timeout: https://gitlab.dune-project.org/core/dune-istl/-/jobs/42887

@sebastian.westerheide just helped me interpret the output in that log. The error message seems to appear multiple times: the first time presumably directly (though the message does not indicate an error, it's just the normal program output before it is aborted due to timeout). The second one seems to be passed through some parts of the build system that hiccup on the unicode λ that appears in the fourth line.

I should probably document the outline for a solution I had before getting distracted by more urgent things:

I think libdunecommon links to blas only so some tests can use it. If that is indeed the case, we might get away with linking those tests directly to blas, and maybe inlining the necessary stuff from the library instead.

Yes, I noticed the test output thing as well, @ansgar has already fixed it in dune-common!535 (merged).

More details on the "only-used-in-tests"-thingy:

AFAICT blas is not used directly inside dune-common (assuming the header contains "blas" somewhere in its name). It is pulled in by:

Lapack (DuneCommonMacros.cmake, find_package(LAPACK), not provided by dune, seems to implicitly look for blas)
SuiteSparse (FindSuiteSparse.cmake, provided by dune, calls find_package(BLAS))
UMFPack (FindUMFPack.cmake, provided by dune, calls find_package(SuiteSparse) and includes ${BLAS_LIBRARIES} in UMFPACK_DUNE_LIBRARIES)
dune/common/CMakeLists.txt includes the lapack libraries, if found, else the blas libraries, if found, in libdunecommon.

If lapack was found, dynmatrixev.cc (which is compiled into libdunecommon), declares DGEEV_FORTRAN() and uses it in Dune::DynamicMatrixHelp::eigenValuesNonsymLapackCall(). If lapack was not found, eigenValuesNonsymLapackCall() is a noop.

Actually, DGEEV_FORTRAN is a macro, which expands to the fortran-mangling of dgeev.
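
As an illustration of what such a mangling macro does, here is a self-contained sketch. It assumes the common convention of lower-case names with one trailing underscore; the real scheme depends on the Fortran compiler. The macro names and the `demo_scale` routine are invented for this example (it is defined in C++ so that nothing needs to be linked) and are not part of dune-common or lapack.

```cpp
#include <cassert>
#include <string>

// Hypothetical mangling macro: lower-case name plus trailing underscore.
#define DEMO_FORTRAN_NAME(name) name##_

// Two-step stringification so the macro argument is expanded first.
#define DEMO_STR_IMPL(x) #x
#define DEMO_STR(x) DEMO_STR_IMPL(x)

// The symbol name a call to dgeev would resolve to under this scheme.
inline std::string mangledDgeevName()
{
  return DEMO_STR(DEMO_FORTRAN_NAME(dgeev));
}

// Stand-in for a Fortran routine: C linkage, mangled name, and all
// arguments passed by pointer, as Fortran expects.
extern "C" void DEMO_FORTRAN_NAME(demo_scale)(const int* n, const double* alpha, double* x)
{
  assert(n && alpha && x);
  for (int i = 0; i < *n; ++i) x[i] *= *alpha;
}

// Thin wrapper that only forwards to the mangled symbol.
inline void demoScaleCall(int n, double alpha, double* x)
{
  DEMO_FORTRAN_NAME(demo_scale)(&n, &alpha, x);
}
```

The wrapper hides both the mangled name and the pass-by-pointer calling convention from the caller.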

fmatrixev.cc: same as above for dsyev -> Dune::FMatrixHelp::eigenValuesLapackCall() and dgeev -> Dune::FMatrixHelp::eigenValuesNonsymLapackCall()

eigenvaluetest.cc checks the rosser matrix using lapack, but skips that with a warning if lapack is not found.

Dune::DynamicMatrixHelp::eigenValuesNonsymLapackCall() is only used by ~~DynamicMatrixHelp::eigenValuesNonsymLapackCall()~~ DynamicMatrixHelp::eigenValuesNonSym() (from dynmatrixev.hh), which in turn is only used by the rosser matrix test in eigenvaluetest.cc

Similar

FMatrixHelp::eigenValuesLapackCall() only used by FMatrixHelp::eigenValues<dim, K>() (fmatrixev.hh). There are non-lapack-using specializations for dim <= 3.
FMatrixHelp::eigenValuesNonsymLapackCall() only used by FMatrixHelp::eigenValuesNonSym() (fmatrixev.hh)
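
For small dimensions, eigenvalues can be computed in closed form, which is why no lapack is needed there. The following is a sketch of the symmetric 2×2 case via the characteristic polynomial; it is not the code from fmatrixev.hh, and the function name is made up.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Eigenvalues of the symmetric matrix [[a, b], [b, c]] in ascending
// order, from the roots of the characteristic polynomial:
// lambda = (a+c)/2 -/+ sqrt(((a-c)/2)^2 + b^2).
inline std::array<double, 2> symEigenvalues2x2(double a, double b, double c)
{
  assert(std::isfinite(a) && std::isfinite(b) && std::isfinite(c));
  const double mean = 0.5 * (a + c);
  const double radius = std::hypot(0.5 * (a - c), b);
  return {mean - radius, mean + radius};
}
```

For example, the matrix [[2, 1], [1, 2]] has the eigenvalues 1 and 3.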

FMatrixHelp::eigenValues() is used by eigenvaluetest.cc but only for dim==2 and dim==3, so without using lapack. It is also used by fmatrixtest.cc, in its lapack incarnation, to test the rosser matrix (though the test function there is called test_ev()).

TODO: track uses of FMatrixHelp::eigenValuesNonSym(), UMFPack and SuiteSparse.

FMatrixHelp::eigenValuesNonSym() is never used in dune-common.

All the other core modules (+ extension modules, + staging modules) do not use anything from a namespace *MatrixHelp, with the following exceptions:

dune/geometry/test/checkgeometry.hh uses FMatrixHelp::multMatrix()
dune/grid/albertagrid/algebra.hh uses FMatrixHelp::invertMatrix()
dune/istl/btdmatrix.hh uses FMatrixHelp::multMatrix()
dune/alugrid/3d/mappings_imp.cc uses FMatrixHelp::invertMatrix()

multMatrix() and invertMatrix() are implemented in fmatrix.hh, not in fmatrixev.*, so if we do things to fmatrixev.* that should not interfere with those uses.

(Side note: I believe those uses are hacks, and should actually be converted to the official interface -- e.g. invertMatrix() is only implemented for matrices up to 3×3. But that is another issue.)

FindUMFPack.cmake is just a deprecated wrapper around FindSuiteSparse.cmake. Neither is used in dune-common.

Otherwise (core, extension, and staging modules), blas appears in:

ISTL: cmake/modules/FindSuperLU.cmake: find_package(BLAS QUIET) + inclusion of the detected flags in the SuperLU-Flags, and istl looks for SuperLU by default.

Besides istl, SuperLU is used in PDELab. It is also used in dune-tpmc, but that cannot have worked properly since dune-tpmc does not depend on istl.

lapack isn't used anywhere besides dune-common in {core, extension, staging}.

ISTL seems to be the only module calling find_package(SuiteSparse). No module seems to call find_package(UMFPack).

Besides ISTL, the only other module that seems to use SuiteSparse/UMFPack is PDELab, and that relies on ISTL for configure-time detection.

Wait a minute -- I was assuming the rosser matrix tests test some dense matrix Dune functionality using the eigenvalues. But all the Dune-functionality they test is actually the wrapper around the lapack functions that compute the eigenvalues. Had my assumption been correct we could have moved the link-dependency out of libdunecommon and into the tests, but now I'm not so sure. Either someone is using the eigenvalue functions from the *MatrixHelp namespaces, or they can be removed completely, along with their tests.

Asked on the Dune-List if anyone is using those functions. Gave it one week (until 2018-07-09) for responses. If no-one objects, I'm going to assume Help == Impl and will remove the eigenvalue functions from *matrixev.hh without deprecation.

I know about user-code involving Dune::FMatrixHelp::eigenValuesLapackCall.

Talked with Carsten, he knows about one specific case of someone using eigenValuesLapackCall() and thinks there are other cases of people using the eigenValue() functions.

The difference between the *Call() functions and the others is that all the *Call() functions do is figure out the fortran-mangled name of the function to call, while the other functions convert the data to the format expected by the fortran functions (and then invoke the *Call() functions with that data).
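
To give an idea of what such a data-format conversion can involve: Fortran routines expect matrices as flat column-major arrays, while C++ code usually stores them row by row. The sketch below shows only that one step; whether the Dune wrappers do exactly this is an assumption, and `toColumnMajor` is a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy a row-major n x n matrix (flat storage) into the column-major
// flat layout that Fortran routines expect.
inline std::vector<double> toColumnMajor(const std::vector<double>& rowMajor, std::size_t n)
{
  assert(rowMajor.size() == n * n);
  std::vector<double> colMajor(n * n);
  for (std::size_t i = 0; i < n; ++i)
    for (std::size_t j = 0; j < n; ++j)
      colMajor[j * n + i] = rowMajor[i * n + j];
  return colMajor;
}
```

For the matrix [[1, 2], [3, 4]], stored row-major as {1, 2, 3, 4}, the column-major layout is {1, 3, 2, 4}. For symmetric matrices both layouts coincide.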

In the case above, the user uses eigenValuesLapackCall() instead of the data-format-conversion-wrapper because he needs the eigenvectors in addition to the eigenvalues.

Looks like the best option is to keep the data-format-conversion wrappers, and get rid of the *Call() wrappers. As a replacement, provide the Fortran-mangling functionality to users, so they can call whatever lapack functions they please. Users will have to add find_package(LAPACK) to their projects to use this functionality, and they'll need to add the lapack flags, either globally in their project or to each target.

I'll prepare an MR for this.

Looks like I didn't dig deep enough:

joe@paranoia:~$ ls -l /usr/lib/libblas.so.3
lrwxrwxrwx 1 root root 30 Jun 26  2012 /usr/lib/libblas.so.3 -> /etc/alternatives/libblas.so.3
joe@paranoia:~$ ls -l /etc/alternatives/libblas.so.3
lrwxrwxrwx 1 root root 35 Apr 14  2015 /etc/alternatives/libblas.so.3 -> /usr/lib/openblas-base/libblas.so.3
joe@paranoia:~$ ls -l /usr/lib/openblas-base/libblas.so.3
-rw-r--r-- 1 root root 383248 Mai  6  2017 /usr/lib/openblas-base/libblas.so.3
joe@paranoia:~$ objdump -p /usr/lib/openblas-base/libblas.so.3 | grep NEEDED
  NEEDED               libopenblas.so.0
  NEEDED               libm.so.6
  NEEDED               libpthread.so.0
  NEEDED               libgfortran.so.3
  NEEDED               libc.so.6
joe@paranoia:~$

So it's actually perfectly normal that both libblas and libopenblas are linked, because the former appears to be a wrapper library around the latter. Dammit.

So, that leaves us with two possibilities: (1) the blas library provided by openblas is broken, or (2) there is undefined behavior somewhere in the test program.

I tried to reproduce using the reproducer from a month ago.

Unreproducible on epic (AMD EPYC 7501).
Reproducible on scratches (i7-6600U, Skylake) and sky (Xeon(R) Gold 6148, Skylake).

On sky, it is also reproducible without -march=native, so it probably isn't the compiler. Maybe some sort of target-specific routine in openblas?

compiled a docker image containing arpackppsuperlutest on sky, without -march=native
transferred that image to epic
ran arpackppsuperlutest on epic
test completes successfully within three seconds
the very same test with the very same image times out when run on sky

mentioned in commit 0bb1e69d

mentioned in merge request !221 (merged)

closed via merge request !221 (merged)

mentioned in commit 9da2de98

OK, and this also works with the new ci setup on skylake, https://gitlab.dune-project.org/core/dune-istl/-/jobs/44839

mentioned in merge request !220 (closed)

mentioned in issue #61 (closed)

mentioned in commit ae21be67

mentioned in merge request !262 (merged)

mentioned in commit 9698e497
