Bug in the nonoverlapping BiCGSTAB with SSORk preconditioner implementation in pdelab
I stumbled upon a problem in the nonoverlapping BiCGSTAB solver with SSORk preconditioner in pdelab (Dune::PDELab::ISTLBackend_NOVLP_BCGS_SSORk). I ran the "testnonoverlappingsinglephaseflow-yasp" example from the pdelab/test directory and got the results below.
Tested with DUNE 2.6.0. I see the same behaviour with the latest git version of dune-pdelab.
On 1 core:
parallel run on 1 process(es)
=== matrix setup (max) 0.000339173 s
=== matrix assembly (max) 0.000315127 s
=== residual assembly (max) 0.000215186 s
=== solving (reduction: 1e-12) === BiCGSTABSolver
 9 1.63848e-12
=== rate=0.0411766, T=0.000641356, TIT=7.12618e-05, IT=9
0.000676894 s
On 4 cores:
parallel run on 4 process(es)
=== matrix setup (max) 0.000217795 s
=== matrix assembly (max) 0.000168486 s
=== residual assembly (max) 8.6993e-05 s
=== solving (reduction: 1e-12) === BiCGSTABSolver
 16.5 4.51445e-12
=== rate=0.186651, T=0.000843022, TIT=5.10922e-05, IT=16.5
0.00102991 s
Clearly, the solve takes more time on 4 cores than on a single core (T=0.000843022 vs. T=0.000641356), and the iteration count also increases from 9 to 16.5. The bug seems to sit somewhere at the level of the SSORk preconditioner.
Reason: passing a consistent matrix to the SeqSSOR preconditioner and the same matrix to the BCGS solver leads to a higher iteration count when running on more than 2 cores. Passing a consistent matrix to the SeqSSOR preconditioner and an inconsistent matrix to the BCGS solver also increases the CPU time and the iteration count. I suspect that grid_operator.make_consistent is the culprit, but I am not sure.