## [bug] Solving in parallel, ISTLBackend_OVLP_GMRES_ILU0 leads to a segmentation fault if the number of DOFs is high

**Hardware:** Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz (20 CPUs)

**OS:** CentOS Linux release 7.6.1810 (Core)

**Compiler:** gcc-6 (GCC) 6.3.0

**Dune modules used:** codegen, common, functions, geometry, grid, istl, localfunctions, pdelab, testtools, typetree, uggrid

**Version:** The problem was first observed at the end of April and is still present with the latest master commits of all modules (updated via `dunecontrol git pull`)

**External software:** ParMETIS (for domain decomposition)

# Summary

We are solving a problem in parallel. With the `BCGS` linear solver the bug does not occur. If we switch the solver to GMRES and keep everything else the same, we first get `IF_FUNCNAME: receive-timeout for IF 6` and then, a couple of seconds later, a segmentation fault. Whether the bug occurs depends on the number of DOFs.

# Description

**1. Steps to reproduce**

*I can give you access to our private dune-richards project with the branch where the bug occurs.*
In the project, we solve a 3D problem with DUNE-PDELab on an unstructured grid of 4866·**N** cubic elements, decomposed onto 20 CPUs. The problem is nonlinear, so we use Newton's method, and we discretize with 4th-order DG (cubic basis functions), i.e. 4³ = 64 DOFs per grid cell.
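For orientation, here is a minimal sketch of this kind of discretisation in PDELab. The header paths, type names and template parameters are illustrative assumptions and may differ from both our project and the PDELab version in use:

```cpp
// Sketch only, not the actual project code (assumed PDELab API):
// 3D unstructured UGGrid, 4th-order DG with cubic shape functions,
// i.e. (3+1)^3 = 64 DOFs per hexahedral cell.
#include <dune/grid/uggrid.hh>
#include <dune/pdelab/finiteelementmap/qkdg.hh>

using Grid = Dune::UGGrid<3>;
using GV   = Grid::LeafGridView;

constexpr int degree = 3;  // cubic shape functions -> (degree+1)^3 = 64 DOFs per cell
using FEM  = Dune::PDELab::QkDGLocalFiniteElementMap<GV::ctype, double, degree, 3>;
```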

**2. Expected behavior**

With any **N**, we expect the Newton iterations to run and then either converge or fail to converge.
We get the expected behavior with the `Dune::PDELab::ISTLBackend_OVLP_BCGS_ILU0` linear solver for **N** from **2** to **5**. With `Dune::PDELab::ISTLBackend_OVLP_GMRES_ILU0` we get the expected behavior only for **N=2**.
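To make explicit where the two runs differ, here is a sketch of the backend swap. `GFS`, `CC`, `gfs`, `cc` and the constructor arguments are illustrative assumptions, not copied from our project:

```cpp
// Sketch only: the sole difference between the working and the failing run
// is the overlapping linear solver backend handed to the Newton solver.
#include <dune/pdelab/backend/istl.hh>

// BiCGStab + ILU0: behaves as expected for N = 2..5
Dune::PDELab::ISTLBackend_OVLP_BCGS_ILU0<GFS,CC>  ls_bcgs(gfs, cc, /*maxiter*/ 5000, /*verbose*/ 1);

// GMRes + ILU0: works for N = 2, segfaults for larger N
Dune::PDELab::ISTLBackend_OVLP_GMRES_ILU0<GFS,CC> ls_gmres(gfs, cc, /*maxiter*/ 5000, /*verbose*/ 1);
```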

**3. Actual behavior**

With higher **N**, `Dune::PDELab::ISTLBackend_OVLP_GMRES_ILU0` leads to a segmentation fault. The higher **N** is, the earlier in the run the problem occurs. With **N=5**, the bug already happens on the first time step:

```
TIME STEP [Alexander (claims order 3)] 1 time (from): 0.0000e+00 dt: 5.0000e-01 time (to): 5.0000e-01
STAGE 1 time (to): 2.1793e-01.
Initial defect: 2.4025e+00
```

It then computes for some time and prints the error message:

```
IF_FUNCNAME: receive-timeout for IF 6
waiting for message (from proc 17, size 25600)
waiting for message (from proc 15, size 104960)
```

A couple of seconds later it produces the long error message attached as log.err, and a couple of seconds after that it dies with a segmentation fault:

```
mpirun noticed that process rank 16 with PID 134734 on node gauss2 exited on signal 11 (Segmentation fault).
```

**It seems like** the linear solver (or some communication layer it uses) has an internal receive timeout for some reason, and that timeout can be too short for numerically expensive problems.