Making the parallel interface more generic causes segfaults

As discussed by email, I tried to parameterize the ProxyDataHandle class by a pybind11::object to make the parallel interface more generic, but that resulted in many segfaults when running the parallel finite volume scheme with both YaspGrid (see log-generic-parallel-interface-yaspgrid.txt) and ALUGrid (see log-generic-parallel-interface-alugrid.txt).

You can test it by checking out the branch associated with that change: https://gitlab.dune-project.org/michael.sghaier/dune-corepy/tree/generic_parallel_interface (see especially commit 53612e23).

I did some debugging with ALUGrid and found the causes of the first two segfaults:

  • the first one was caused by a call to static void copy ( void *dest, const T *src, std::size_t n) (line 444 of dune-alugrid/dune/alugrid/impl/serial/serialize.h), invoked by inline void writeT (const T & a, const bool checkLength ) (line 133). That copy function contains the line static_cast< T * >( dest )[ i ] = src[ i ]. When T is pybind11::object, this call to operator= triggers a call to pybind11::handle::dec_ref(), which segfaults. The reason is that dest is actually a buffer that was allocated with malloc but never initialized with a call to new (see http://paste.awesom.eu/Piig in serialize.h), so static_cast< T* >( dest )[ i ] refers to uninitialized memory rather than to a constructed pybind11::object.

  • the second one is the same problem in inline void readT (T& a, bool checkLength ) (line 156), which segfaulted in a call to pybind11::handle::inc_ref() for the same reason.

I fixed these two segfaults by modifying the writeT and readT functions to initialize the allocated buffers, see http://paste.awesom.eu/Eg7T . After recompiling dune-alugrid and dune-corepy, these segfaults disappeared. Unfortunately, another segfault then appeared that I have not managed to fix, so the issue remains.
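
For reference, here is a minimal standalone sketch of the problem behind the two segfaults above and of the general kind of fix. It is not the patch from the paste and does not use dune-alugrid or pybind11: RefCounted is a hypothetical stand-in for pybind11::object, and copyConstruct, destroyRange and roundTrip are names made up for this illustration. The point is that operator= on a reference-counted handle first releases the old value, which reads garbage when the destination was only malloc'ed, whereas placement new constructs the object in the raw storage without touching any "old" value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for pybind11::object: copying adjusts a reference
// count through a pointer stored in the handle.
struct RefCounted {
    int *count;
    explicit RefCounted(int *c) : count(c) { ++*count; }
    RefCounted(const RefCounted &o) : count(o.count) { ++*count; }
    RefCounted &operator=(const RefCounted &o) {
        ++*o.count;  // "inc_ref" on the new value
        --*count;    // "dec_ref" on the old value: dereferences garbage
                     // if *this lives in storage that was never constructed
        count = o.count;
        return *this;
    }
    ~RefCounted() { --*count; }
};

// Copy n objects into raw (malloc'ed) storage by constructing them in place
// with placement new, instead of assigning with operator=.
template <class T>
void copyConstruct(void *dest, const T *src, std::size_t n) {
    T *out = static_cast<T *>(dest);
    for (std::size_t i = 0; i < n; ++i)
        ::new (static_cast<void *>(out + i)) T(src[i]);
}

// Objects created with placement new must be destroyed explicitly before
// the raw storage is freed.
template <class T>
void destroyRange(void *buf, std::size_t n) {
    T *p = static_cast<T *>(buf);
    for (std::size_t i = 0; i < n; ++i)
        p[i].~T();
}

// Copies one handle into a malloc'ed buffer and back out of existence.
// Returns the reference count observed while the copy lives in the buffer.
int roundTrip(int &refs) {
    RefCounted original(&refs);
    void *buf = std::malloc(sizeof(RefCounted));
    assert(buf != nullptr);
    copyConstruct(buf, &original, 1);
    const int during = refs;
    destroyRange<RefCounted>(buf, 1);
    std::free(buf);
    return during;
}
```

Writing static_cast< RefCounted * >( buf )[ 0 ] = original directly after the malloc would instead run operator= on an object that was never constructed, which is undefined behaviour and matches the crash pattern described above.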