Making the parallel interface more generic causes segfaults
As discussed by email, I tried to parameterize the ProxyDataHandle class by a pybind11::object
to make the parallel interface more generic, but that ended up causing many segfaults when running the parallel finite volume scheme with both YaspGrid (see log-generic-parallel-interface-yaspgrid.txt) and ALUGrid (see log-generic-parallel-interface-alugrid.txt).
You can test it by checking out the branch associated with that change: https://gitlab.dune-project.org/michael.sghaier/dune-corepy/tree/generic_parallel_interface (see especially commit 53612e23).
I did some debugging with ALUGrid and found the causes of the first two segfaults:
- The first one was caused by a call to `static void copy ( void *dest, const T *src, std::size_t n)`, line 444 of `dune-alugrid/dune/alugrid/impl/serial/serialize.h`, invoked by `inline void writeT (const T & a, const bool checkLength )`, line 133. Why? Because in that function `copy`, there is this line: `static_cast< T * >( dest )[ i ] = src[ i ]`. When `T` is `pybind11::object`, that call to `operator=` induces a call to `pybind11::handle::dec_ref()` that segfaults. And why? Because `dest` is actually a buffer that was allocated with malloc but never initialized with a call to `new` (see http://paste.awesom.eu/Piig in `serialize.h`), and thus `static_cast< T* >( dest )[ i ]` refers to an uninitialized part of memory instead of a `pybind11::object`.
- The second one is the same thing with `inline void readT (T& a, bool checkLength )`, line 156, which caused a segfault in a call to `pybind11::handle::inc_ref()` for the same reasons.
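
To illustrate why the assignment is the problem, here is a minimal sketch that does not use pybind11 at all. `Counted` and `Handle` are hypothetical stand-ins I made up for a Python object and for `pybind11::object`; the only property they are meant to share with the real class is that copy-assignment releases whatever the destination held before. The demo assigns into a properly constructed destination, so it is well defined; in the ALUGrid buffer the "previous" pointer would be whatever bytes malloc returned.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a reference-counted Python object.
struct Counted {
    int refs = 0;
};

// Toy handle mimicking the ownership behaviour of a reference-counted
// wrapper: copy-assignment acquires the new target and releases the
// target previously held by the destination.
class Handle {
public:
    Handle() = default;
    explicit Handle(Counted *p) : ptr_(p) { inc_ref(); }
    Handle(const Handle &other) : ptr_(other.ptr_) { inc_ref(); }
    Handle &operator=(const Handle &other) {
        if (this != &other) {
            Counted *old = ptr_;      // reads the destination's previous state
            ptr_ = other.ptr_;
            inc_ref();
            if (old) --old->refs;     // garbage pointer here => crash
        }
        return *this;
    }
    ~Handle() { dec_ref(); }

private:
    void inc_ref() { if (ptr_) ++ptr_->refs; }
    void dec_ref() { if (ptr_) --ptr_->refs; }
    Counted *ptr_ = nullptr;
};

// Assigns into a properly constructed destination and returns the
// reference count of the object the destination held before.
int refs_of_old_target_after_assignment() {
    Counted a, b;
    Handle dest(&a);   // a.refs == 1
    Handle src(&b);    // b.refs == 1
    dest = src;        // releases a, acquires b
    assert(b.refs == 2);
    return a.refs;     // the old target was released by operator=
}
```

Calling `refs_of_old_target_after_assignment()` shows that the assignment touched the old target. If `dest` had been raw malloc memory, `old` would have been an arbitrary pointer, which matches the crash seen in `dec_ref()`.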
I fixed these two segfaults by modifying the `writeT` and `readT` functions to initialize the allocated buffers; see http://paste.awesom.eu/Eg7T. After recompiling dune-alugrid and dune-corepy, these segfaults disappeared. Unfortunately, another segfault then appeared that I did not manage to fix, so the issue remains.
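
For reference, one general way to initialize objects inside a malloc'ed buffer is placement new. The sketch below is not the patch from the paste above and is not ALUGrid code: `copy_construct`, `destroy_n` and `round_trip_demo` are hypothetical helpers, and `std::string` only stands in for a non-trivially-copyable type such as `pybind11::object`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Copy n objects into raw, suitably aligned storage by constructing
// them in place instead of assigning to objects that do not exist yet.
template <class T>
void copy_construct(void *dest, const T *src, std::size_t n) {
    T *out = static_cast<T *>(dest);
    for (std::size_t i = 0; i < n; ++i)
        ::new (static_cast<void *>(out + i)) T(src[i]);
}

// Objects created with placement new must be destroyed explicitly
// before the underlying storage is freed or reused.
template <class T>
void destroy_n(void *buf, std::size_t n) {
    T *p = static_cast<T *>(buf);
    for (std::size_t i = 0; i < n; ++i)
        p[i].~T();
}

// Copies two strings into a malloc'ed buffer and reads them back.
std::string round_trip_demo() {
    const std::string src[2] = {"left", "right"};
    void *buf = std::malloc(sizeof(src));   // raw bytes, no objects yet
    if (!buf) return {};
    copy_construct(buf, src, 2);
    const std::string *items = static_cast<const std::string *>(buf);
    std::string result = items[0] + "-" + items[1];
    destroy_n<std::string>(buf, 2);
    std::free(buf);
    return result;
}
```

The cost of this approach is that the buffer now owns live objects, so every path that frees or overwrites it has to run the destructors first; otherwise references are leaked instead of crashing.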