Making the parallel interface more generic causes segfaults
As discussed by email, I tried to parameterize the `ProxyDataHandle` class with a `pybind11::object` to make the parallel interface more generic, but this led to many segfaults when running the parallel finite volume scheme with both `YaspGrid` (see log-generic-parallel-interface-yaspgrid.txt) and `ALUGrid` (see log-generic-parallel-interface-alugrid.txt).
You can reproduce it by checking out the branch for that change: https://gitlab.dune-project.org/michael.sghaier/dune-corepy/tree/generic_parallel_interface (see especially commit 53612e23).
I debugged the `ALUGrid` case and found the causes of the first two segfaults:
- The first one was caused by a call to `static void copy ( void *dest, const T *src, std::size_t n)` (line 444 of `dune-alugrid/dune/alugrid/impl/serial/serialize.h`), invoked by `inline void writeT (const T & a, const bool checkLength )` (line 133). Why? Because `copy` contains the line `static_cast< T * >( dest )[ i ] = src[ i ]`. When `T` is `pybind11::object`, this call to `operator=` first releases the reference held by the target, which triggers a call to `pybind11::handle::dec_ref()` that segfaults. And why does that fail? Because `dest` is a buffer that was allocated with `malloc` but never initialized with a call to `new` (see http://paste.awesom.eu/Piig in `serialize.h`), so `static_cast< T* >( dest )[ i ]` refers to uninitialized memory instead of a `pybind11::object`.
- The second one is the same problem with `inline void readT (T& a, bool checkLength )` (line 156), which caused a segfault in a call to `pybind11::handle::inc_ref()` for the same reason.
I fixed these two segfaults by modifying the `writeT` and `readT` functions to initialize the allocated buffers (see http://paste.awesom.eu/Eg7T). After recompiling dune-alugrid and dune-corepy, these segfaults disappeared. Unfortunately, another segfault then appeared that I did not manage to fix, so the issue remains.