parallel shared library generation
Instead of testing for rank==0, we could use a separate dune-py for each process and have every process generate the library itself. This could be done by not using in-source builds and instead building in dune-py/build-procNo, where No is the process number. This avoids the problems that arise when a single shared library is loaded by multiple processes.
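A minimal sketch of the per-process directory layout, assuming the rank is already known. In an MPI run it would typically come from something like mpi4py's `MPI.COMM_WORLD.Get_rank()`, which is left out here to keep the example self-contained. The helper names `per_rank_build_dir` and `shared_library_path`, the `build-proc<rank>` naming, and the `.so` suffix are all hypothetical illustrations. This is not the actual dune-py API. The point is only that each rank compiles into, and loads from, its own directory, so no two processes ever touch the same shared library file.

```python
import os
import tempfile


def per_rank_build_dir(dune_py_dir: str, rank: int) -> str:
    """Return (and create) a build directory private to one process.

    Hypothetical helper: rank r builds in <dune_py_dir>/build-proc<r>
    rather than in-source, so processes never share build output.
    """
    if rank < 0:
        raise ValueError("rank must be non-negative")
    path = os.path.join(dune_py_dir, f"build-proc{rank}")
    os.makedirs(path, exist_ok=True)  # safe if the directory already exists
    return path


def shared_library_path(dune_py_dir: str, rank: int, module_name: str) -> str:
    """Path of this rank's own copy of a generated module (assumed .so suffix)."""
    return os.path.join(per_rank_build_dir(dune_py_dir, rank), module_name + ".so")


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    # Every rank generates its own copy; no rank==0 special case is needed.
    for r in range(4):
        print(os.path.relpath(shared_library_path(base, r, "mymodule"), base))
```

The trade-off in this design is that the library is compiled once per process instead of once in total. In exchange, the rank==0 test, the barrier that makes the other ranks wait for it, and the concurrent loading of one file all go away.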