I also have this issue (I am using ALUGrid instead of UGGrid). First of all, thanks for all your replies to my e-mail on this topic. The suggestion here was to call the loadBalance() method not on the multidomaingrid, but on the host grid. This works to distribute the grid over all processors.
Now I also wanted to iterate over the interface between the different domains with the standard iterator for this in dune-multidomaingrid. However, it seems that the iterator only finds the intersections whose inside and outside cells are on the same processor. I have attached a minimal example dune-minimalExample.cc together with a mesh file twoDomainMesh.msh.
If the example is executed with 1 or 2 processors this case does not occur, due to how the mesh is split. If it is executed with 7 processors, more than 70% of the intersections are not visited.
Here are my ideas to overcome this issue:
Ensure that the partitioning of the mesh does not send neighboring cells from different domains to different processes. The question here is how to do this.
Would it help if the parallel grid used ghost cells? In the example no ghost cells are used.
Are there any other methods to overcome this issue?
I have solved the problem for me in the following way:
The situation: I am using an unstructured mesh as host mesh for multidomaingrid. The problem is as described above.
My current workaround: For UGGrid and ALUGrid it is possible to use a user-defined partition (for UGGrid there is a version of the loadBalance() method (see the DUNE book, p. 204), and for ALUGrid there is a version of repartition() (see the ALUGrid paper, Section 3.8)).
I computed an initial partition for my mesh, then checked whether any intersection between two domains is also an intersection between two processes. For each such intersection, I move one of its two elements to the other element's process.
Now I use this partition to distribute the mesh.
Note:
The partition is computed entirely sequentially. This can be time consuming.
Finding a good initial partition depends on the geometry. One can also use external partitioners such as ParMETIS.
Automatic load balancing (e.g. in the case of adaptive meshes) may not be straightforward.
Depending on the number of elements that are moved after the initial partition is created, the communication overhead can increase.
Hey Paul, thanks a lot for both the detailed report and the follow-up partial solution for your case, very much appreciated, seriously! I must admit that I don't know much about the part of the code that is supposed to do load balancing (yes, there is such a part), but it is hard to read between the lines of the meta-programming, and I do not use it myself because I am not doing a lot of MPI parallelism at the moment. I was hoping to clean up the templated code a little with newer constexpr functions and to understand a bit more of what's going on in the meantime, but I have not had much time lately.
As you mentioned, your alternative has several limitations. If you hit a hard wall and dig deeper into this issue, I can at minimum offer to review your alternative. So feel free to ping me if you need that :)
@paul.maidl if you are interested in learning the code, then writing some modernization patches is a good way to do that. The dune-multidomaingrid code is fairly old. Very likely it can be made simpler with newer C++ features like range-based for, if constexpr, and the like.