I can write a bit about this whole issue. Sorry for the long text, I'll try to keep it structured ;)
Some history
When it became clear that Steffen was going to leave our group, he tore down our servers with the goal of setting everything up in a simpler way. The tearing down part was finished, but there were some holes in the setting up part.
At the beginning of February he handed over most of the server infrastructure. We spent a whole day on that but we didn't reach the topic of the CI images.
Afterwards it was hard to reach Steffen since his new work took a lot more time than expected.
Last week I heard back from him and I got the info that he won't be able to do anything regarding that before Easter.
Current state
We used to build docker images within docker images. Fortunately I was never involved, but I heard that it was no fun ;). In the new setup we run a container with podman and use buildah to build the CI container within that container.
When we took over the server infrastructure from Steffen it was already in a kind of broken/unstable state. At first we didn't investigate since we hoped to reach Steffen and since it is much harder to fix something that someone else wrote, that is broken and where you do not even know how it is supposed to work.
Since last week I finally have access to the server where the CI job is running. I actually looked at the problem this Friday but I was not (yet?) able to solve the issue. The problem happens in the line dnf --setopt=install_weak_deps=False install -y git ... of the Dockerfile. If I run the Dockerfile with podman by hand it works fine.
I will try again to fix this issue next week but I can't tell if this will succeed or not.
If I don't succeed, I see two options. We could wait for Steffen to fix it (maybe around mid-April).
Or someone else could step in. This could mean building the CI images somewhere else, either with the podman/buildah setup or the old docker-in-docker way. I could provide Steffen's podman gitlab-runner. Another option might be to give someone access to our server to have a look.
So much from my side. Sorry for the whole inconvenience but I guess I'm the one for whom it is most inconvenient ;)
I tried a few things (without much understanding) - the aim was to at least get the CI on the docker/ci repo to pass. What I did:
Buildah
I have no idea what the issue might be. I commented out the building of the buildah image in the .gitlab-ci.yml file to at least get the other images to build using the existing buildah image. Since an image is in the registry we can use that for now.
Debian 11
After removing the rebuild of buildah there is a second issue in the debian-11 image:
E: Package 'python-wheel' has no installation candidate
I built the image locally after removing the apt install python-wheel step and ran a container with the image. There, pip install wheel works fine but pip install python-wheel doesn't.
Perhaps someone has an idea what the issue with the debian image is (debian 10 also installs python-wheel and works)?
Weirdly enough the existing Debian:11 image has python2's wheel package installed. At least starting python and running import wheel works there, but it doesn't on the debian 11 image I built locally.
All of this is perhaps not really a problem since we dropped python 2 support and python3-wheel can be installed without issues.
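The "no installation candidate" error can be checked for before calling apt install. Below is a minimal sketch, not what the docker/ci repo actually does: first_installable is a hypothetical helper that asks apt-cache policy for each given package name and prints the first one that has a candidate. The APT_CACHE variable is also an assumption, added only so the probe can be stubbed outside a Debian container.

```shell
#!/bin/sh
# Print the first package name for which apt reports an installation candidate.
# APT_CACHE is a hypothetical override hook; by default the real apt-cache is used.
first_installable() {
    for pkg in "$@"; do
        # "apt-cache policy <pkg>" prints a line like "  Candidate: 0.34.2-1",
        # or "  Candidate: (none)" when the package cannot be installed.
        candidate=$("${APT_CACHE:-apt-cache}" policy "$pkg" 2>/dev/null |
            awk '/Candidate:/ { print $2; exit }')
        if [ -n "$candidate" ] && [ "$candidate" != "(none)" ]; then
            printf '%s\n' "$pkg"
            return 0
        fi
    done
    # None of the given names is installable on this release.
    return 1
}
```

Inside an image build this could be used as apt-get install -y "$(first_installable python3-wheel python-wheel)", listing python3-wheel first since Python 2 support was dropped.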
Question
All of this happens in a branch feature/dunePythonImage (because getting CI to work for dune-python was what I was actually working on).
I just realized that the images I'm building are not tagged with the branch name so I'm replacing the original image, is that right?
Since a debian 11 image with python-wheel exists I didn't check in the Dockerfile without the python-wheel since that would overwrite the existing one (I think).
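If branch builds do push to the same tag as master, one way to avoid overwriting the original images would be to add the branch name to the tag. This is only a sketch: image_tag is a hypothetical helper and the "<base>-<branch>" naming scheme is an assumption, not something the current CI setup does. CI_COMMIT_REF_SLUG and CI_DEFAULT_BRANCH are GitLab's predefined CI variables.

```shell
#!/bin/sh
# Derive an image tag that equals the plain base tag only on the default branch.
# The "<base>-<branch-slug>" scheme is an assumption for illustration.
image_tag() {
    base=$1
    # Fall back to placeholder values when run outside GitLab CI.
    branch_slug=${CI_COMMIT_REF_SLUG:-local}
    default_branch=${CI_DEFAULT_BRANCH:-master}
    # Comparing the slug with the branch name assumes the default branch
    # name is already slug-safe (lowercase, no slashes), e.g. "master".
    if [ "$branch_slug" = "$default_branch" ]; then
        printf '%s\n' "$base"
    else
        printf '%s-%s\n' "$base" "$branch_slug"
    fi
}
```

For the branch feature/dunePythonImage GitLab sets CI_COMMIT_REF_SLUG to feature-dunepythonimage, so image_tag debian-11 would give debian-11-feature-dunepythonimage, while on master it would stay debian-11.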
Other images not working
dune:git-debian-9-gcc-6-14: issue with optional when building dune-common - should be removed
dune:git-debian-10-gcc-7-14: same issue as above but this one should work
If one of those should be stuck with c++14 then let me know.
Summary
My suggestion would be to comment out the rebuild of the Buildah and debian 11 image for now and see if Steffen has some time after Easter to have a look. If not we can decide on some other way forward. If people are happy with that I'll merge my branch.
I have made progress in fixing buildah. I couldn't find the error in Steffen's original Dockerfile, so I simply used the official buildah Dockerfile instead. This works fine so far, and it seems to build the other images as well. The new buildah docker image currently lives in a separate branch called service, which just builds the docker and buildah images. If this all works, it could be merged back to master and then rebuilt every time we build the whole stack.