Overhaul of the Python-CMake integration

TL;DR: This issue summarizes the status quo of our Python/CMake integration and proposes some changes. For a quick overview, have a look at the High Level Goals part of the proposal below.

With the recent discussion about enabling the Python bindings by default, our CMake extension for Python code has received more attention. So far, this extension has accommodated the needs of several user groups, but not in a unified, coordinated way. With the prospect of default-enabled Python bindings, additional requirements have arisen that call for some reengineering of the extension. This issue summarizes all technical aspects and provides a proposal.

Prerequisites: Defining Installation

This document will talk a lot about installation. It is worth making some fine distinctions here:

  • By Stack Installation we refer to a method of installing an entire Dune stack, including getting all the core sources. Examples are Debian packages or a Spack repository. We do not cover this type here, because it is not really affected. In the Python context it is important to note that pip-installable Dune packages are also in this category and are therefore not the subject of this document, although they are clearly Python-related.
  • By Global Installation we refer to the installation method that we know from make install in C++: A set of sources and build artifacts is copied into a pre-defined directory structure under a standardized system path prefix.
  • By Local Installation we refer to the same method as Global Installation, except that the user-given prefix is not a system path.
  • By Package Installation we refer to the process that is necessary to execute code in a Python environment without manual adjustments to path variables. Although global/local and Package Installation of Python code use the same commands (e.g. python -m pip install), they must still be distinguished, as Package Installation is also part of the build process.

Prerequisites: Workflow Stages

It is also worth distinguishing the typical stages in a user/developer workflow.

  • CMake Stage: Configuring the project
  • Build Stage: Compiling C++ sources, potentially also building tests
  • Test Stage: Running tests on the output of the build stage
  • Install Stage: Performing a global/local installation

Status Quo:

Currently, the following Python-related things happen at these stages. You can skip over this part if it is too technical for you, but I feel like there is a big need for a write-up of this.

  • CMake Stage
    • CMake searches for Python on the system. It finds the currently active interpreter, which might be the system interpreter or a virtualenv that is activated at the time CMake runs.
    • We determine whether the found Python is in a virtualenv
    • We search for the Python package pip
    • If the user enabled DUNE_PYTHON_VIRTUALENV_SETUP, a virtualenv is created, using either the virtualenv or the venv package. If the build directory tree uses an absolute prefix, this env is located in build-prefix/dune-python-env; otherwise it is placed in the build directory of dune-common. If dune-common is globally installed, the Dune module dependency DAG is traversed to find a build directory. This procedure potentially breaks if there is a diamond dependency pattern with some modules of the diamond being globally installed. The created virtualenv is only exposed through the CMake variable DUNE_PYTHON_VIRTUALENV_EXECUTABLE; the Python found by CMake is still accessible. An activate script and a run-in-dune-env.sh wrapper are placed in each build directory to give users access to the created environment.
    • If the virtualenv was set up, Python packages that are provided by this Dune module are installed into the virtualenv.
    • dune-testtools and modules that depend on it continue to execute Python code during the CMake run.
  • Build Stage
    • Python Modules that are added with dune_add_pybind11_module are built.
    • In dune-codegen, some C++ targets depend on a header that is generated running Python code in the virtualenv. This is managed by the CMake command add_custom_command.
    • dune-fufem targets link against the Python libraries to start an embedded interpreter.
  • Test Stage
    • As part of ctest, Python tests that were added through the dune_python_add_test command are executed.
    • If you only want to run the Python tests, you can also do make test_python.
    • When code using the Python bindings is first executed (likely in the test stage), it sets up a Dune module called dune-py that serves as the playground for code generation and Just-In-Time compilation. This module is currently not placed in an isolated per-build location, which can lead to compatibility issues between builds.
  • Install Stage
    • A global/local installation of Python packages is performed via pip. In contrast to C++, the prefix choice depends on the CMake Variable DUNE_PYTHON_INSTALL_LOCATION=user|system|none, where
      • none disables installation of Python packages
      • system uses the interpreter found by CMake
      • user uses pip install --user, which installs into ~/.local
      If the variable is not set, it defaults to system if the found interpreter runs in a virtualenv and none otherwise. The user option is incompatible with virtual environments.
    • A wheel (a standardized binary distribution format for Python packages) is built for each package and installed into a location under the Dune installation prefix (called a wheelhouse). This is necessary to allow partially installed Dune stacks: If my module depends on dune-common - where do I get the Python package dune.common from? Answer: from the Dune wheelhouse under the prefix that was used to install dune-common.
    • Dependencies of Python packages are automatically installed by pip using the Python Package Index: This requires network access. Failing to provide it currently results in a CMake Error (pip network timeout).
    • The Python-specific part of the installation can be run separately by using the dedicated make install_python. This is what is advocated and used for the Python bindings - although you need to do this before you can run the test stage.
    • All installation rules refer to the Python environment that was active when CMake ran - which might be different from the Python environment at the time of installation.
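The default handling of DUNE_PYTHON_INSTALL_LOCATION described in the install stage can be restated as a small Python sketch. This is not the CMake implementation: the function names are made up for illustration, and raising an error for the user option inside a virtualenv is an assumption about how the stated incompatibility would surface.

```python
import sys
from typing import List, Optional

VALID_LOCATIONS = ("user", "system", "none")


def in_virtualenv() -> bool:
    """Return True if the running interpreter lives in a virtual environment."""
    # The venv module and recent virtualenv releases make sys.prefix differ
    # from sys.base_prefix; older virtualenv releases set sys.real_prefix.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base


def resolve_install_location(requested: Optional[str], virtualenv_active: bool) -> str:
    """Mimic the documented default: system inside a virtualenv, none otherwise."""
    if requested is None:
        return "system" if virtualenv_active else "none"
    if requested not in VALID_LOCATIONS:
        raise ValueError(f"unknown install location: {requested!r}")
    if requested == "user" and virtualenv_active:
        # Assumption: the incompatibility is reported as a hard error.
        raise ValueError("'user' is incompatible with virtual environments")
    return requested


def pip_install_command(package_dir: str, location: str,
                        python: str = sys.executable) -> Optional[List[str]]:
    """Build the pip command line for a location, or None if disabled."""
    if location == "none":
        return None
    command = [python, "-m", "pip", "install"]
    if location == "user":
        command.append("--user")  # installs below ~/.local
    command.append(package_dir)
    return command
```

Calling resolve_install_location(None, in_virtualenv()) gives the value that would apply when the variable is left unset for the interpreter running the sketch.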

If you want to read code for this, it is located in dune-common/cmake/modules/DunePython*.cmake. The main entry point is DunePythonCommonMacros.cmake.
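To illustrate the wheelhouse idea from the install stage, here is a rough sketch of how pip could be pointed at the wheelhouses of already installed Dune modules through its --find-links option. The subdirectory share/dune/wheelhouse, the requirement name dune-common and the function names are assumptions for illustration only; the actual layout is defined by the CMake modules mentioned above.

```python
import sys
from pathlib import PurePosixPath
from typing import Iterable, List

# Hypothetical location of the wheelhouse below an installation prefix.
WHEELHOUSE_SUBDIR = "share/dune/wheelhouse"


def wheelhouse_dirs(prefixes: Iterable[str]) -> List[str]:
    """Map each installation prefix to its (assumed) wheelhouse directory."""
    return [str(PurePosixPath(prefix) / WHEELHOUSE_SUBDIR) for prefix in prefixes]


def pip_command_with_wheelhouses(requirement: str, prefixes: Iterable[str],
                                 python: str = sys.executable) -> List[str]:
    """Build a pip command that also searches the given wheelhouses."""
    command = [python, "-m", "pip", "install"]
    for directory in wheelhouse_dirs(prefixes):
        # --find-links lets pip pick up wheels from a local directory.
        command += ["--find-links", directory]
    command.append(requirement)
    return command
```

With this, a module depending on dune-common would pass the prefix used to install dune-common, so that pip can resolve the dependency from the wheel stored there.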

Proposal

I am willing to implement this or a similar solution.

High Level Goals

  • With Dune being and staying a C++ project, the above workflow stages and their order should stay the same.
  • At the build stage, all Python code is package-installed into an isolated per-build environment. (This was a fundamental outcome of the meeting: In the C++ frame of mind, Python Package Installation is part of building, not installing).
  • Building a Dune stack without network access should be successful, though Python parts can be disabled in this build.
  • Global/local installation of Python should install all Python packages into the Python environment that was active when CMake ran
  • Global/local installation of Python should depend on the Python environment at the time of installation (I really wanted this to work, but thinking about it more I realized that the embedded interpreter case struggles heavily with this)
  • Performing a global/local installation of the entire Dune stack (e.g. dunecontrol make install_python) should make the build environment disposable.
  • Running Python code should be independent of Dune-specific environment variables. (I added this one after some experiments)
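As a rough illustration of the per-build environment goal, the build stage could boil down to commands like the ones assembled below. This is only a sketch under assumptions: the environment is placed directly in the build directory under the name dune-python-env, the standard venv module is used, and the function names are hypothetical. Calling the environment's interpreter by path needs no activation script and no environment variables.

```python
import os
import sys
from pathlib import Path
from typing import Iterable, List

ENV_DIRNAME = "dune-python-env"


def env_python(env_dir: Path) -> Path:
    """Path of the interpreter inside a virtual environment."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def build_stage_commands(build_dir: str, package_dirs: Iterable[str],
                         base_python: str = sys.executable) -> List[List[str]]:
    """Commands that create the per-build env and package-install into it."""
    env_dir = Path(build_dir) / ENV_DIRNAME
    # First command: create the isolated environment with the venv module.
    commands = [[base_python, "-m", "venv", str(env_dir)]]
    for package_dir in package_dirs:
        # Package installation uses the interpreter of the new environment.
        commands.append(
            [str(env_python(env_dir)), "-m", "pip", "install", str(package_dir)]
        )
    return commands
```

Each returned command could be run with subprocess.run(command, check=True); in the proposal itself these steps would be expressed as CMake targets instead.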

Technical Details

I think we already have almost all of the necessary components; they just need to be slightly tweaked.

These are only some core ideas. There is a lot of detail to this that can be discussed once I have provided a sample implementation:

  • The virtualenv setup is enabled by default
  • The following conditions will lead to all Python aspects being disabled
    • Python is not found
    • Pip or virtualenv/venv are not found
    • Pip operations produce a network timeout (this can still be worked around by setting up PyPI mirrors, but this should never be part of a standard user workflow)
  • After virtualenv setup, the CMake check for Python is performed again, so that CMake only knows the virtual environment.
  • Python Package installations into the virtualenv are formulated as custom targets that are part of all iff the bindings are enabled.
  • Package installation is executed in CMake Script Mode (cmake -P) to allow searching and finding the current Python environment in the installation stage.
  • The embedded interpreter use case should use CMake generator expressions to link to the system Python after local/global installation.
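The conditions that disable all Python aspects can be summarized in a short Python sketch. It only mirrors the decision logic listed above; the class and function names are hypothetical, and the real checks would live in CMake rather than in Python.

```python
import importlib.util
from dataclasses import dataclass
from typing import List


@dataclass
class PythonProbe:
    """Outcome of the configure-time checks."""
    python_found: bool
    pip_found: bool
    venv_tool_found: bool  # either virtualenv or venv is available
    pip_timed_out: bool = False


def disable_reasons(probe: PythonProbe) -> List[str]:
    """List every condition from the proposal that is violated."""
    reasons = []
    if not probe.python_found:
        reasons.append("Python is not found")
    if not probe.pip_found:
        reasons.append("pip is not found")
    if not probe.venv_tool_found:
        reasons.append("neither virtualenv nor venv is found")
    if probe.pip_timed_out:
        reasons.append("pip operations produced a network timeout")
    return reasons


def python_enabled(probe: PythonProbe) -> bool:
    """Python aspects stay enabled only if no disabling condition holds."""
    return not disable_reasons(probe)


def probe_current_interpreter() -> PythonProbe:
    """Run the module checks against the interpreter executing this sketch."""
    def has(module: str) -> bool:
        return importlib.util.find_spec(module) is not None

    return PythonProbe(
        python_found=True,  # trivially true, since this code is running
        pip_found=has("pip"),
        venv_tool_found=has("venv") or has("virtualenv"),
    )
```

Collecting the reasons instead of returning a bare flag would make it possible to tell the user why the Python parts were switched off.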

The only problematic use case that I can think of right now is the dune-testtools one - but that mostly affects myself. I will find a way to deal with it.

Edited by Dominic Kempf