Overhaul of the Python-CMake integration
TL;DR: This issue summarizes the status quo of our Python/CMake integration and proposes some changes. You might want to have a look at the High Level Goals part of the proposal below to get an idea.
With the recent discussion of enabling the Python bindings by default, our CMake extension for Python code has received more attention. So far, this extension has accommodated the needs of several user groups, but not in a unified, coordinated way. With the prospect of default-enabled Python bindings, additional requirements have arisen that call for some reengineering of the extension. This issue summarizes all technical aspects and provides a proposal.
Prerequisites: Defining Installation
This document will talk a lot about installation. It is worth making some fine distinctions here:
- By Stack Installation we refer to a method of installing an entire Dune stack, including getting all the core sources. Examples are Debian packages or a Spack repository. We do not cover this type here, because it is not really affected. In the Python context it is important to note that `pip`-installable Dune packages are also in this category and are therefore not the subject of this document, although they seem clearly Python-related.
- By Global Installation we refer to the installation method that we know from `make install` in C++: A set of sources and build artifacts is copied into a pre-defined directory structure under a standardized system path prefix.
- By Local Installation we refer to the same method as Global Installation, except that the user-given prefix is not a system path.
- By Package Installation we refer to the process that is necessary to execute code in a Python environment without manual adjustments to path variables. Although global/local and Package Installation of Python code use the same commands (e.g. `python -m pip install`), they must still be distinguished, as Package Installation is also part of the build process.
Prerequisites: Workflow Stages
It is also worth distinguishing the typical stages in a user/developer workflow.
- CMake stage: the project is configured.
- Build stage: Compiling C++ sources, potentially also building tests
- Test Stage: Running tests on the build produced by the build stage
- Install stage: Perform global/local installation
Status Quo:
Currently, the following Python-related things happen at these stages. You can skip over this part if it is too technical for you, but I feel like there is a big need for a write-up of this.
- CMake Stage
  - CMake searches for Python on the system. It finds the currently active Python, which might be the system interpreter or a virtualenv that is activated at the time CMake runs.
  - We determine whether the found Python is in a virtualenv.
  - We search for the Python package `pip`.
  - If the user enabled `DUNE_PYTHON_VIRTUALENV_SETUP`, a virtualenv is created. This uses either the `virtualenv` or the `venv` package. If the build directory tree uses an absolute prefix, this env is located in `build-prefix/dune-python-env`, otherwise it is placed in the build directory of `dune-common`. If `dune-common` is globally installed, the Dune module dependency DAG is traversed to find a build directory. This procedure potentially breaks if there is a diamond dependency pattern with some modules of the diamond being globally installed. The created virtualenv is only exposed through the CMake variable `DUNE_PYTHON_VIRTUALENV_EXECUTABLE`; the Python found by CMake is still accessible. An `activate` script and a `run-in-dune-env.sh` wrapper are placed in each build directory to give users access to the created environment.
  - If the virtualenv was set up, Python packages that are provided by this Dune module are installed into the virtualenv.
  - dune-testtools and modules that depend on it continue to execute Python code during the CMake run.
- Build Stage
  - Python modules that are added with `dune_add_pybind11_module` are built.
  - In dune-codegen, some C++ targets depend on a header that is generated by running Python code in the virtualenv. This is managed by the CMake command `add_custom_command`.
  - dune-fufem targets link against the Python libraries to start an embedded interpreter.
- Test Stage
  - As part of `ctest`, Python tests that were added through the `dune_python_add_test` command are executed.
  - If you only want to run the Python tests, you can also do `make test_python`.
  - When code using the Python bindings is first executed (likely in the test stage), it sets up a Dune module called `dune-py` that serves as the playground for code generation and Just-In-Time compilation. This module is currently not placed in an isolated per-build location, leading to potential compatibility issues between builds.
- Install Stage
  - A global/local installation of Python packages is performed via `pip`. In contrast to C++, the prefix choice depends on the CMake variable `DUNE_PYTHON_INSTALL_LOCATION=user|system|none`, where
    - `none` disables installation of Python packages
    - `system` uses the interpreter found by CMake
    - `user` uses `pip install --user`, which installs into `~/.local`

    If the variable is not set, it defaults to `system` if the found interpreter runs in a virtualenv and `none` otherwise. The `user` option is incompatible with virtual environments.
  - A wheel (a standardized binary distribution format for Python packages) is built for each package and installed into a location under the Dune installation prefix (called a wheelhouse). This is necessary to allow partially installed Dune stacks: If my module depends on dune-common, where do I get the Python package `dune.common` from? Answer: from the Dune wheelhouse under the prefix that was used to install dune-common.
  - Dependencies of Python packages are automatically installed by pip using the Python Package Index. This requires network access. Failing to provide it currently results in a CMake error (pip network timeout).
  - The Python-specific part of the installation can be run separately by using the dedicated `make install_python` target. This is what is advocated and used for the Python bindings, although you need to do this before you can run the test stage.
  - All installation rules refer to the Python environment that was active when CMake ran, which might be different from the Python environment at the time of installation.
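
To make the install-location rules above easier to follow, here is a minimal Python sketch of the decision logic. It is not the CMake implementation: the function names `resolve_install_location` and `pip_install_command` are hypothetical, and passing the wheelhouse to pip via `--find-links` is an assumption about how a wheelhouse could be consumed.

```python
from typing import List, Optional

VALID_LOCATIONS = ("user", "system", "none")


def resolve_install_location(requested: Optional[str], in_venv: bool) -> str:
    """Mirror the described default: 'system' inside a virtualenv, else 'none'."""
    if requested is None:
        return "system" if in_venv else "none"
    if requested not in VALID_LOCATIONS:
        raise ValueError(f"unknown install location: {requested!r}")
    if requested == "user" and in_venv:
        # The text states that 'user' is incompatible with virtual environments.
        raise ValueError("'user' cannot be combined with a virtualenv")
    return requested


def pip_install_command(python: str, package_dir: str, location: str,
                        wheelhouse: Optional[str] = None) -> Optional[List[str]]:
    """Build the pip command line, or return None if installation is disabled."""
    if location == "none":
        return None
    cmd = [python, "-m", "pip", "install"]
    if location == "user":
        cmd.append("--user")  # installs into ~/.local on typical Linux setups
    if wheelhouse is not None:
        # Assumption: wheels of upstream Dune modules are looked up here.
        cmd += ["--find-links", wheelhouse]
    cmd.append(package_dir)
    return cmd
```

The command is returned as a list rather than executed, so a caller can log it or hand it to `subprocess.run` once the environment is known.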
If you want to read code for this, it is located in `dune-common/cmake/modules/DunePython*.cmake`.
The main entry point is `DunePythonCommonMacros.cmake`.
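
For readers who prefer Python over CMake, the virtualenv handling of the CMake stage can be approximated as follows. This is only a sketch under assumptions: the helper names are hypothetical, only the standard-library `venv` module is used (not `virtualenv`), and the directory name `dune-python-env` inside the dune-common build directory is assumed to match the name used under an absolute build prefix.

```python
import sys
import venv
from pathlib import Path
from typing import Optional


def in_virtualenv() -> bool:
    """Return True if the running interpreter belongs to a virtual environment."""
    # venv (and recent virtualenv) make sys.prefix differ from sys.base_prefix;
    # legacy virtualenv versions set sys.real_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base


def env_location(dune_common_build_dir: str,
                 build_prefix: Optional[str] = None) -> Path:
    """Pick the env directory: under the absolute build prefix if there is one,
    otherwise inside the build directory of dune-common."""
    base = Path(build_prefix) if build_prefix else Path(dune_common_build_dir)
    return base / "dune-python-env"


def create_env(path: Path, with_pip: bool = True) -> Path:
    """Create the environment and return the path of its interpreter."""
    venv.EnvBuilder(with_pip=with_pip).create(str(path))
    if sys.platform == "win32":
        return path / "Scripts" / "python.exe"
    return path / "bin" / "python"
```

The interpreter path returned by `create_env` plays the role that `DUNE_PYTHON_VIRTUALENV_EXECUTABLE` plays on the CMake side.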
Proposal
I am willing to implement this or a similar solution.
High Level Goals
- With Dune being and staying a C++ project, the above workflow stages and their order should stay the same.
- At the build stage, all Python code is package-installed into an isolated per-build environment. (This was a fundamental outcome of the meeting: In the C++ frame of mind, Python Package Installation is part of building, not installing).
- Building a Dune stack without network access should be successful, though Python parts can be disabled in this build.
- Global/local installation of Python should install all Python packages into the Python environment that was active when CMake ran
- ~~Global/local installation of Python should depend on the Python environment at the time of installation~~ (I really wanted this to work, but thinking about it more I realized that the embedded interpreter case struggles heavily with this)
- Performing a global/local installation of the entire Dune stack (e.g. `dunecontrol make install_python`) should make the build environment disposable.
- Running Python code should be independent of Dune-specific environment variables. (I added this one after some experiments)
Technical Details
I think we already have almost all of the necessary components; they just need to be slightly tweaked.
These are only some core ideas. There is a lot of detail to this that can be discussed once I have provided a sample implementation:
- The virtualenv setup is enabled by default.
- The following conditions will lead to all Python aspects being disabled:
  - Python is not found
  - Pip or virtualenv/venv are not found
  - Pip operations produce a network timeout (this can still be worked around by setting up PyPI mirrors, but this should never be part of a standard user workflow)
- After virtualenv setup, the CMake check for Python is performed again, so that CMake only knows the virtual environment.
- Python Package Installations into the virtualenv are formulated as custom targets that are part of `all` iff the bindings are enabled. Package Installation is executed in CMake Script Mode (`cmake -P`) to allow searching and finding the current Python environment in the installation stage.
- The embedded interpreter use case should use CMake generator expressions to link to the system Python after local/global installation.
The only problematic use case that I can think of right now is the dune-testtools one - but that mostly affects myself. I will find a way to deal with it.