Feature/python reduce printing compiling
Recent changes lead to "Compiling xyz..." being printed every time a JIT Python module is loaded, even if nothing is actually compiled. It is not possible to tell up front whether something actually has to be compiled, e.g. because headers could have changed.
I added a handler that prints this info only if the compile command takes more than one second, so merely loading a module is no longer reported as compiling.
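A minimal sketch of the idea, assuming a SIGALRM-based timer around the build call (the module name and build command are placeholders, not the actual dune-py code):

```python
import signal
import subprocess

def _announce_compile(signum, frame):
    # Only reached if the build is still running after one second.
    print("Compiling mymodule...")

# Hypothetical build invocation; dune-py drives this through cmake.
build_cmd = ["cmake", "--build", ".", "--target", "mymodule"]

signal.signal(signal.SIGALRM, _announce_compile)
signal.alarm(1)              # arm a one-second timer
try:
    subprocess.run(build_cmd, check=True, capture_output=True)
finally:
    signal.alarm(0)          # cancel the timer if the build finished early
```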
Merge request reports
Activity
assigned to @tkoch
requested review from @tkoch
assigned to @andreas.dedner and unassigned @tkoch
In which commit did this behavior change? I missed that. (Found it.) @samuel.burbulla do you know if this solution is neutral wrt runtime? I'm not sure how the `signal` package works.
Edited by Andreas Dedner
mentioned in commit f19b93f6
@samuel.burbulla I believe the output was correct. It is actually compiling; the compiler is just fast because none of the sources changed. But the compile command is executed, and this is actually quite slow, as I noted in a comment above. However, I think this is a safety measure, because it is possible that the dependencies have changed by the time that part of the code is reached. In many cases recompiling is not necessary, but I don't know how to optimize this.
Yeah, it's true that it is calling `make` on the target, but we only do this because we don't know whether we have to recompile. It's the standard behavior even for just loading an existing module. I suppose we don't actually want to call this "Compiling". It would definitely be better not to call make every time a module is loaded, but I see no way of finding out whether any dependency has changed. There is a `-q` flag for make that reports whether a target has to be recompiled, but it has some trouble with CMake (one has to figure out the specific target) and I suppose it won't be much faster. Another, completely different approach would be not to try to recompile any existing module by default and just load it. This would be much faster than the current approach. In that case, we would have to add some way to recompile modules when needed, e.g. when changing headers. This could happen either on demand (what about `python -m dune recompile-generated`, which would call make in dune-py, possibly in parallel?) or, as now, by setting an environment flag. Of course, any developer has to be aware of this changed behavior.

@andreas.dedner I don't really know, but I think it is fully negligible in comparison to calling `cmake`.

This is difficult and should be thought about with care. There are at least three different reasons to have that extra `make` at that point:

- `dune-py` has changed
- the headers the file depends on have changed
- the generated `.cc` file has changed
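For reference, a minimal sketch of the `make -q` check mentioned above. This is only an illustration of the mechanism: the target name and directory are placeholders, and as noted in the thread `-q` does not interact well with CMake's recursive Makefiles.

```python
import subprocess

build_dir = "dune-py/python/dune/generated"   # placeholder path
target = "mymodule"                           # placeholder target name

# Ask the native build tool (GNU make here) in "question" mode whether
# anything would be rebuilt; -q exits non-zero if the target is out of date.
check = subprocess.run(
    ["cmake", "--build", ".", "--target", target, "--", "-q"],
    cwd=build_dir, capture_output=True,
)
if check.returncode != 0:
    print("Compiling {}...".format(target))
    subprocess.run(["cmake", "--build", ".", "--target", target],
                   cwd=build_dir, check=True)
```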
Remark 1:
I would not want to remove this `make` and simply replace it with something the user has to call. At the moment installing/upgrading a pypi package leads to a reconfigure of `dune-py`, but the already existing source files in `dune-py/python/generated` are not removed. Requiring the user to remember to always actively remake all existing targets is not feasible in general. We would have to remove all targets on install/upgrade - a possible procedure.

Remark 2:
I've been considering a `make -ijN` call in `dune-py/python/dune/generated` when a script is executed. The advantage would be that a change to the sources, e.g. `fvector.hh`, would be handled with a parallel make. The disadvantage would be that all modules would be recompiled although I might only be testing a small change for which I need to load a single module. My current `dune-py` contains 300 JIT-compiled modules, so a parallel make wouldn't be of much help if they all depend on `fvector.hh`. Also: some modules might not compile anymore due to a change which is not yet completely taken care of, i.e., still needing some change in `dune-foo`. Then the make could fail (which would be taken care of with the `-i`). Note: adding a `python -m dune recompile` which one can use to get the parallel make would be helpful for developers who know what they are doing.

Remark 3:
The third point given at the beginning is the one that I'm the least sure about. One makes a change in a Python script generating some `.cc` file in `dune-py` but without changing the hash key, i.e., the file name. How would that be picked up? Simply recompiling at the beginning of the script would not be sufficient.
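For Remark 2, a minimal sketch of what a `python -m dune recompile` style helper could do, assuming the `make -ijN` idea above (the path, job count, and function name are made up for illustration):

```python
import subprocess

def recompile_generated(build_dir="dune-py/python/dune/generated", jobs=8):
    """Rebuild all JIT-generated modules in dune-py in parallel.

    -i keeps going past modules that no longer compile,
    -jN builds independent targets in parallel.
    """
    subprocess.run(["make", "-i", "-j{}".format(jobs)], cwd=build_dir, check=False)
```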
It could be made optional the other way around, as an optimization. I.e. by default we of course have to compile every time for safety, because there are too many cases that might need recompilation. But if you know that you have a stable dune-py module that won't need recompilation, you could manually disable recompilation to get a speed-up, either through an environment variable or by setting a flag programmatically in Python.
Edited by Timo Koch

That is of course an option - I don't know how much difference it will make anyway. If someone is loading Python modules within a critical part of the code then they are probably doing something wrong anyway. At the top of the `load` method we have

```python
module = sys.modules.get("dune.generated." + moduleName)
if module is None:
    ...
```

so if a module was already loaded nothing happens anyway. One would need to be loading new modules all the time to really notice a difference, I would have thought.
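A hedged sketch of the opt-in described above, assuming a hypothetical environment variable (the name DUNE_PY_SKIP_RECOMPILE and the helper are made up; the real bindings may expose this differently):

```python
import os
import subprocess

def maybe_build(target, build_dir):
    # If the user declares dune-py to be up to date, skip the build call
    # entirely and just load the existing module.
    if os.environ.get("DUNE_PY_SKIP_RECOMPILE") == "1":
        return
    subprocess.run(["cmake", "--build", ".", "--target", target],
                   cwd=build_dir, check=True)
```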
@gruenich if Ninja can solve this problem that would be great - we can install ninja through pypi into the venv, so it is not too bad a dependency to have (we use it for the packaging anyway). We would have to somehow figure out a way of forcing the use of ninja for building `dune-py` independent of what the user has set for the other dune modules.

The problem is the following (hopefully I can explain it clearly enough): we would like to provide the user with a message before starting the compilation of some module, but only if the call to `cmake --build foo.so` is actually going to do something. As pointed out in one of the previous replies !1052 (comment 86432), there are a few cases which require a call to `cmake --build`, but often enough the repository is clean and nothing will be done. In that case we would like to avoid a message. We couldn't find a way to do a dry run with cmake. The best I could think of is something like

```python
ret = execute("cmake --build . --target foo.so -- -n")
if "Building" in ret:  # possibly there is a better way of parsing the output
    log.info("compiling foo.so")
    execute("cmake --build . --target foo.so")
```

which would work with `make` and `ninja`, I believe.

The solution that @samuel.burbulla implemented was to print a message if the call to `execute("cmake --build . --target foo.so")` takes more than a second, using signal handling. It works but seems a bit like using cannons to shoot pigeons.

@andreas.dedner I think your general approach is good ... checking the output, but
- why do you have to do a dry run first?
- is "Building" really the correct check? I.e. are you sure that this output is generated on every platform?
- currently you buffer the output with
Instead of a dry run you could simply parse the output directly. You use `subprocess.Popen` to execute the build command, and since the subprocess already runs as a separate process, the output can be parsed synchronously... here an example:

```python
import subprocess

cmd = subprocess.Popen('echo Hello; sleep 5; echo World!',
                       shell=True, text=True, stdout=subprocess.PIPE)
for line in cmd.stdout:
    print(line)
```
The command will print `Hello` and `World!`, and as you will observe when running the code, there is a delay between "Hello" and "World".
A possible solution might thus be to call the build command as follows:

```python
with subprocess.Popen(cmake_args, cwd=self.generated_dir, text=True,
                      stdout=subprocess.PIPE, stderr=subprocess.PIPE) as cmake:
    compiler_output = ""
    for line in cmake.stdout:
        if "Building" in line:
            logger.info("Compiling " + pythonName)
        compiler_output = compiler_output + line
    # wait for cmd to finish
    cmake.wait()
    # check return code
    if cmake.returncode > 0:
        compiler_error = ""
        for line in cmake.stderr:
            compiler_error = compiler_error + line
        raise CompileError(compiler_error)
    elif verbose:
        logger.debug("Compiler output: " + compiler_output)
```
(untested!)
I missed that we can exploit the fact that the subprocess call returns before the command has completed.
is "Building" really the correct check? I.e. are you sure that this output is generated on every platform?
No, I don't know for sure that this is right check to make in every case. I just tested it locally and that was what came out. It was just to demonstrate the idea.
I just went through the documentation for `Popen`, and a different approach which is similar to but simpler than @samuel.burbulla's approach might be

```python
try:
    cmake.communicate(timeout=2)   # no message if the delay is < 2 sec
except subprocess.TimeoutExpired:
    print("Compiling")
    cmake.communicate()
```
From the documentation https://docs.python.org/3/library/subprocess.html#subprocess.Popen.communicate:

> If the process does not terminate after timeout seconds, a TimeoutExpired exception will be raised. Catching this exception and retrying communication will not lose any output.
Based on @christi's test here is the alternative:

```python
tout = 5  # change to 1 to get the 'non timeout' behaviour
cmd = subprocess.Popen('echo Hello; sleep {0}; echo World!'.format(tout),
                       shell=True, text=True, stdout=subprocess.PIPE)
try:
    cmd.communicate(timeout=2)
    print("short command")
    stdout, stderr = cmd.communicate()
    print(stdout)
except subprocess.TimeoutExpired:
    print("long command")
    ## this does not print the first part of the output (i.e. Hello)
    # for line in cmd.stdout:
    #     print(line)
    ## this way the full output is available (as stated in the documentation)
    stdout, stderr = cmd.communicate()
    print(stdout)
```
A bit unrelated, but @christi's remark made me realize that we could change the debug output of the `cmake` call

```python
cmake = subprocess.Popen(cmake_args, cwd=self.generated_dir,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = cmake.communicate()
if cmake.returncode > 0:
    raise CompileError(buffer_to_str(stderr))
elif verbose:
    logger.debug("Compiler output: " + buffer_to_str(stdout))
```

to be on the fly instead of always having to wait for the command to terminate (that has annoyed me at times).
I think if you want live output, something like this should work. Otherwise it can still buffer forever before you get output.

```python
with subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT) as process:
    if verbose:
        logger.debug("Compiler output:")
        for c in iter(process.stdout.readline, b''):
            logger.debug(buffer_to_str(c))
```
> ... to be on the fly instead of always having to wait for the command to terminate (that has annoyed me at times).

This will not work, as `stdout, stderr = cmake.communicate()` copies the output and then continues. You have to directly operate on `cmake.stderr` or `cmake.stdout`.

See !1066 (merged) for a possible implementation
mentioned in merge request !1066 (merged)