
Feature/python reduce printing compiling

Merged Samuel Burbulla requested to merge feature/python-reduce-printing-compiling into master
3 unresolved threads

Recent changes lead to printing "Compiling xyz..." every time a JIT Python module is loaded - even if nothing is actually compiled. It is not possible to tell up front whether something actually has to be compiled, e.g. because headers could have changed.

I added a handler that prints this info only if the compile command takes more than one second, so that merely loading a module is no longer reported as compiling.
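
A minimal sketch of the idea behind that handler (assuming a SIGALRM-based timer on a POSIX system; the function and parameter names here are illustrative, not the actual implementation):

    import signal
    import subprocess

    def build_with_delayed_message(cmake_args, cwd, module_name, delay=1):
        # Only announce "Compiling ..." if the build command is still running
        # after `delay` seconds, i.e. when something is really being compiled.
        def _announce(signum, frame):
            print("Compiling {}...".format(module_name))
        old_handler = signal.signal(signal.SIGALRM, _announce)
        signal.alarm(delay)      # arm the one-second timer
        try:
            subprocess.check_call(cmake_args, cwd=cwd)
        finally:
            signal.alarm(0)      # cancel the timer if the build finished quickly
            signal.signal(signal.SIGALRM, old_handler)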

Edited by Samuel Burbulla

Merge request reports

Activity

  • assigned to @tkoch

  • Samuel Burbulla requested review from @tkoch

  • assigned to @andreas.dedner and unassigned @tkoch

  • Samuel Burbulla marked this merge request as ready

  • In which commit did this behavior change? I missed that. (Found it.)

    @samuel.burbulla do you know if this solution is neutral wrt runtime? I'm not sure how this signal package works.

    Edited by Andreas Dedner
  • This is easily revertible so I'll go ahead and merge.

  • Andreas Dedner mentioned in commit f19b93f6

    • @samuel.burbulla I believe the output was correct. It is actually compiling. The compiler is just fast because none of the sources changed, but the compile command is executed. This is actually quite slow, as I noted in a comment above. However, I think this is a safety measure, because it is possible that the dependencies have changed by the time that part of the code is reached. In many cases recompiling is not necessary, but I don't know how to optimize this.

    • But the current solution looks cool.

    • Yeah, it's true that it is calling make on the target, but we just do this because we don't know if we have to recompile. It's the standard behavior even for just loading an existing module. I suppose we don't actually want to call this "Compiling".

      It would definitely be better not to call make every time a module is loaded, but I see no way of finding out whether any dependency has changed. There is a -q flag for make that reports whether a target has to be recompiled, but it has some trouble with CMake (one has to figure out the specific target) and I suppose it would not be much faster.
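
      A minimal sketch of such a check, assuming one already knows the CMake target name for the module; make -q does no work and only reports via its exit status whether the target is up to date:

      import subprocess

      def needs_rebuild(target, build_dir):
          # `make -q` exits with 0 if `target` is up to date and non-zero
          # if it would have to be rebuilt (or on error).
          return subprocess.call(["make", "-q", target], cwd=build_dir) != 0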

      Another completely different approach would be not to try to recompile any existing module by default, but just load it. This would be much faster than the current approach. In that case we would have to add some way to recompile modules when needed, e.g. when changing headers. This could happen either on demand (what about python -m dune recompile-generated, which would call make in dune-py, possibly in parallel?) or, as now, by setting some environment flag. Of course, any developer has to be aware of this changed behavior.

      @andreas.dedner I don't really know, but I think it is fully negligible compared to calling cmake.

    • This is difficult and should be thought about with care. There are at least three different reasons to have that extra make at that point:

      1. dune-py has changed
      2. the headers the file depends on have changed
      3. the generated .cc file has changed

      Remark 1:

      I would not want to remove this make and simply replace it with something the user has to call. At the moment installing/upgrading a pypi package leads to a reconfigure of dune-py, but the already existing source files in dune-py/python/generated are not removed. Requiring the user to remember to always actively remake all existing targets is not feasible in general.

      We would have to remove all targets on install/upgrade - a possible procedure.

      Remark 2:

      I've been considering a make -ijN call in dune-py/python/dune/generated when a script is executed. The advantage would be that a change to the sources, e.g. fvector.hh, would be handled with a parallel make. The disadvantage would be that all modules would be recompiled, although I might only be testing a small change for which I need to load a single module. My current dune-py contains 300 JIT-compiled modules, so a parallel make would not be of much help if they all depend on fvector.hh. Also: some modules might not compile anymore due to some change which is not yet completely taken care of, i.e., still needing some change in dune-foo. Then the make could fail (which would be taken care of by the -i).

      Note: adding a python -m dune recompile which one can use to get the parallel make would be helpful for developers who know what they are doing.
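
      A rough sketch of what such a (hypothetical) python -m dune recompile could do; the function name, arguments and defaults are illustrative only:

      import os
      import subprocess

      def recompile_generated(generated_dir, jobs=None):
          # Rebuild all existing JIT modules with a parallel, keep-going make,
          # corresponding to the `make -ijN` idea above.
          jobs = jobs or os.cpu_count() or 1
          return subprocess.call(["make", "-i", "-j", str(jobs)], cwd=generated_dir)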

      Remark 3:

      The third point given at the beginning is the one I am least sure about. One makes a change in a Python script that generates some .cc file in dune-py, but without changing the hash key, i.e., the file name. How would that be picked up? Simply recompiling at the beginning of the script would not be sufficient.
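
      One conceivable way to pick up such a change (just a sketch, not part of this MR): only overwrite the generated .cc file when its content actually differs, so that its timestamp changes and the next make/ninja call rebuilds exactly this target:

      import os

      def write_if_changed(path, new_source):
          # Overwrite the generated source only when its content changed;
          # an unchanged file keeps its timestamp and triggers no rebuild.
          if os.path.exists(path):
              with open(path, "r") as f:
                  if f.read() == new_source:
                      return False
          with open(path, "w") as f:
              f.write(new_source)
          return True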

    • It could be made optional the other way around, as an optimization option. I.e. by default we of course have to compile every time for safety, because there are too many cases that might need recompilation. But if you know that you have a stable dune-py module and it won't need recompilation, you could manually disable recompilation to get a speed-up, either through an environment variable or by setting a flag programmatically in Python.
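
      A minimal sketch of such an opt-out, assuming a hypothetical environment variable (the name DUNE_PY_NO_RECOMPILE and the helper are purely illustrative):

      import os
      import subprocess

      def maybe_rebuild(cmake_args, generated_dir):
          # Default: always run the safety rebuild; skip it only when the user
          # explicitly asserts (via the hypothetical flag) that dune-py is stable.
          if os.environ.get("DUNE_PY_NO_RECOMPILE", "0") == "1":
              return
          subprocess.check_call(cmake_args, cwd=generated_dir)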

      Edited by Timo Koch
    • That is of course an option - I don't know how much difference it would make. If someone is loading Python modules within a critical part of the code they are probably doing something wrong anyway. At the top of the load method we have

      module = sys.modules.get("dune.generated." + moduleName)
      if module is None:
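          # ... the (re)build and the import only happen in this branch,
          # i.e. only when the module has not been imported yet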

      so if a module was already loaded nothing happens. One would need to be loading new modules all the time to really notice a difference, I would have thought.

  • I do not fully understand this thread. But have you considered that you can use Dune without having make present? ninja or msbuild would do the trick. Ninja is even very fast at figuring out that nothing has to be built.

    • @gruenich if Ninja can solve this problem that would be great - we can install ninja through pypi into the venv, so it is not too bad a dependency to have (we use it for the packaging anyway). We would have to somehow figure out a way of forcing the use of ninja for building dune-py, independent of what the user has set for the other dune modules.

      The problem is the following (hopefully I can explain it clearly enough):

      we would like to provide the user with a message before starting the compilation of some module, but only if the call to cmake --build . --target foo.so is actually going to do something. As pointed out in one of the previous replies !1052 (comment 86432) there are a few cases which require a call to cmake --build, but often enough the repository is clean and nothing will be done. In that case we would like to avoid a message. We couldn't find a way to do a dry run with cmake. The best I could think of is something like

      ret = subprocess.check_output(["cmake", "--build", ".", "--target", "foo.so", "--", "-n"], text=True)
      if "Building" in ret:  # possibly there is a better way of parsing the output
          logger.info("compiling foo.so")
          subprocess.check_call(["cmake", "--build", ".", "--target", "foo.so"])

      which would work with make and ninja I believe.

      The solution that @samuel.burbulla implemented was to print a message if the call to execute(cmake --build . --target foo.so) takes more than a second, using signal handling. It works but seems a bit like using cannons to shoot pigeons.

    • I do not have a solution. But I know that you must not use "--", because everything behind it is passed verbatim to the underlying build tool (make, ninja, msbuild, etc.). You cannot make this work for all build tools, as they support different arguments.

    • In this case we can control the backend used if we want to - that was my point. We generate dune-py from scratch and can even install ninja into the venv, so we could just force its use in the build of dune-py.
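
      A minimal sketch of forcing the Ninja generator for dune-py only (the paths and the target name are placeholders, and it assumes a CMake recent enough to support -S/-B; this just illustrates the idea, not the actual dune-py setup code):

      import subprocess

      dunepy_source_dir = "/path/to/dune-py"        # placeholder
      dunepy_build_dir = "/path/to/dune-py/build"   # placeholder

      # Configure dune-py with Ninja regardless of the generator used for the
      # other dune modules, then build a single JIT target.
      subprocess.check_call(["cmake", "-G", "Ninja",
                             "-S", dunepy_source_dir, "-B", dunepy_build_dir])
      subprocess.check_call(["cmake", "--build", dunepy_build_dir, "--target", "foo.so"])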

    • @andreas.dedner I think your general approach is good ... checking the output, but

      1. why do you have to do a dry run first?
      2. is "Building" really the correct check? I.e. are you sure that this output is generated on every platform?
      3. currently you buffer the output with communicate()

      Instead of a dry run you could simply parse the output directly. You use subprocess.Popen to execute the build command, and the subprocess already runs as a separate process. Thus the output can be parsed synchronously... here is an example:

      import subprocess
      cmd = subprocess.Popen('echo Hello; sleep 5; echo World!',
                             shell=True, text=True, stdout=subprocess.PIPE)
      for line in cmd.stdout:
          print(line)

      The command will print

      Hello
      World!

      and as you will observe running the code, there is a delay between "Hello" and "World!".

      A possible solution might thus be to call the build command as follows:

              with subprocess.Popen(cmake_args,
                                    cwd=self.generated_dir,
                                    text=True,
                                    stdout=subprocess.PIPE,
                                    stderr=subprocess.PIPE) as cmake:
                  # text=True: the streams yield str, so collect into str buffers
                  compiler_output = ""
                  announced = False
                  for line in cmake.stdout:
                      if "Building" in line and not announced:
                          logger.info("Compiling "+pythonName)
                          announced = True   # print the message only once
                      compiler_output = compiler_output + line
                  # wait for cmd to finish
                  cmake.wait()
                  # check return code
                  if cmake.returncode > 0:
                      compiler_error = ""
                      for line in cmake.stderr:
                          compiler_error = compiler_error + line
                      raise CompileError(compiler_error)
                  elif verbose:
                      logger.debug("Compiler output: "+compiler_output)

      (untested!)

    • PS: I can later prepare an MR.

    • I missed that we can exploit the fact that subprocess.Popen returns before the command has completed.

      is "Building" really the correct check? I.e. are you sure that this output is generated on every platform?

      No, I don't know for sure that this is the right check to make in every case. I just tested it locally and that was what came out. It was just to demonstrate the idea.

      I just went through the documentation for Popen, and a different approach which is similar to but simpler than @samuel.burbulla's approach might be:

      try:
          cmake.communicate(timeout=2) # no message if delay is <2sec
      except subprocess.TimeoutExpired:
          print("Compiling")
          cmake.communicate()
    • try:
          cmake.communicate(timeout=2) # no message if delay is <2sec
      except subprocess.TimeoutExpired:
          print("Compiling")
          cmake.communicate()

      doesn't the timeout kill the subprocess? I would expect this.

    • From the documentation

      https://docs.python.org/3/library/subprocess.html#subprocess.Popen.communicate

      If the process does not terminate after timeout seconds, a TimeoutExpired exception will be raised. Catching this exception and retrying communication will not lose any output.

    • Based on @christi's test, here is the alternative:

      import subprocess

      tout = 5  # change to 1 to get 'non timeout' behaviour
      cmd = subprocess.Popen('echo Hello; sleep {0}; echo World!'.format(tout),
                             shell=True, text=True, stdout=subprocess.PIPE)
      try:
          # communicate already returns the full output if the command
          # finishes within the timeout
          stdout, stderr = cmd.communicate(timeout=2)
          print("short command")
          print(stdout)
      except subprocess.TimeoutExpired:
          print("long command")
          ## this does not print the first part of the output (i.e. Hello)
          # for line in cmd.stdout:
          #     print(line)
          ## this way the full output is available (as stated in the documentation)
          stdout, stderr = cmd.communicate()
          print(stdout)
    • A bit unrelated, but @christi's remark made me realize that we could change the debug output of the cmake call

              cmake = subprocess.Popen(cmake_args, cwd=self.generated_dir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
              stdout, stderr = cmake.communicate()
              if cmake.returncode > 0:
                  raise CompileError(buffer_to_str(stderr))
              elif verbose:
                  logger.debug("Compiler output: "+buffer_to_str(stdout))

      to be on the fly instead of always having to wait for the command to terminate (that has annoyed me at times).

    • I think if you want live output something like this should work. Otherwise it can still buffer forever before you get output.

      with subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT) as process:
          if verbose:
              logger.debug("Compiler output:")
              for c in iter(process.stdout.readline, b''):
                  logger.debug(buffer_to_str(c))
    •         cmake = subprocess.Popen(cmake_args, cwd=self.generated_dir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
              stdout, stderr = cmake.communicate()
              if cmake.returncode > 0:
                  raise CompileError(buffer_to_str(stderr))
              elif verbose:
                  logger.debug("Compiler output: "+buffer_to_str(stdout))

      to be on the fly instead of always having to wait for the command to terminate (that has annoyed me at times).

      this will not work, as stdout, stderr = cmake.communicate() copies the output and then continues. You have to operate directly on cmake.stderr or cmake.stdout.

    • There is possibly a misunderstanding here - the code I listed was a snippet from the current implementation which I suggested could be adapted.

    • :-) OK.

      I'll prepare something...

    • See !1066 (merged) for a possible implementation

  • Christian Engwer mentioned in merge request !1066 (merged)
