Currently (in dune 2.1, in my case), dunecontrol writes the file .dune.resume to the user's home directory on every run, and there is no way to stop it from doing so. Besides being very bad style, this breaks the build in cases where $HOME is not writable (e.g. when running under a Jenkins instance). In the attachment you will find a fix for this issue, which boils down to requiring RESUME_FILE to be specified explicitly if the --resume option is to be used. Let me know if you have any objections.
First of all, the statement is untrue: You can specify the resume file in your config.opts (which I actually do). Secondly, the original behavior was to write the dune.resume file to the directory from which dunecontrol was started instead of the user's home directory.
Of course, allowing the resume file to be specified on the command line is a good idea. The same holds for the possibility to not use a resume file at all.
Well, it's true that you can set RESUME_FILE to /dev/null in the opts file, but I think the default behavior should not be surprising, i.e. it should not write anything outside the build directories.
I don't know about trunk, but in dune 2.1, dunecontrol reads (starting at line 480):
    create_module_list() {
      export RESUME_FILE="$HOME/.dune.resume"
      if test "x$DUNE_OPTS_FILE" != "x"; then
        export RESUME_FILE="$(eval . $DUNE_OPTS_FILE; eval echo $RESUME_FILE)"
      fi
which means that it does write to the home directory by default (in conjunction with lines 137 and 155, where something is written to $RESUME_FILE).
I forgot to mention this in my last comment: the build also fails if RESUME_FILE cannot be written to. That's due to the cat $RESUME_FILE on line 150, which causes dunecontrol to stop after the first module. This code is not actually required for the build process to succeed, though...
The patch looks basically fine. I'm not sure, however, whether test -n / test -s is portable. Maybe someone with more experience with portable shell programming could have a look.
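For reference, both primaries are specified by POSIX for the test utility: `-n STRING` is true if the string is non-empty, and `-s FILE` is true if the file exists and has a size greater than zero. A minimal sketch of a check built from them, assuming a hypothetical helper name (`resume_file_has_state` is not part of dunecontrol or the patch):

```sh
#!/bin/sh
# Sketch only: resume_file_has_state is a hypothetical helper, not part of
# dunecontrol. Both primaries used here are specified by POSIX for test(1):
#   test -n STRING  is true if STRING has non-zero length
#   test -s FILE    is true if FILE exists and has a size greater than zero

resume_file_has_state() {
  # $1: resume file name (may be empty)
  # Succeeds only if a name is given AND that file holds some saved state.
  test -n "$1" && test -s "$1"
}
```

A caller would use it as `if resume_file_has_state "$RESUME_FILE"; then ...; fi`, so an unset name, a missing file, and an empty file are all treated as "nothing to resume".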
With respect to my comment: the path to the resume file was changed to $HOME/dune.resume because some people thought their dune build directory should not be writable. The possibility to read the resume file name from the opts file was introduced later.
Anyway, I think your idea is definitely an improvement. However, after applying your patch, users might be surprised that the resume feature does not work by default. Maybe we should print a warning in this case?
I disagree with the statement that programs shouldn't write files into the home directory. Many programs store their configuration or their state there. Consider for instance ~/.bashrc and ~/.bash_history. Of course, if a program does this, it should be a dot file, the name of the file should be related to the program, and it should handle write failures gracefully.
The only part that is missing is handling write failures gracefully. I'd suggest simply ignoring a failure to write to ~/.dune.resume by default; the information in that file is non-essential. Or make it a warning. Or introduce a --verbose flag to dunecontrol and show the warning only with this flag. Or introduce an option --checkpoint to dunecontrol that makes it an error if ~/.dune.resume can't be written.
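The "ignore, but warn" variant of this suggestion could look roughly like the following sketch. It assumes only that dunecontrol keeps the file name in RESUME_FILE; the function name `save_resume_state` is hypothetical and this is not the actual dunecontrol code.

```sh
#!/bin/sh
# Sketch only, assuming the suggestion above: write the resume state, but
# never let a failed write abort the build. save_resume_state is a
# hypothetical name; RESUME_FILE is the variable dunecontrol uses.

save_resume_state() {
  # $1: name of the module that is about to be built
  # No resume file configured: nothing to do.
  test -n "${RESUME_FILE:-}" || return 0
  # The outer 2>/dev/null also silences the shell's own error message if
  # the redirection into "$RESUME_FILE" fails (e.g. read-only $HOME).
  { printf '%s\n' "$1" > "$RESUME_FILE"; } 2>/dev/null ||
    echo "Warning: could not write resume file '$RESUME_FILE'" >&2
  # Always report success so that the build continues.
  return 0
}
```

A --verbose or --checkpoint flag, as proposed above, would only change the `||` branch: drop the warning, or turn it into a `return 1`.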
I agree with Joe that there is nothing wrong in principle with writing a file into the home directory: I have an impressive number of files starting with a dot there, none of which I wrote myself.
That does not mean that writing .dune.resume there is optimal. It is more of a temporary file than a configuration file (which most of the other files in the home directory are). So having .dune.resume in the build directory does seem to make sense to me.
I agree with Andreas ;). I do not expect a build process to write anything into my home directory, because there shouldn't be any configuration files that are unrelated to the build (if there are, the config file should be created by the user IMHO). Thus, I would rather expect temporary files to go to /tmp. Question: how can the build directories be non-writable? I see the point for the source directories in out-of-tree builds, but not for the build directories (at least not those of the modules where object files need to be written). Hm, maybe the resume file should be put into the build directory of dune-common?
On the matter of printing an error: I hope the patch takes care of this. It will output a "RESUME_FILE is undefined" message if dunecontrol is run with --resume and no resume file could be determined. I did not test this, though...
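A minimal sketch of such a check, for illustration only: the function name `check_resume_request` and the convention of passing "yes" when --resume was given are assumptions, not the patch's actual code. Only the message text is taken from the comment above.

```sh
#!/bin/sh
# Sketch only: how the check described above could look. The function name
# and its argument convention are assumptions, not the patch's actual code.

check_resume_request() {
  # $1: "yes" if dunecontrol was started with --resume
  if test "x$1" = "xyes" && test -z "${RESUME_FILE:-}"; then
    echo "RESUME_FILE is undefined" >&2
    return 1
  fi
  return 0
}

# A caller would then abort, e.g. (RESUME_FLAG is hypothetical):
#   check_resume_request "$RESUME_FLAG" || exit 1
```

The `"x$1" = "xyes"` form follows the same portability idiom as the `test "x$DUNE_OPTS_FILE" != "x"` line in dunecontrol.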
.dune.resume is not a configuration file, but neither is it a temporary file. It is a file carrying program state from one run of a program to the next. System services like daemons would keep files like this somewhere in the /var hierarchy.
There is also another reason not to put .dune.resume into /tmp: security. Since /tmp is world-writable there are all kinds of funny races and attacks possible. And since the resume file persists across program runs, the standard techniques for temporary files don't work here. We don't want to open that can of worms.
> I disagree with the statement that programs shouldn't write files into the home directory. Many programs store their configuration or their state there. Consider for instance ~/.bashrc and ~/.bash_history.
To my knowledge, no program (including bash) writes to ~/.bashrc.
The fact that bash does write to ~/.bash_history yields problems, too, if you run multiple instances...
Maybe we can get back to the point instead of just naming 100,000 programs that write into the home directory.
I have a question: does anybody configure dune without an opts file? If not, would it be asking too much to simply add RESUME_FILE='filename' to that file, with an empty string deactivating the feature?
I think the discussion about the default is rather moot. I could even live with an error message if RESUME_FILE is not set in my opts file.
In any case, the possibility to disable writing the resume file is definitely a plus.
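The opts-file approach suggested above could be read roughly like this. This is a sketch under the assumption that the resume feature is opt-in: `read_resume_file` is a hypothetical helper, and it deliberately differs from the dunecontrol excerpt quoted earlier by skipping the extra `eval` and quoting its arguments.

```sh
#!/bin/sh
# Sketch only, assuming the opt-in behaviour proposed above. An opts file
# would contain one of, e.g.:
#   RESUME_FILE="$HOME/.dune.resume"   # enable --resume
#   RESUME_FILE=""                     # empty: resume feature disabled
# read_resume_file is a hypothetical helper. Like the dunecontrol excerpt
# quoted earlier, it sources the opts file in a subshell, but it skips the
# extra eval and quotes its arguments.

read_resume_file() {
  # $1: path to the opts file
  opts=$1
  # "." searches $PATH for names without a slash, so force a relative path.
  case $opts in
    */*) ;;
    *) opts=./$opts ;;
  esac
  # Subshell: the opts file's other settings do not leak into the caller.
  # RESUME_FILE defaults to empty, i.e. no resume file unless requested.
  (RESUME_FILE=""; . "$opts"; printf '%s\n' "$RESUME_FILE")
}
```

An opts file that never mentions RESUME_FILE therefore yields an empty name, which a later check can either treat as "resume disabled" or report as an error, as discussed above.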
I see a problem in forcing people to set an additional option for a feature hardly anybody uses at the moment. I would propose that the resume file be deactivated by default and only activated if an appropriate option is set (whether on the dunecontrol command line or in the opts file, I don't care).
> .dune.resume is not a configuration file, but neither is it a temporary file.
> It is a file carrying program state from one run of a program to the next.
That's exactly what I would consider a temporary file for a single build that requires multiple invocations of dunecontrol...
I also don't argue against programs writing to the home directory; I argue against build systems writing there. As far as I know, no other build system does that...
I'm OK with writing the .dune.resume file into the home directory by
default, as long as failures aren't fatal.
I'm OK with writing no resume file by default (requiring it to be switched
on explicitly, either via the opts-file, the environment, or a commandline
option).
I'm against writing it to /tmp, unless explicitly commanded to.
I'm undecided about writing it to dunecontrol's cwd by default. That may
have its own set of problems which would need further consideration.
@Jö: this is my personal gut feeling, so you may have a different one:
For me, build systems are there to translate source code into executable programs. I expect them to have no side effects. (in a sense they can be seen as a function B=f(S) which produces binary code B from source code S and writes all intermediate results to a well defined scratch directory.)
On the other hand, programs, and especially interactive programs, have persistent parameters, which means that for me it is okay if they store these in my home directory. I don't want to re-specify them again and again, right? If you argue that cached data are not persistent parameters but are often stored in the home directory anyway, I totally agree. IMHO caches should also be located elsewhere, because (browser) caches in the home directory are a major headache if the file system for the home directory enforces quotas and/or is a network file system.
I hope that this is similar to your own gut feeling. Anyway, Olaf's comment that it's okay to have to specify a parameter explicitly to use obscure functionality expresses my reservations about ~/.dune.resume quite well.
@Andreas Lauser: I still don't really see the difference. There are also lots of mainly non-interactive programs that keep state.
If you want a build-system-related example, take ccache. It wraps the compiler and caches the compiled programs. By default, the only place where it can put its cache is the user's home directory. The only potential alternative that it can reasonably expect to be writable would be /tmp (security considerations aside). But on many systems /tmp gets deleted on reboot, or old files are removed after a few days. So /tmp is not a good place to keep the cached data files: their loss may not be fatal, but it would make ccache kind of pointless.
While it may be theoretically nice and clean to consider a compiler or any other part of a build system a function with no side effects, it is too limiting in practice. Keeping state will allow the build system to save time. For instance by caching. Or by restarting from the last point instead of restarting from scratch (our famous resume feature). Or by remembering what the best optimization parameters were last time, and only validating them instead of redetermining them with no prior knowledge.
The main thing that I see that differentiates those programs that write into ~ and those who don't is whether they have a need to do so or not. Interactive and especially GUI programs tend to define "need" rather loosely, that's why they have so many dot-files in ~. (Also, the distinction between configuration and state isn't all that clear there.) Most of the nice things I've listed in the previous paragraph aren't implemented in today's build systems, so they generally have no need for state in ~. But to forbid them to do so in the future is an undue limitation of their development potential.
[No, I'm not going to claim legal rights for compilers. I'm just being totally selfish here as a programmer, and don't want to remove the possibility of a better compiler.]
What about when I have multiple build systems? There should be a provision to create a dune-resume directory in which a separate file is created for each build system.
Apart from that, ccache is used together with cc. The point is that the version of the program being called is the same wherever in the home directory it is called from, and the information used to run the program does not change much between invocations. Hence, having a single global file makes sense. Here, even if I agree that the file has to be created, I believe that it should be properly named to prevent conflicts.
Well, it seems that using ccache requires explicit user intervention, so the side effects are not unexpected. Further, the home directory cannot always be assumed to be writable (see my original motivation for this patch).
Anyway, let's briefly step backwards and look at this from a more global angle: Would you like to have your home directory cluttered with random build-related files written by the 500+ packages you compile if you are on gentoo or LFS? I would not like this...
Regardless of what you think about this issue, I think the proposed patch is not very intrusive. If you do not intend to use --resume, the only difference is that ~/.dune.resume is no longer written (and the build no longer fails if it cannot be written). On the other hand, if you do want it, IMHO it is not asking too much to specify RESUME_FILE explicitly.
@Jayesh: All the solutions discussed so far allow you to specify or override the resume file in the opts file or by some other means. That should also cover the case of having multiple dune trees (e.g. one in ~/dune-2.1 and one in ~/dune-trunk or something similar; I suppose that is what you meant by "multiple build systems").
@Andreas Lauser: Looking at your patch, I don't see any problems with it, technically. There are some issues with whitespace in $RESUME_FILE, or with $RESUME_FILE beginning with "-e ...", but they were not introduced by the patch. I haven't tested the patch, though, and since the reason for it is rather special and a workaround exists, I won't invest the time to actually test it. Therefore it won't be me who applies the patch.
I also don't think the changed default behaviour is a big problem (though I don't see a reason to change the default, as discussed previously). It's worth a news entry, though.
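The whitespace and "-e" issues mentioned above are the usual unquoted-expansion and echo pitfalls. A sketch of how they could be avoided, assuming hypothetical helper names (`show_resume_file`, `read_resume_state`) that do not exist in dunecontrol:

```sh
#!/bin/sh
# Sketch only: handling a RESUME_FILE whose name contains whitespace or
# starts with "-". Function names are hypothetical.

show_resume_file() {
  # printf '%s\n' prints its argument verbatim; some echo implementations
  # would interpret a leading "-e" as an option.
  printf '%s\n' "$RESUME_FILE"
}

read_resume_state() {
  # Quoting prevents word splitting on whitespace; "--" ends option
  # parsing so that a name beginning with "-" is treated as a file name.
  cat -- "$RESUME_FILE"
}
```

Applying the same quoting to every use of $RESUME_FILE (e.g. `cat "$RESUME_FILE"` instead of `cat $RESUME_FILE`) would be enough for the whitespace case; `printf` and `--` cover the leading-dash case.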
@Elias: the problem of multiple writers to the same file is pretty much orthogonal to the question of the location of the file.
@joe: Not at all. Because of the autogen phase it is not currently possible to run multiple instances of dunecontrol on the same source (the phases after that only affect the directory given through --builddir but the autogen phase does not). It is possible to run multiple instances of dunecontrol if you have multiple source trees, however. If dunecontrol now records a global state that is in no way tied to the invocation (e.g. through a PID), that becomes a problem, solely because of the location of the file.
@elias: Correct. Though in the current dunecontrol implementation, this only affects the corner case of using dunecontrol's --resume option with concurrent builds.
@Elias: Yes, you will get a problem when you run instances of dunecontrol
concurrently for different build trees, and then try to resume from the
resulting .dune.resume. I never claimed otherwise.
My point was that there is no straightforward way for dunecontrol to have
different default .dune.resume files for different build trees, since there is
no way to automatically identify what a build tree actually is. Therefore the
only reasonable options are
a) have one default .dune.resume in a fixed location,
b) have a .dune.resume in the current directory by default,
c) have no .dune.resume by default.
If you choose a), you have the problem of concurrent builds, no matter where
exactly you put that file (this is what I meant by saying that the question of the
location and the problem of concurrency are orthogonal). Choosing b) may be a
bit better, but it gives you no guarantee that the concurrency problem is avoided.
The location of the modules can be given via the opts-file or the environment,
so you can run dunecontrol two times for different build trees from the same
directory.
Hm, okay, it seems like you don't like the patch. How about a clean solution, as in the attached patch? (For how it works, please see the description in the patch.)
I checked that dunecontrol --resume and dunecontrol --resume --skipfirst work as advertised, but I cannot test it on 5+ year old environments.
Oh, I did like your old patch, I just didn't like some of the claims made in
the reasoning (and those claims weren't even necessary to make the point).
There are two reasons why I won't apply the patch (or any patch to the resume
feature):
I don't use the resume feature. I don't feel qualified to make the
decision whether to change the behavior on behalf of those who do use the
feature. This reason of course goes away when at least one other resume
user besides you speaks in favour (or it turns out that you are the only
resume user) and noone speaks against.
The second reason is a very selfish one. Very often such patches turn out
to cause problems in some corner cases, and then it usually falls to the
committer to debug this. Since I don't see the big benefit, besides
aesthetic benefits, I'm not prepared to make that commitment in this
case. I do have a thesis to finish.
These are just reasons why I won't apply the patch, I won't object if
someone else applies it.
Regarding your new patch: I like it even better. It's a beautiful concept. It
makes explicitly specifying a resume file completely superfluous. (But I
haven't looked at the actual code to see whether there are any bugs, nor have
I actually tested it.)
Actually I also don't use --resume. I just want to get rid of the dune.resume in my home directory ;)
For the second patch could you please test the following, since you seem to have old software around:
    ./dune-common/bin/dunecontrol --module=dune-grid all
    (hit Control+C while it is building dune-grid)
    ./dune-common/bin/dunecontrol --module=dune-grid --resume --skipfirst all
    (should finish immediately)
    ./dune-common/bin/dunecontrol --module=dune-grid --resume all
    (should restart in dune-grid)
Overall this is going to take approximately five minutes. If it works, everything is probably fine. If you don't want to take on the burden of committing the patch, could someone else with commit rights do it? I promise that I will deal with the resulting bug reports...
As I seem to be one of the few people who actually use the resume feature, I just wanted to test your initial patch (I'm not a bash expert, so I cannot debug the new patch). Anyway, I somehow failed to apply the patch to the current trunk of dune-common. Andreas, could you provide the correct patch command for this?
Hi Martin! Thanks for taking care of this; I guess I will buy you a beer next time we meet.
The patches were against dune-2.1, and it seems like somebody did a huge rewrite of dunecontrol in the meantime. I've forward-ported the first one to trunk r6515 and tested it (see attachment).
During that, I discovered another surprising aspect of the resume semantics: if dunecontrol exits while handling the first module (e.g. dune-common) and is called with --resume the next time, this first module is skipped (which I would only expect if --skipfirst were also specified). If any later module fails, --resume works as expected. Anyway, since I normally don't use --resume, I do not care...
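The behaviour expected above (--resume restarts at the interrupted module, --skipfirst additionally skips it, regardless of whether it is the first module) can be stated as a small function. This is a sketch of the expected semantics under that reading, not dunecontrol's implementation; `modules_to_build` and its argument order are hypothetical.

```sh
#!/bin/sh
# Sketch only: the resume semantics described above as expected behaviour,
# not dunecontrol's implementation. modules_to_build is hypothetical.
#   $1: module recorded in the resume file (where the last run stopped)
#   $2: "yes" if --skipfirst was given
#   remaining arguments: all modules in build order
# Prints the modules that still need to be built, one per line. If the
# recorded module is not in the list, nothing is printed.
modules_to_build() {
  resume_at=$1
  skipfirst=$2
  shift 2
  found=no
  for m in "$@"; do
    if test "$found" = no; then
      # Skip everything before the module that was interrupted.
      test "x$m" = "x$resume_at" || continue
      found=yes
      # With --skipfirst, skip the interrupted module itself as well.
      if test "x$skipfirst" = xyes; then
        continue
      fi
    fi
    printf '%s\n' "$m"
  done
}
```

With this logic, `modules_to_build dune-common no dune-common dune-grid` prints both modules, i.e. an interrupted first module is rebuilt unless --skipfirst is given.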
Ok, I applied Andreas' latest patch. Now, you have to specify a resume file in your config.opts (or in the environment) to make the resume feature work.
If someone encounters any trouble, please revert the patch (if possible) and add a comment.
I will leave this task open for another week. After that, if no one objects, I will announce this change in the recent changes and close this task.