Although we at LH2 are generally really happy with the solvers of Dune-ISTL, we would like to use a different convergence criterion than the reduction of the two-norm of the residual. I've prepared a proof-of patch which makes the criterion pluggable and did some measurements with Dumux. In the end, the whole simulation was about 25 % faster (see the attached logfiles).
note, that this patch is currently more a proof of concept against dune 2.1 and I would appreciate some feedback. the patch description is the following:
it works by adding a template argument 'ConvergenceCriterion' to thesolver. The default criterion is to look at the reduction of thedefect for the current iterative solution. This is the behaviour ofthe ISTL solvers before this patch, so no code needs to be changed ifthe current behaviour ought to be kept. The patch adds a secondcriterion which checks whether the weighted difference between twoiterations is below a given threshold. This second criterion is called'FixPointCriterion' and shows better performance (at least for Dumux,where the Newton-Raphson solver uses a relative convergence criterion).So far, only the BiCGStab solver has been modified. If this patch hasany chance of getting merged into the Dune trunk, I will also modifythe other solvers.
Hi Andreas, why is it a template parameter? The selection of the proper criterion only happens once after each iteration, and is not time critical. Dynamic polymorphism may make the code more readable.
Thanks for the comment. Basically it's a template parameter because I did not want to change the API at all for people who are happy with the reduction of the residual. The problem is, that if dynamic polymorphism ought to be used, the convergence criterion object cannot be instantiated inside the solver, so either the constructor or the solve() method would need an additional parameter.
If this is acceptable and desired I am happy to provide second patch using dynamic polymorphism.
I like the idea of having exchangable criteria but I don't like the proposes solution for the following reasons:
My main concern is that introducing of a general concept of convergence criteria while not having a concept for iterative solvers will not lead to a clean and flexible design. For example the proposed solution is still very restrictive and a general solution would IMHO need access to solver specific data. There is e.g. a criterion for the cg method that estimates |u-u_k|_A using quantities already computed by the cg-step (see Deuflhard/Weiser). Although not being really flexible the proposed solution introduces ten new interface methods.
A flexible criterion class could e.g. get the solver or an IterartionState object as base class pointer and ask it for specific data using a dynamic_cast to derived classes. This would allow to implement generic and solver specific criteria without blowing up the interface.
Since this solution would be far more invasive I do not propose to introduce it now. In order to implement the criterion that you need it might be clearer to just hardwire it additionally instead of introducing a new concept that is not satisfactory in the long run.
I agree that the proposed API is quite ad-hoc and that it would be better to have a general concept, but I would like to use a fix-point criterion now because I get major performance and minor stability improvements by using it. (Baking it directly into the solvers, would IMHO obfuscate the solvers code quite a bit and would lead to a lot of code duplication.)
If you think it is better than the proposed interface, I will modify the patch so that the update() methods also get the solver object as an argument. Since there are currently no "IterationState" objects, passing solver specific data in a generic way is a problem in the current ISTL implementation. Though it would be possible to define a solver-specific API (e.g. CGConvergenceCriterion) and to provide a wrapper to a "generic" API for each solver.
Finally, and only tangentially related to this discussion, it would be nice if it was possible to declare APIs as experimental which could be defined as "may break without notice".
This is a port of the patch to dune 2.2. In addition, the fix-point criterion now also works in parallel and the CG and Loop solvers have been adapted as well.
The next version of the patch, this time no template arguments are involved anymore and the API uses dynamic polymorphism, as requested. Would someone who has commit rights please be so kind to either push it into trunk (with prior discussion if necessary) or tell me that this whole approach is bullshit and I am the only one who has this problem? Thanks.
First of all: Your patch is not clean. You are also patching repartition.hh, which has nothing to do with the convergence criterion.
Besides that thanks a lot for incorporating Carstens suggestions, it looks much better now.
I am not an expert on convergence criterias (which is a shame) and had only a short glimpse at your patch, but shouldn't there be the possibility to use the operator in the criterion? Furthermore I think some criteria work with the preconditioned residuum. Woud it be possible to add that?
Anyway I think adding plugable criteria would be a good thing and, assuming that you now adapapted all solvers, your effort is a good start.
Unfortunately, my time is currently limited and it might take until midth of September to sorrowly review your approach and apply it
First of all: Your patch is not clean. You are also patching repartition.hh, which has nothing to do with the convergence criterion.
oops. that was a mistake. I simply commited everything in my working copy and forgot that I had non-related changes in it as well. The attached patch fixes this.
I am not an expert on convergence criterias (which is a shame) and had only a short glimpse at your patch, but shouldn't there be the possibility to use the operator in the criterion?
I think C++ does not allow operators to be virtual. (this could be worked-around by calling a virtual function within the operator, though.) The main reason why I also don't like this is, that I don't see how using an operator makes this makes the patch simpler or the solvers more readable, since one would require to introduce the update() and setInitial() methods as well because folding them into e.g. the smaller-than operator would decrease efficiency when checking for convergence of the same solution more than once.
Anyway I think adding plugable criteria would be a good thing and, assuming that you now adapapted all solvers, your effort is a good start.
that's already done by the attached patch. also, the accuracy is printed in all solvers instead of the reduction of the two-norm of the residual. (this makes the data which the user sees the same as the one the solvers use).
You might want to compute the error reduction in the energy-norm sqrt{u_k^T A u_k / u_{k-1}^T A u_{k-1}} if your right hand side is zerro and u_k is the last current guess.
You might want to compute the error reduction in the energy-norm sqrt{u_k^T A u_k / u_{k-1}^T A u_{k-1}} if your right hand side is zerro and u_k is the last current guess.
implementing this would not be a major issue with the approach taken by the patch; But bear in mind that my motivation for this patch was not that the solvers did not converge, rather that I could not make the two-norm residual reduction of the linear solver play nicely with the fixpoint criterion which I use for my newton iteration. (There, I look at the weighted maximum of the difference between two iterations, but that is only marginally related to the reduction of the two norm for large grids or if the conservation quantities or the quantities which you solve for are on a different scale [think on energy vs. mass or pressure vs. saturation]. This lead the newton solver to oscillate or -- even worse -- to converge to an incorrect value.)
I don't think the enerynorm is as easy as you suggest. The operator is something you can't pass from outside, as it might never be accessible to the user, e.g. when using a newton solver.
I think Markus question is still valid and I consider it natural to also pass the operator to the criterion. During the update we pass the solution and the residual, so why not the operator as well?
I don't understand why we have two initialize and update methods, one with the two_norm and one without. The later is never used. Furthremore it is not clear to me how this should work!? If we ommit the two_norm, but need it in the criterion, how should the criterion compute the two_norm without duplicating many information like the scalar-product operator.
the current implementation of ResidReductionCriterion::update will run in parallel, but compute different results if we pass the two_norm, or ommit it.
the patch uses std::shared_ptr, it should use Dune::shared_ptr instead.
So I think there are still some open questions and smaller issues. Thus I am against an immediate incorporation.
I don't think the enerynorm is as easy as you suggest. The operator is something you can't pass from outside, as it might never be accessible to the user, e.g. when using a newton solver.
it is as easy as I said and it is also possible to pass the operator:
auto *convCrit = new EnergyNormCriterion(linearOperator);
solver->setConvergenceCriterion(std::shared_ptrDune::ConvergenceCriterion(convCrit));
I had the same issue with setting the weights already...
I think Markus question is still valid and I consider it natural to also pass the operator to the criterion. During the update we pass the solution and the residual, so why not the operator as well?
true, but the operator can be passed to the constructor. If the criterion needs other data it can be passed this way as well.
I don't understand why we have two initialize and update methods, one with the two_norm and one without. The later is never used.
well, its an efficiency and convenience issue: If the two_norm of the residual is conveniently available, the solver may pass it to the convergence criterion, if it's not available, the convergence criterion needs to calculate it itself (but only if it requires it). It is true that one of these two methods could be removed, but then either solvers that don't need the two norm of the residuum always have to calculate it before calling the convergence criterion even if the solver does not need it or the convergence criterion has to re-calculate it even if it is readily available in the solver. For e.g. the loop solver, the two-norm of the residuum is not really required, so it could call the methods without the two-norm, but that's an optimization left for the time after the basic infrastructure is merged.
Furthremore it is not clear to me how this should work!? If we ommit the two_norm, but need it in the criterion, how should the criterion compute the two_norm without duplicating many information like the scalar-product operator.
pass the additional data to the constructor of the convergence criterion just as it is done for the solvers...
the current implementation of ResidReductionCriterion::update will run in parallel, but compute different results if we pass the two_norm, or ommit it.
really? that's a bug then. I will investigate this.
the patch uses std::shared_ptr, it should use Dune::shared_ptr instead.
true, but the operator can be passed to the constructor. If the criterion needs other data it can be passed this way as well.
no, you can not pass the operator to the criterion when using a newton scheme. The operator changes during the newton algorithm, so you have to update it.
if it's not available, the convergence criterion needs to calculate it itself (but only if it requires it).
as I said, it can't easily compute it for itself and your implementation is doing it wrong.
really? that's a bug then. I will investigate this.
you have to use the scalar product to compute the global two_norm. istl knows only local matrices and vectors. And thus this is something I prefer having outside the convergence criterion.
true, but the operator can be passed to the constructor. If the criterion needs other data it can be passed this way as well.
no, you can not pass the operator to the criterion when using a newton scheme. The operator changes during the newton algorithm, so you have to update it.
yes you can! either create the linear operator for every newton iteration before you solve your tangential space or pass a reference to the convergence criterion. For my implementation of newton's method I picked the first approach, primary because creating the linear operator out of a BCRS matrix is very cheap in ISTL...
really? that's a bug then. I will investigate this.
you have to use the scalar product to compute the global two_norm. istl knows only local matrices and vectors. And thus this is something I prefer having outside the convergence criterion.
I know, but I was ignorant of this when I first created the patch. I will send an updated patch which uses the same scalar product as the linear solver tomorrow morning.
yes you can! either create the linear operator for every newton iteration before you solve your tangential space or pass a reference to the convergence criterion. For my implementation of newton's method I picked the first approach, primary because creating the linear operator out of a BCRS matrix is very cheap in ISTL...
This will be my last post on this topic, I share Markus POV. If you write a generic Newton algorithm this is not possible, because the user chooses the criterion, so the Newton can't recreate it, it can only be updated, but then you can't update the operator. There will always be ways to trick your code around this (e.g. using shared pointers and external memory for the matrix), but all this is not convenient for the user and I don't understand your opposition against Markus proposal.
I know, but I was ignorant of this when I first created the patch. I will send an updated patch which uses the same scalar product as the linear solver tomorrow morning.
Is this your honest answer? If this your way of handling code quality, I'm very surprised/disappointed. Having bugs in the code... not nice, but shit happens -- leaving a feature out in the beginning... OK, if you clearly state it, e.g. by an exception -- but implementing something deliberately wrong is a real no-go.
This will be my last post on this topic, I share Markus POV. If you write a generic Newton algorithm this is not possible
I told you that it is possible in my last answer! you pass a reference to the scalar product and the operator at the time you create the solver (so the operator and scalar products are guaranteed to be available) and you may modify it to your liking between two consecutive calls to solver.apply(). the convergence criteria will use exactly the same object as the solver. does this answer your concern?
understand your opposition against Markus proposal.
putting this into the operator would be a violation of abstractions. AFAIK operators provide matrix-vector multiplications, but IMHO they are not supposed to know when a linear solver converged.
Is this your honest answer?
yes it is, I've simply forgot that ISTL uses their own scalar product abstractions when I wrote this. Not everybody produces as perfect code as you do on the first try!
If this your way of handling code quality, I'm very surprised/disappointed. Having bugs in the code... not nice, but shit happens -- leaving a feature out in the beginning... OK, if you clearly state it, e.g. by an exception -- but implementing something deliberately wrong is a real no-go.
I think you mis-interpreted this statement quite considerably. Anyway, in the attachment you will find the patch with all the issues you raised fixed and the energy norm reduction criterion implemented. I've also verified that it compiles but I'm not sure whether it works because I always have well-defined linear systems, so if the right hand side is 0 for me, the solution is as well. note that implementing the energy norm reduction criterion took about 10 minutes. If you don't come up with new nit-picks I would like you to merge it quickly.
I forgot if you need a super-special convergence criterion (So far you did not, because you used the two-norm residual reduction), you can still keep a pointer to the actual criterion around in the newton method, while the solver only needs a pointer to the base class.
Okay, sorry for the dog fight, I should not have let myself be provoked that easily. Since nobody seems to be interested in this (which I'm a bit surprised about, given the fact that I regularly see speedups north of 30% for the linear solver if using the WeightedResidualReduction criterion, but I guess "c'est la vive"), I guess that I will have to maintain this patch outside of Dune indefinitely. I will keep posting new versions of it here if there are no objections, though.
In the attachment, you'll find the latest version which features two changes: first, it seems that grids sometimes use a collective communication other than Dune::CollectiveCommunicationDune::MPIHelper::MPICommunicator which breaks things when using the previous patch. Since I don't know what the grid interface requires, the new patch is playing safe and adds a template argument for the CollectiveCommunication to the criteria that need it. The second change is, that I've recently had a closer look at solvers.hh and noticed that none of the those solvers used the norm of the scalar product for anything useful other than passing it to the convergence criterion. For this reason, I removed the methods which got the two-norm of the defect as their third argument and also removed the superfluous variables from the solvers. This means that convergence criteria that don't look at the two-norm of the residual are a tad faster now because one communication per iteration is saved.