[draft][AD] Add some automated differentiation utilities
This provide the ADValue<T,maxOrder,dim>
class as a primitive
for automatic differentiation. It that encapsulates a scalar type
T
(e.g. T=double
) and provides arithmetic operations.
Each ADValue
is considered as an intermediate result in
the evaluation of a scalar dim
-variate function but does
not store the value only, but the jet of all derivatives up
to maxOrder
. Currently only maxOrder<=2
is implemented.
Additionally, this provides some convenience utilities
for using this ADValue
:
- A functions for easy definition of AD-aware nonlinear differentiable functions.
- Overloads for a few basic functions (
abs
,sin
,cos
,log
,exp
,sqrt
,pow
). - Some glue code to implements dune-functions differentiable
functions (for a callback
f
simply useADFunction<f>
.
Usage example:
using std::sin;
using std::exp;
using std::pow;
using namespace Dune::Indices;
// Define and initialize functions arguments of a tri-variate
// twice differentiable function.
auto x0 = ADValue<double,2,3>(23., 0);
auto x1 = ADValue<double,2,3>(42., 1);
auto x2 = ADValue<double,2,3>(13., 2);
// Evaluate expression
auto y = pow(2, exp(sin(x[0]*x[1])*sin(x[2]) + 3));
// Extract value of function, 1st and 2nd order partial derivatives
y.partial();
y.partial(i);
y.partial(j);
Why propose this despite the fact we have AdolC bindings?
- Disadvantages of AdolC tape-based mode:
- This relyies on global variables and is not thread safe.
- It is much slower. For simple expressions I measured, that
it is between 10 and 100 times slower than
ADValue
. When using AD for the derivatives of the energy with a Newton method for the minimal surface equation, assembly is takes about 20 times as long with AdolC compared toADValue
. The latter is essentially as fast as with manually implemented derivatives and evenmore allows for multi-threading (not used in the comparison).
- AFAIK
ADValue
can be characterized as tapeless forward mode Taylor polynomial AD. - AdolC also has a tape-less forward mode, which is probably
similar to
ADValue
but has some conceptual restrictions:- Only 1st order derivatives are available, so this can't be used for Newton's method.
- You have to decide in advance on the maximal domain dimension
and set it using a
macroglobal variable. This influences the memory consumption of all AD-values stored later on. - There's also no guarantee about thread-safety.
Merge request reports
Activity
As another argument for proposing this is that the implementation is really simple and took me far less time than building the dune-functions-like interface to AdolC or to fix some simple issues in AdolC.
The core functionality here is just about 350 straight forward loc for
ADValues
. Each supported nonlinear function need a few additional trivial line.I'd still prefer that ADOL-C would be improved rather than writing yet-another-tapeless-AD code.
I totally agree. But I had a look at the thread-safety issue in AdolC's tapeless mode, and this far from being obvious. I guess that the time I invested there is roughly the same it took to implement
ADValue
. And the latter is by far faster but probably has a different scope.Understanding and fixing ADOL-C is not difficult for somebody of your expertise, and many more people would benefit from it.
This would maybe apply to the tapeless forward mode. But fixing the mentioned shortcomings requires that the domain dimension is a template parameter of the AD value type. For other reasons, I think that the wrapped type should also be a template parameter and not be fixed to double. Both cannot be implemented for AdolC's
adouble
in a compatible way. Hence fixing this, would mean to add a new class template likeADvalue
to AdolC i.e. to propose to addADValue
.I had a look at AdolC's
adtl::adouble
. This indeed works quite similar but has a dynamic dimension. There, local allocations are (if enabled) performed usingboost::pool<>
with a globally set block-size and a non-thread-safe pool.In order to be on par with
adtl::adouble
you can now useADValue<T,k,dynamicDim>
and set the dimension dynamically. In this case we can use either usestd::vector<>
orboost::pool<>
for runtime sized storage. In my tests the preformance of both is essentially on par with theiradtl::adouble
analouges, such thatADValue<double,1,dynamicDim>
could replaceadtl::adouble
.Additionally to this
ADValue
provides:- Templated underlying type instead of fixed
double
. - Second order derivatives (the interface allows for later extension to higher order).
- Dimension can be a compile time parameter. In my tests this is about twice as fast as
adtl::adouble
with dynamic size and pool allocation and much faster than the non-pool version or (even worse) the tape based mode. - Thread safety:
- With compile-time size
ADValue
only works on the stack and is intrinsically safe in the sense that you can use different values in different threads simultaneously. - With dynamic size and slow
std::vector
storage you get the same guarantees. - With dynamic size and
boost:pool
allocation, you are safe as long as you don't shareADValue
contexts across threads, i.e., when creating anADValue
in thread A, you must not call any operation on it (not even a copy) in another thread.
- With compile-time size
- Templated underlying type instead of fixed
I also stripped down
ADValue
from almost all dune-isms.@oliver.sander: Upstreaming this to an AD library may indeed make sense. As an AdolC developer, do you think that proposing
ADValue
for AdolC is reasonable?On the other hand this is just a lightweight single-file header-only utility that one may want to use without an AdolC dependency. E.g. since there is no longer a strong performance impact I'd like to use this in the dune-fufem examples and maybe integrate it with
Dune::Fufem::Forms
, but would like to avoid having a recent AdolC as mandatory dependency.Edited by Carsten GräserOK, I'll give it a try. There's one thing I don't like here: In the dynamic mode this currently stores a static vector of
boost::pool
's in athread_local
singleton which adds some thread-safety. This design basically results from trying to stay close to theAdolC
design.A much cleaner version would be to setup something like an
ADContext<T, maxOrder, dynamicDim>
which creates dynamically sizedADValues
of a single size. ThisADContext
would store the pool such that we can get away without global or thread-local storage and the thread-safety guarantee could be much easier described.But this seems to be different to the
AdolC
philosophy to use global storage such that people don't need to care who actually holds the storage.
While being fully functional, this is marked as draft, because one may want to improve a few things:
- As far
AdolC
this requires glue-code to transfer vectors and matrices to and from theADValue
representation to provide aDifferentiableFunction
interface. Currently this is implemented for a few cases only. - Ideally, one should be able to parameterize the
DerivativeTraits
for theDifferentiableFunction
wrapper. - There's no test.
One can discuss that this could be added to dune-functions, or,
ADValue
may even be suitable for dune-common. But currently this is just the result of one afternoon and probably needs some more improvements.Possibly further generalizations:
-
Allow to generalize this to dynamic domain dimensions (e.g. by adding support for a dynamic-extent value as(implemented).dim
) - Allow to customization the stored gradient and Hessian types, to avoid copies when wrapping as
DifferentiableFunction
.
Edited by Carsten Gräser- As far
Using this to implement a Newton method for the minimal surface equation is as easy as:
auto phi = Dune::Fufem::AD::ADFunction([](auto g) { using std::sqrt; return sqrt(1 + g.frobenius_norm2()); }); auto Dphi = RangeOperator(derivative(phi)); auto DDphi = RangeOperator(derivative(derivative(phi))); ... auto u_old = bindToCoefficients(trialFunction(basis), sol); auto u = trialFunction(basis); auto v = testFunction(basis); auto F = integrate(dot(Dphi(grad(u_old)), grad(v))); auto DF = integrate(dot(DDphi(grad(u_old))*grad(u), grad(v)));
and is about 20 times faster than doing the same with AdolC tape-based mode.
Edited by Carsten Gräseradded 1 commit
- 057ab919 - [AD] Add some automated differentiation utilities
added 1 commit
- 46507b7f - [AD] Split ADValue and add support for boost::pool<>
added 1 commit
- 4e24ae05 - [AD] Split ADValue and add support for boost::pool<>
- dune/fufem/functions/advalue.hh 0 → 100644
17 18 #include <dune/common/rangeutilities.hh> 19 20 21 22 namespace Dune::Fufem::AD { 23 24 25 26 /** 27 * \brief Special value to indicate a dynamically selected dimension 28 * 29 * This value can be passed as dimension template parameter to `ADValue` 30 * to indicate that the dimension should be selected dynamically. 31 */ 32 inline constexpr int dynamicDim = -1; C++20 will have
std::dynamic_extent
for this, apparently hardwired tostd::numeric_limits<std::size_t>::max()
: https://en.cppreference.com/w/cpp/container/span/dynamic_extentShould we maybe use that value to prepare to eventually adopt
std::dynamic_extent
here?changed this line in version 8 of the diff
@oliver.sander Maybe interesting for your nonlinear mechanics assembly-code: I also tried the proposed trace less AD within
adolcfunctiontest.cc
where a local assembler for the Laplacian is generated as the AD-Hessian of the Dirichlet-energy (wrt. a local coefficient vector). To my surprise the local assembler is about 5 times faster (Laplace,YaspGrid<3>
, Q1). In general the speedup is larger for small local dimension.added 85 commits
-
c6516476...27d7517f - 78 commits from branch
master
- 3bc946e5 - [AD] Add some automated differentiation utilities
- 98e2dd78 - [AD] Add support for dynamic dimension
- 91f5f32b - [AD] Split ADValue and add support for boost::pool<>
- c556221a - [AD] Improve handling of dynamic ADValue
- 7529ec94 - [AD] Make ADValue robust wrt nesting
- e46134a0 - [ad] Update to recent version in proposal for adolc
- 78e1b0a6 - [AD] Add some patches to make ADValue work with boost::numeric::interval
Toggle commit list-
c6516476...27d7517f - 78 commits from branch