At the meeting it was decided to have a mixture of static and virtual wrappers.
Description
Introduce dynamic polymorphism with virtual functions for the FiniteElement classes to allow FE spaces with mixed element types. Currently you have to select the FE type statically, but the GeometryType of the grid elements is only available dynamically.
This request includes the introduction of an interpolate() method for a fixed functor-type virtual base class since template members can't be virtual.
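The constraint mentioned above (member templates cannot be virtual) can be illustrated with a minimal sketch. All names here (FixedFunction, VirtualInterpolationSketch, P1IntervalInterpolationSketch) are hypothetical and not part of dune-localfunctions; the element is a P1 element on the reference interval, chosen only to keep the example short.

```cpp
#include <functional>
#include <vector>

// Hypothetical fixed functor type: maps a local coordinate to a value.
using FixedFunction = std::function<double(double)>;

// Virtual interface: interpolate() cannot be a member template here,
// so the function type is fixed once for the whole hierarchy.
struct VirtualInterpolationSketch {
    virtual ~VirtualInterpolationSketch() = default;
    virtual void interpolate(const FixedFunction& f,
                             std::vector<double>& out) const = 0;
};

// P1 interpolation on the reference interval [0,1]: evaluate at both ends.
struct P1IntervalInterpolationSketch : VirtualInterpolationSketch {
    // Static flavour: accepts any callable, resolved at compile time.
    template <class F>
    void interpolateStatic(const F& f, std::vector<double>& out) const {
        out.resize(2);
        out[0] = f(0.0);
        out[1] = f(1.0);
    }

    // Virtual flavour: forwards the fixed functor type to the template.
    void interpolate(const FixedFunction& f,
                     std::vector<double>& out) const override {
        interpolateStatic(f, out);
    }
};
```

Through the base class only FixedFunction arguments are accepted, while the template member still takes arbitrary callables when the concrete type is known.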
Hi Peter!
I still think that virtual methods are much easier to use than
template polymorphism. I'll try to do some testing to know more
about the impact on efficiency. (You may want to read the article
on devirtualization in the 2006 gcc summit proceedings).
The final decision should be made by the entire team at our
next meeting.
Greetings,
Oliver
Peter Bastian wrote:
Hi Oliver,
there is one problem with the virtual functions. The idea with the
recursive definition of CkLocalFunctionInterface we had at the meeting
did not work out. Currently the method evaluate is a template method and
thus cannot be virtual. Of course for testing it is sufficient to make
evaluateFunction in C0LocalFunctionInterface virtual. If virtual
functions are desired we have to think once more.
Regards,
Peter
On Tuesday, 04.11.2008, at 11:25 +0100, Oliver Sander wrote:
Hi Peter!
Thanks a lot. As agreed upon at our last meeting I will migrate the tests
from dune-disc. I will also test an interface using virtual functions
and try to estimate its impact. This will give us some facts when the final
interface will be discussed.
--
Oliver
Peter Bastian wrote:
Hi developers,
I prepared the new module dune-localfunctions as discussed at the
meeting on 06 October (up to some details that gave problems). I added
the notes to the file dune_treffen_061008.txt.
You can now have a look at it and we can start discussion about it.
I still have to adapt my code to these new interfaces, so not everything
is checked so far.
If dune-localfunctions is supposed to be THE official shapefunction module it should support mixed elements as dune-grid (especially UG) supports them. This could be added in two ways:
1) Add an FE class that wraps a statically fixed set of FiniteElements for different geometries and decides internally, on every function call, which one should be used.
2) Use dynamic polymorphism with a virtual base class.
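To make option 1) concrete, here is a minimal sketch of a wrapper that holds a fixed set of element types and switches between them at run time. GeometryKind is a hypothetical stand-in for the grid's GeometryType, and the element classes model only size(); none of these names come from dune-localfunctions.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for the run-time GeometryType of a grid element.
enum class GeometryKind { Triangle, Quadrilateral };

// Two statically typed elements; only size() is modelled here.
struct P1TriangleSketch { std::size_t size() const { return 3; } };
struct Q1QuadSketch     { std::size_t size() const { return 4; } };

// Option 1): the set of element types is fixed at compile time,
// the choice between them is made at run time on every call.
template <class TriangleFE, class QuadFE>
class MixedElementSwitch {
public:
    // Select the active element from the geometry of the current grid element.
    void bind(GeometryKind kind) { kind_ = kind; }

    std::size_t size() const {
        switch (kind_) {
            case GeometryKind::Triangle:      return triangle_.size();
            case GeometryKind::Quadrilateral: return quad_.size();
        }
        throw std::logic_error("unsupported geometry kind");
    }

private:
    GeometryKind kind_ = GeometryKind::Triangle;
    TriangleFE triangle_;
    QuadFE quad_;
};
```

Every forwarded method needs such a switch, which is the per-call decision the option describes; no virtual function is involved.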
In my opinion 2) is a much cleaner solution. And if the type is known at compile time this should be optimized away anyway.
"should" is always a weak argument. Option (1) is the solution proposed by Peter. Oliver offered to provide some time measurements concerning virtual functions.
I think it necessary to bring some facts into the discussion before we continue.
Supporting mixed elements is a must. The weakness of "should" only applies to the optimization. However, saying that virtual functions are slower is also a weak argument, since it is simply not true in general.
Another solution would be to use virtual functions and to provide a template that wraps them. Then you could use either the virtual base class or the wrapped derived class in the application and benefit from either flexibility or the lack of virtual methods.
I would just like to repeat my argument: I do not see why the use of virtual functions needs to be established in the interface. My idea was this: If your implementation is simple, e.g. one element type, one polynomial order, then there are no virtual functions. If your implementation is complicated, make a base class with virtual functions, say P1MultiElementTypeBase, and derive from that your implementations for triangles and quads.
You propose to do it just the other way around: Hardwire the virtual functions in the interface and rely on some other quirk that removes them if you don't want them. I just don't see what is so bad about my solution (besides personal taste).
Besides all that I do not think that run-time is such a big issue there. It would be nice to see some numbers nevertheless. If they indicate that there is no difference, then let us use virtual functions. Period.
Writing an additional base class P1MultiElementTypeBase with virtual functions would require explicitly cloning or wrapping each FE you want to use. Then one would in fact have two parallel class hierarchies, one without and one with virtual functions.
On the other hand a static wrapper to hard wire the derived type (if you know that only one appears) can be done generically without introducing a parallel class hierarchy.
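One way such a generic static wrapper could look is sketched below. It relies on the C++ rule that a qualified member call (`obj.Derived::f()`) bypasses virtual dispatch. VirtualBasisSketch, P1IntervalBasis and HardwiredBasis are hypothetical names for illustration, not existing DUNE classes.

```cpp
// Virtual hierarchy: one interface, one concrete shape function set.
struct VirtualBasisSketch {
    virtual ~VirtualBasisSketch() = default;
    // Value of shape function i at local coordinate x.
    virtual double evaluate(int i, double x) const = 0;
};

// P1 shape functions on the reference interval [0,1].
struct P1IntervalBasis final : VirtualBasisSketch {
    double evaluate(int i, double x) const override {
        return i == 0 ? 1.0 - x : x;
    }
};

// Generic wrapper that hardwires the derived type. It works for any
// Derived from the hierarchy, so no parallel hierarchy is needed.
template <class Derived>
class HardwiredBasis {
public:
    explicit HardwiredBasis(const Derived& impl) : impl_(impl) {}

    double evaluate(int i, double x) const {
        // Qualified call: resolved at compile time, no vtable lookup,
        // so the compiler is free to inline it.
        return impl_.Derived::evaluate(i, x);
    }

private:
    Derived impl_;
};
```

Whether the call is actually inlined still depends on the compiler and its flags; the sketch only removes the dynamic dispatch.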
I added the switch --enable-virtualshapefunctions to dune-localfunctions to allow easy switching between a dynamic and a static hierarchy. My reason for this switch was not to give people the option forever but to allow easy run-time comparisons. I have not actually made such comparisons yet, though.
I did do some general comparisons on the efficiency loss due to virtual functions for our last meeting, and I remember the results being acceptable. I have to hunt through the archives, though, before I can post them.
I'm not sure whether everyone understands each other's proposals correctly (I
certainly don't). In the following I'm summarizing what I understand.
Please edit this comment if I misrepresent your proposal so everyone knows
what we're talking about.
Static-implementation proposal
First there is the proposal from Peter (a.k.a. "1)") which I understand as
follows:
Have a set of static base classes (StaticLocalFiniteElementInterface,
StaticLocalBasisInterface, StaticLocalCoefficientsInterface, and
StaticLocalInterpolationInterface). These base classes get the following
template parameters, as appropriate:
The C0LocalBasisTraits, C1LocalBasisTraits or CkLocalBasisTraits as
appropriate.
The interpolate() method (and the methods for my experimental global
interface) are template methods.
Have a library of static local finite elements derived from that (as it is
now), for the people who would like the static version.
Have a set of virtual base classes (VirtualLocalFiniteElementInterface,
VirtualLocalBasisInterface, VirtualLocalCoefficientsInterface, and
VirtualLocalInterpolationInterface). These base classes must have the
following template parameters, as appropriate:
The C0LocalBasisTraits, C1LocalBasisTraits or CkLocalBasisTraits as
appropriate,
The function type F which may be interpolated, and
The type C used for the coefficients.
(In addition, for the global interface I'm currently experimenting with,
the type of the geometry would also be required)
This means that the template methods which are currently in place would
move their template parameters to the class in the virtual interface.
Thus you need to specify the kind of function and the coefficient type
earlier, but that is just the price you have to pay with virtual
inheritance anyway. (Of course you can use virtual base classes for the
template parameters, so that is not really an issue).
Have a set of derived wrapper templates
VirtualLocalFiniteElement,
VirtualLocalBase,
VirtualLocalCoefficients, and
VirtualLocalInterpolation. Each of those
classes can be constructed with an instance of the underlying class.
That makes a total of 8 template classes you have to write so people can use
the virtual interface on top of the static interface.
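A reduced sketch of the interpolation part of this proposal may help. The names VirtualLocalInterpolationInterface and VirtualLocalInterpolation are taken from the summary above; their signatures, the omission of the traits parameter, and the static element P1IntervalStatic are assumptions made for illustration only.

```cpp
#include <vector>

// Static implementation: interpolate() is a member template, so F and C
// are deduced from the arguments. (Hypothetical P1 element on [0,1].)
struct P1IntervalStatic {
    template <class F, class C>
    void interpolate(const F& f, std::vector<C>& out) const {
        out.clear();
        out.push_back(f(0.0));
        out.push_back(f(1.0));
    }
};

// Virtual interface: the template parameters of the method have moved
// to the class. (The basis traits parameter is left out for brevity.)
template <class F, class C>
struct VirtualLocalInterpolationInterface {
    virtual ~VirtualLocalInterpolationInterface() = default;
    virtual void interpolate(const F& f, std::vector<C>& out) const = 0;
};

// Wrapper template: constructed with an instance of the underlying
// static class and forwarding to its template method.
template <class Impl, class F, class C>
class VirtualLocalInterpolation
    : public VirtualLocalInterpolationInterface<F, C> {
public:
    explicit VirtualLocalInterpolation(const Impl& impl) : impl_(impl) {}

    void interpolate(const F& f, std::vector<C>& out) const override {
        impl_.interpolate(f, out);
    }

private:
    Impl impl_;
};
```

The wrapper is written once and works for every static implementation, which is why only the interface and wrapper templates have to be added on top of the existing library.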
Virtual-implementation proposal
Next we have Carsten's proposal which I understand as follows:
A set of virtual base class templates VirtualLocalFiniteElementInterface,
VirtualLocalBasisInterface, VirtualLocalCoefficientsInterface, and
VirtualLocalInterpolationInterface. These base classes must have the
following template parameters, as appropriate (same as the template
parameters from Peter's proposal above):
The C0LocalBasisTraits, C1LocalBasisTraits or CkLocalBasisTraits as
appropriate,
The function type F which may be interpolated, and
The type C used for the coefficients.
(In addition, for the global interface I'm currently experimenting with,
the type of the geometry would also be required)
Every implementation of the local finite elements provides a set of
classes derived from those classes (VirtualLocalFiniteElement,
VirtualLocalBasis, VirtualLocalCoefficients, and
VirtualLocalInterpolation).
Have a set of static base classes (StaticLocalFiniteElementInterface,
StaticLocalBasisInterface, StaticLocalCoefficientsInterface, and
StaticLocalInterpolationInterface). These base classes get the following
template parameters, as appropriate:
The C0LocalBasisTraits, C1LocalBasisTraits or CkLocalBasisTraits as
appropriate,
The function type F which may be interpolated, and
The type C used for the coefficients.
(In addition, for the global interface I'm currently experimenting with,
the type of the geometry would also be required)
The interpolate() method (and the methods for my experimental global
interface) are non-template methods, since they cannot be template methods
in the underlying virtual classes anyway.
A set of static wrapper classes
(StaticLocalFiniteElement,
StaticLocalBasis,
StaticLocalCoefficients, and
StaticLocalInterpolation) which are derived
from the static interface classes above. Each wrapper class gets an
instance of its virtual counterpart on construction.
That makes a total of 8 template classes you have to write so people can use
the static interface on top of the virtual interface.
Select-at-configure-time proposal
There also seems to be the possibility to switch between virtual and static
inheritance at compile time (like Oliver implemented it) but that was just for
performance testing and nobody seriously thinks about that.
This reflects my proposal. Until now I missed the fact that the static->dynamic wrapper (point 3.) can also be done generically. Taking this into account, both approaches are almost equivalent, with the following exceptions:
1) A dynamic->static wrapper cannot produce the interpolate() template if it is not there in addition to the virtual method (1:0 for the static approach)
2) (Non-wrapped) dynamic polymorphism leads to much more readable/understandable code and error messages. (1:1)
3) The virtual interpolate() does not cost too much (???)
4) Virtual functions get optimized away (???)
For me even 2) alone is a very good argument and 2)+3) are enough to switch to dynamic polymorphism.
For the static-implementation proposal I see the following advantages:
The virtual-implementation proposal cripples the static interface:
a) As a user, you explicitly have to give the type of function and of the
coefficients you want to interpolate as template parameters of the
classes. With the static-implementation proposal the compiler can
infer those from the arguments of the interpolate() method. (Of
course, this only adds a few characters to the user's code, so this is
not really an important point.)
b) With the virtual-implementation proposal with the static interface, it
is impossible for a user to interpolate a function of one type and a
function of another type later unless they share a common virtual
basis. With the static implementation proposal there is no such
restriction. (With the virtual interface you have that restriction for
both proposals.)
c) Similarly, with the virtual-implementation proposal with the static
interface, it is impossible for a user to interpolate into one type of
coefficients and into another type of coefficients later, and nobody
wants to derive his coefficient types from a common virtual basis. Of
course, this is probably not done in practice anyway.
In the virtual-implementation proposal, the static interface offers no
advantage over the dynamic interface and is thus just dead weight. Only
the static-implementation proposal can offer the benefits of both
interfaces to the user.
Ease of implementation for maintainers and potential implementors:
I feel much more at home with static polymorphism than I do with dynamic
polymorphism.
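The restriction described in points a) to c) above can be sketched as follows. With a fixed function type, two different functions can only be interpolated if they share a common virtual base; with a template method, any callable and any coefficient type works. All class names here are hypothetical.

```cpp
#include <vector>

// Common virtual base that every interpolated function must derive from
// when the function type is fixed in the class.
struct FunctionBase {
    virtual ~FunctionBase() = default;
    virtual double operator()(double x) const = 0;
};

struct LinearFunction : FunctionBase {
    double operator()(double x) const override { return 2.0 * x; }
};

struct QuadraticFunction : FunctionBase {
    double operator()(double x) const override { return x * x; }
};

// Fixed-type interpolation: F is FunctionBase, C is double, both chosen
// when the class is written, not when interpolate() is called.
struct FixedTypeInterpolation {
    virtual ~FixedTypeInterpolation() = default;
    virtual void interpolate(const FunctionBase& f,
                             std::vector<double>& out) const {
        out = {f(0.0), f(1.0)};
    }
};

// Template interpolation: F and C are deduced per call, so unrelated
// function types and different coefficient types can be mixed freely.
struct TemplateInterpolation {
    template <class F, class C>
    void interpolate(const F& f, std::vector<C>& out) const {
        out.clear();
        out.push_back(static_cast<C>(f(0.0)));
        out.push_back(static_cast<C>(f(1.0)));
    }
};
```

A lambda or a float coefficient vector is accepted by TemplateInterpolation but rejected at compile time by FixedTypeInterpolation.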
For the virtual-implementation proposal I see the following advantages:
Ease of implementation for maintainers and potential implementors:
a) Of the current set of people actively involved in Dune, at least Oliver
and Carsten seem to feel more at home with the virtual inheritance
stuff. (Concerning the people currently involved, this point probably
boils down to counting how many people feel more at home with which
option).
b) People are usually taught the virtual inheritance stuff, so potential
implementors of further local finite elements might feel more at home.
The virtual implementation will probably lead to clearer error messages
from the compiler automatically. For the static version you would have to
use dune_static_assert liberally.
Further remarks:
The static-implementation proposal and the virtual-implementation proposal
can actually be merged. In this joint proposal, some local finite
elements can have a static implementation while others have a virtual
implementation. The wrapper classes can then provide the other
interface.
Benchmarks using Oliver's DUNE_VIRTUAL_SHAPEFUNCTIONS define will not
actually tell us very much, since they leave out some levels of
indirection for both proposals. And I have the impression that both sides
assume a lot about which optimizations will actually be done. I'm not
sure that the compiler will always optimize away as much as we assume with
the static implementation for instance. And there is the question of
other compilers than gcc, which nobody has raised yet. Which
optimizations do they perform, and how much do we care?
Implementing mixed finite elements is possible on top of the virtual
interface in both proposals, so this is not really a deciding point.
My current preference is a joint approach, where whoever implements the
local finite element decides whether to do it statically or dynamically.
Failing that, I would prefer the static implementation.
This preference is not very strong at the moment and some performance numbers
could easily change it, if they take the additional indirection of the two
proposals into account.
Last year I did a few experiments about how expensive virtual function calls really are for shape functions. I'll attach the test program in a second. Here's what I get with
g++ 4.3.3 and -O
Static
1.95612 sec.
With virtual functions, class allocated on the stack
1.94412 sec.
With virtual functions, class allocated on the heap
2.72017 sec.
With virtual functions, class allocated on the heap, access through the base class
2.70417 sec.
--- function call here ---
static, passed by const reference
1.93612 sec.
dynamic, passed by value
1.94412 sec.
dynamic, passed by const reference
2.86018 sec.
dynamic, passed by pointer the the base class
2.84818 sec.
Please see the code for the precise meaning of these numbers. The conclusions I draw from them are the following:
On a very cheap method (evaluate for a q12d element), the overhead is roughly 50%
If you create a class with virtual functions on the stack and use it right away there
is no overhead at all
If you pass a class with virtual functions by value, then there is no overhead either.
You may want to play around with the code a little. Try larger methods, or different compiler flags. In particular, it would be interesting to see if you can pass a class
with virtual functions by reference without overhead if the method you pass it to gets inlined.
Thanks for the hint. Actually I had a crucial typo in my timings: the options I used were -O3 and not -O. Here's what I get with Robert's options:
Static
2.15213 sec.
With virtual functions, class allocated on the stack
2.13613 sec.
With virtual functions, class allocated on the heap
2.80017 sec.
With virtual functions, class allocated on the heap, access through the base class
2.80417 sec.
--- function call here ---
static, passed by const reference
2.13213 sec.
dynamic, passed by value
2.14813 sec.
dynamic, passed by const reference
2.80017 sec.
dynamic, passed by pointer the the base class
2.80417 sec.
Strangely enough this only results in making the static methods a bit slower...
You could also add:
--param max-inline-insns-single=3000 --param large-function-growth=3500 --param inline-unit-growth=3000
These are some parameters for function inlining. Also, for comparison of run times I suggest that you create some test problems with a larger run time. A few seconds is not so meaningful.
We have constructed a test case to measure the speed difference. It assembles the stiffness matrix and a rhs for the Laplace on a Yaspgrid with 1000x1000 elements 10 times. I only measured the pure assembly without building the sparsity pattern with gcc 4.3.3 (-O3 -funroll-loops) on a core2duo T7500.
For this example the average runtime (from 6 runs) without virtual functions was 26.683s; with virtual functions you get 28.740s. This is about 7.7% more. Obviously this becomes less if you also count the construction of the sparsity pattern, use higher order, etc.
Surprisingly, the runtime is about 70s for the non-virtual version with Robert's flags added. The same happens if you newly construct the local FE for every element. For the virtual version this does not influence runtime. We can only guess that the reason might be that more inlining increases the code size for the inner loops and thus reduces the cache available for your data.
We have now implemented our version of virtualization in the localfunctions module.
As a test example we have implemented a pq22d finite element, which for us works on a 2d
hybrid UG grid (attached; it's twist-free!).
For each static interface there is a corresponding virtual interface and
a wrapper for available static implementations. So one can either directly
derive an implementation from the virtual interface or virtualize a static
implementation, as done in pq22d.hh.
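The idea of virtualizing static implementations for a mixed element can be sketched roughly as below. This is not the content of pq22d.hh; GeometryKind, VirtualFEInterface, VirtualFEWrapper and PQ2ElementSketch are hypothetical names, and only size() is modelled (6 degrees of freedom for a P2 triangle, 9 for a Q2 quadrilateral).

```cpp
#include <cstddef>

// Hypothetical stand-in for the run-time GeometryType of a grid element.
enum class GeometryKind { Triangle, Quadrilateral };

// Existing static implementations, reduced to size().
struct P2TriangleStatic { std::size_t size() const { return 6; } };
struct Q2QuadStatic     { std::size_t size() const { return 9; } };

// Virtual interface corresponding to the static one.
struct VirtualFEInterface {
    virtual ~VirtualFEInterface() = default;
    virtual std::size_t size() const = 0;
};

// Generic wrapper that virtualizes any static implementation.
template <class Impl>
class VirtualFEWrapper : public VirtualFEInterface {
public:
    std::size_t size() const override { return impl_.size(); }

private:
    Impl impl_;
};

// Mixed element: hands out the matching element through the virtual
// interface, depending on the geometry of the current grid element.
class PQ2ElementSketch {
public:
    const VirtualFEInterface& get(GeometryKind kind) const {
        if (kind == GeometryKind::Triangle) return triangle_;
        return quad_;
    }

private:
    VirtualFEWrapper<P2TriangleStatic> triangle_;
    VirtualFEWrapper<Q2QuadStatic> quad_;
};
```

Assembly code written against VirtualFEInterface then works unchanged on triangles and quadrilaterals of a hybrid grid.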