#564 Use dynamic polymorphism

Here is the discussion from the list:

Hi Peter! I still think that virtual methods are much easier to use than template polymorphism. I'll try to do some testing to know more about the impact on efficiency. (You may want to read the article on devirtualization in the 2006 gcc summit proceedings). The final decision should be made by the entire team at our next meeting.

Greetings, Oliver

Peter Bastian wrote:

Hi Oliver,

there is one problem with the virtual functions. The idea with the recursive definition of CkLocalFunctionInterface we had at the meeting did not work out. Currently the method evaluate is a template method and thus cannot be virtual. Of course for testing it is sufficient to make evaluateFunction in C0LocalFunctionInterface virtual. If virtual functions are desired we have to think once more.
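The restriction mentioned here is a general C++ rule: a member function template cannot be declared virtual, while an ordinary method can. The following is a minimal sketch of that point only. The class names `C0LocalFunctionSketch` and `P1LineSketch` and the signature of `evaluateFunction` are assumptions made for illustration, not the actual dune-localfunctions interface.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for C0LocalFunctionInterface.
struct C0LocalFunctionSketch {
    virtual ~C0LocalFunctionSketch() = default;

    // A member function template cannot be virtual; this would not compile:
    //   template <class Out> virtual void evaluate(double x, Out& out) const;

    // A non-template method may be virtual.
    virtual void evaluateFunction(double x, std::vector<double>& out) const = 0;
};

// P1 shape functions on the reference interval [0, 1] (illustrative only).
struct P1LineSketch : C0LocalFunctionSketch {
    void evaluateFunction(double x, std::vector<double>& out) const override {
        assert(x >= 0.0 && x <= 1.0);  // local coordinate must lie in the reference element
        out.clear();
        out.push_back(1.0 - x);
        out.push_back(x);
    }
};
```

Calling `evaluateFunction` through a `const C0LocalFunctionSketch&` dispatches to `P1LineSketch` at run time, which is what a test of the virtual variant would need.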

Regards,

Peter

On Tuesday, 4 November 2008, at 11:25 +0100, Oliver Sander wrote:

Hi Peter! Thanks a lot. As agreed upon at our last meeting I will migrate the tests from dune-disc. I will also test an interface using virtual functions and try to estimate its impact. This will give us some facts when the final interface will be discussed.

-- Oliver

Peter Bastian wrote:

Hi developers,

I prepared the new module dune-localfunctions as discussed at the meeting on 06 October (up to some details that gave problems). I added the notes to the file dune_treffen_061008.txt.

You can now have a look at it and we can start discussion about it.

I still have to adapt my code to these new interfaces, so not everything is checked so far.

Best,

Peter

If dune-localfunctions is supposed to be THE official shapefunction module it should support mixed elements as dune-grid (especially UG) supports them. This could be added in two ways:

1) Add a FE class that wraps a statically fixed set of FiniteElements for different geometries and decides (internally) which one to use, dynamically on every function call.
2) Use dynamic polymorphism with a virtual base class.

In my opinion 2) is a much cleaner solution. And if the type is known at compile time this should be optimized away anyway.
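To make the two options concrete, here is a rough sketch of both. All names (`GeometrySketch`, `MixedP1Wrapper`, `LocalFiniteElementBase`, `makeElement`) and the reduction of a finite element to a single `size()` method are assumptions for illustration and do not come from dune-localfunctions.

```cpp
#include <cassert>
#include <memory>

enum class GeometrySketch { triangle, quadrilateral };  // hypothetical geometry tag

// Non-virtual element stand-ins; size() is the number of shape functions.
struct P1Triangle { unsigned size() const { return 3; } };
struct Q1Quad     { unsigned size() const { return 4; } };

// Option 1: a wrapper holding a fixed set of elements that switches on every call.
class MixedP1Wrapper {
public:
    explicit MixedP1Wrapper(GeometrySketch g) : geometry_(g) {}
    unsigned size() const {
        switch (geometry_) {
            case GeometrySketch::triangle:      return tri_.size();
            case GeometrySketch::quadrilateral: return quad_.size();
        }
        assert(false && "unknown geometry");
        return 0;
    }
private:
    GeometrySketch geometry_;
    P1Triangle tri_;
    Q1Quad quad_;
};

// Option 2: dynamic polymorphism with a virtual base class.
struct LocalFiniteElementBase {
    virtual ~LocalFiniteElementBase() = default;
    virtual unsigned size() const = 0;
};
struct VirtualP1Triangle : LocalFiniteElementBase {
    unsigned size() const override { return 3; }
};
struct VirtualQ1Quad : LocalFiniteElementBase {
    unsigned size() const override { return 4; }
};

// The element type is chosen once; later calls go through the vtable.
std::unique_ptr<LocalFiniteElementBase> makeElement(GeometrySketch g) {
    if (g == GeometrySketch::triangle)
        return std::unique_ptr<LocalFiniteElementBase>(new VirtualP1Triangle());
    return std::unique_ptr<LocalFiniteElementBase>(new VirtualQ1Quad());
}
```

In this sketch option 1 pays for a branch in every method of the wrapper, while option 2 pays for an indirect call. Which of the two is cheaper in practice is exactly what the thread leaves to measurement.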

Hi Carsten,

"should" is always a weak argument. Option (1) is the solution proposed by Peter. Oliver offered to provide some time measurements concerning virtual functions.

I think it necessary to bring some facts into the discussion before we continue.

Christian

Supporting mixed elements is a must. The weakness of "should" only applies to the optimization. However saying virtual functions are slower is also a weak argument since it is simply not true in general.

Another solution would be to use virtual functions and to provide a template that wraps them. Then you could use either the virtual base class or the wrapped derived class in the application and benefit from either flexibility or the lack of virtual methods.

Hi all,

I would just like to repeat my argument: I do not see why the use of virtual functions needs to be established in the interface. My idea was this: If your implementation is simple, e.g. one element type, one polynomial order, then there are no virtual functions. If your implementation is complicated, make a base class with virtual functions, say P1MultiElementTypeBase, and derive from that your implementations for triangles and quads.
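A sketch of this idea, under the assumption that generic code is written as a template over the finite element type. `P1MultiElementTypeBase` is the name used in the comment; `P1TriangleOnly`, `P1OnTriangle`, `P1OnQuad`, `totalSize` and the single `size()` method are hypothetical.

```cpp
#include <cassert>

// Simple case: one element type, one order, no virtual functions at all.
struct P1TriangleOnly {
    unsigned size() const { return 3; }
};

// Complicated case: virtual functions appear only in this implementation-specific base.
struct P1MultiElementTypeBase {
    virtual ~P1MultiElementTypeBase() = default;
    virtual unsigned size() const = 0;
};
struct P1OnTriangle : P1MultiElementTypeBase {
    unsigned size() const override { return 3; }
};
struct P1OnQuad : P1MultiElementTypeBase {
    unsigned size() const override { return 4; }
};

// Generic code is a template on the element type, so the interface itself
// does not prescribe whether size() is virtual.
template <class FE>
unsigned totalSize(const FE& fe, unsigned numElements) {
    assert(numElements > 0);
    return fe.size() * numElements;
}
```

`totalSize` accepts a `P1TriangleOnly` as well as a `const P1MultiElementTypeBase&`; only the second call goes through a vtable.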

You propose to do it just the other way around: Hardwire the virtual functions in the interface and rely on some other quirk that removes them if you don't want them. I just don't see what is so bad about my solution (besides personal taste).

Besides all that I do not think that run-time is such a big issue there. It would be nice to see some numbers nevertheless. If they indicate that there is no difference, then let us use virtual functions. Period.

Writing an additional base class P1MultiElementTypeBase with virtual functions would require explicitly cloning or wrapping each FE you want to use. Then one would in fact have two parallel class hierarchies, one without and one with virtual functions.

On the other hand a static wrapper to hard wire the derived type (if you know that only one appears) can be done generically without introducing a parallel class hierarchy.
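One way such a generic static wrapper could look is sketched below. It is an assumption about what is meant, not code from the module: `VirtualLocalBasisSketch`, `Q1Basis` and `StaticWrapper` are hypothetical names, and the explicitly qualified call is one standard C++ way to bypass virtual dispatch when the concrete type is known.

```cpp
#include <cassert>

struct VirtualLocalBasisSketch {
    virtual ~VirtualLocalBasisSketch() = default;
    virtual unsigned size() const = 0;
};

struct Q1Basis : VirtualLocalBasisSketch {
    unsigned size() const override { return 4; }
};

// One template serves every derived type, so no parallel hierarchy is needed.
template <class Derived>
class StaticWrapper {
public:
    explicit StaticWrapper(const Derived& impl) : impl_(impl) {}
    unsigned size() const {
        // The qualified name selects Derived::size directly (no virtual dispatch).
        return impl_.Derived::size();
    }
private:
    Derived impl_;  // stored by value: the dynamic type is known to be Derived
};

// Usage sketch: the hard-wired path and the dynamic path agree.
inline void usageSketch() {
    Q1Basis q1;
    StaticWrapper<Q1Basis> fixed(q1);
    const VirtualLocalBasisSketch& dynamic = q1;
    assert(fixed.size() == dynamic.size());
}
```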

I added the switch --enable-virtualshapefunctions to dune-localfunctions to allow easy switching between a dynamic and a static hierarchy. My reason for this switch was not to give people the option forever but to allow easy run-time comparisons. I have not actually made such comparisons yet, though.

I did do some general comparisons on the efficiency loss due to virtual functions for our last meeting, and I remember the results being acceptable. I have to hunt through the archives, though, before I can post them.

I'm not sure whether everyone understands each other's proposals correctly (I certainly don't). In the following I summarize what I understand. Please edit this comment if I misrepresent your proposal so everyone knows what we're talking about.

Static-implementation proposal

First there is the proposal from Peter (a.k.a. "1)") which I understand as follows:

Have a set of static base classes (StaticLocalFiniteElementInterface, StaticLocalBasisInterface, StaticLocalCoefficientsInterface, and StaticLocalInterpolationInterface). These base classes get the following template parameters, as appropriate:
- The C0LocalBasisTraits, C1LocalBasisTraits or CkLocalBasisTraits as appropriate.
The interpolate() method (and the methods for my experimental global interface) are template methods.
Have a library of static local finite elements derived from that (as it is now), for the people who would like the static version.
Have a set of virtual base classes (VirtualLocalFiniteElementInterface, VirtualLocalBasisInterface, VirtualLocalCoefficientsInterface, and VirtualLocalInterpolationInterface). These base classes must have the following template parameters, as appropriate:
- The C0LocalBasisTraits, C1LocalBasisTraits or CkLocalBasisTraits as appropriate,
- The function type F which may be interpolated, and
- The type C used for the coefficients.
- (In addition, for the global interface I'm currently experimenting with, the type of the geometry would also be required)
This means that the template methods which are currently in place would move their template parameters to the class in the virtual interface. Thus you need to specify the kind of function and the coefficient type earlier, but that is just the price you have to pay with virtual inheritance anyway. (Of course you can use virtual base classes for the template parameters, so that is not really an issue).
Have a set of derived wrapper templates VirtualLocalFiniteElement, VirtualLocalBasis, VirtualLocalCoefficients, and VirtualLocalInterpolation. Each of those classes can be constructed with an instance of the underlying class.

That makes a total of 8 template classes you have to write so people can use the virtual interface on top of the static interface.
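The following sketch illustrates this proposal for the interpolation part only. The names VirtualLocalInterpolationInterface and VirtualLocalInterpolation are taken from the summary above; everything else (`P0InterpolationSketch`, `Square`, the signatures, and the omission of the traits parameter) is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <vector>

// Static side: interpolate() is a member template, F and C are deduced at the call.
struct P0InterpolationSketch {
    template <class F, class C>
    void interpolate(const F& f, std::vector<C>& out) const {
        out.clear();
        out.push_back(static_cast<C>(f(0.5)));  // P0 in 1D: value at the cell midpoint
    }
};

// Virtual side: F and C move to the class, since a virtual method cannot be a template.
template <class F, class C>
struct VirtualLocalInterpolationInterface {
    virtual ~VirtualLocalInterpolationInterface() = default;
    virtual void interpolate(const F& f, std::vector<C>& out) const = 0;
};

// Generic wrapper, constructed with an instance of the underlying static class.
template <class Impl, class F, class C>
class VirtualLocalInterpolation : public VirtualLocalInterpolationInterface<F, C> {
public:
    explicit VirtualLocalInterpolation(const Impl& impl) : impl_(impl) {}
    void interpolate(const F& f, std::vector<C>& out) const override {
        impl_.interpolate(f, out);  // forwards to the member template
    }
private:
    Impl impl_;
};

// Hypothetical function type to interpolate.
struct Square {
    double operator()(double x) const { return x * x; }
};

// Usage sketch: the static call and the call through the virtual interface agree.
inline void usageSketch() {
    P0InterpolationSketch p0;
    std::vector<double> a, b;
    p0.interpolate(Square(), a);
    VirtualLocalInterpolation<P0InterpolationSketch, Square, double> wrapped(p0);
    const VirtualLocalInterpolationInterface<Square, double>& iface = wrapped;
    iface.interpolate(Square(), b);
    assert(a == b);
}
```

Note how the static call needs no explicit template arguments, while the wrapper has to name `Square` and `double` up front.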

Virtual-implementation proposal

Next we have Carsten's proposal which I understand as follows:

A set of virtual base class templates VirtualLocalFiniteElementInterface, VirtualLocalBasisInterface, VirtualLocalCoefficientsInterface, and VirtualLocalInterpolationInterface. These base classes must have the following template parameters, as appropriate (same as the template parameters from Peter's proposal above):
- The C0LocalBasisTraits, C1LocalBasisTraits or CkLocalBasisTraits as appropriate,
- The function type F which may be interpolated, and
- The type C used for the coefficients.
- (In addition, for the global interface I'm currently experimenting with, the type of the geometry would also be required)
Every implementation of the local finite elements provides a set of classes derived from those classes (VirtualLocalFiniteElement, VirtualLocalBasis, VirtualLocalCoefficients, and VirtualLocalInterpolation).
Have a set of static base classes (StaticLocalFiniteElementInterface, StaticLocalBasisInterface, StaticLocalCoefficientsInterface, and StaticLocalInterpolationInterface). These base classes get the following template parameters, as appropriate:
- The C0LocalBasisTraits, C1LocalBasisTraits or CkLocalBasisTraits as appropriate,
- The function type F which may be interpolated, and
- The type C used for the coefficients.
- (In addition, for the global interface I'm currently experimenting with, the type of the geometry would also be required)
The interpolate() method (and the methods for my experimental global interface) are non-template methods, since they cannot be template methods in the underlying virtual classes anyway.
A set of static wrapper classes (StaticLocalFiniteElement, StaticLocalBasis, StaticLocalCoefficients, and StaticLocalInterpolation) which are derived from the static interface classes above. Each wrapper class gets an instance of its virtual counterpart on construction.

That makes a total of 8 template classes you have to write so people can use the static interface on top of the virtual interface.
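A corresponding sketch for this proposal, again for interpolation only. StaticLocalInterpolation and VirtualLocalInterpolationInterface are names from the summary above; `VirtualP0Interpolation`, `FunctionSketch`, the signatures, and the use of `std::function` as a common function type are assumptions for illustration.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical common function type, fixed at class level.
using FunctionSketch = std::function<double(double)>;

template <class F, class C>
struct VirtualLocalInterpolationInterface {
    virtual ~VirtualLocalInterpolationInterface() = default;
    virtual void interpolate(const F& f, std::vector<C>& out) const = 0;
};

// The implementation derives directly from the virtual interface.
template <class F, class C>
struct VirtualP0Interpolation : VirtualLocalInterpolationInterface<F, C> {
    void interpolate(const F& f, std::vector<C>& out) const override {
        out.clear();
        out.push_back(f(0.5));  // P0 in 1D: value at the cell midpoint
    }
};

// Static wrapper: receives its virtual counterpart on construction.
template <class F, class C>
class StaticLocalInterpolation {
public:
    explicit StaticLocalInterpolation(const VirtualLocalInterpolationInterface<F, C>& impl)
        : impl_(&impl) {}
    // Non-template method: F and C are already fixed by the class.
    void interpolate(const F& f, std::vector<C>& out) const {
        impl_->interpolate(f, out);
    }
private:
    const VirtualLocalInterpolationInterface<F, C>* impl_;  // not owned
};

// Usage sketch: any callable convertible to FunctionSketch can be interpolated.
inline void usageSketch() {
    VirtualP0Interpolation<FunctionSketch, double> p0;
    StaticLocalInterpolation<FunctionSketch, double> wrapper(p0);
    std::vector<double> c;
    wrapper.interpolate([](double x) { return 2.0 * x; }, c);
    assert(c.size() == 1 && c[0] == 1.0);
}
```

Because `interpolate()` is no longer a template, every function passed in must be convertible to the one type `F` chosen for the class. In this sketch the call still ends in a virtual dispatch, so the wrapper changes the interface but not the cost.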

Select-at-configure-time proposal

There also seems to be the possibility to switch between virtual and static inheritance at compile time (like Oliver implemented it) but that was just for performance testing and nobody seriously thinks about that.

This reflects my proposal. Until now I missed the fact that the static->dynamic wrapper (point 3.) can also be done generically. Taking this into account, both approaches are almost equivalent, with the following exceptions:

1) A dynamic->static wrapper cannot produce the interpolate() template if it is not there in addition to the virtual method (1:0 for the static approach).
2) (Non-wrapped) dynamic polymorphism leads to much more readable/understandable code and error messages. (1:1)
3) The virtual interpolate() does not cost too much (???)
4) Virtual functions get optimized away (???)

For me even 2) alone is a very good argument and 2)+3) are enough to switch to dynamic polymorphism.

For the static-implementation proposal I see the following advantages:

The virtual-implementation proposal cripples the static interface:

a) As a user, you explicitly have to give the type of the function and of the coefficients you want to interpolate as template parameters of the classes. With the static-implementation proposal the compiler can infer those from the arguments of the interpolate() method. (Of course, this only adds a few characters to the user's code, so this is not really an important point.)

b) With the virtual-implementation proposal with the static interface, it is impossible for a user to interpolate a function of one type and a function of another type later unless they share a common virtual base class. With the static-implementation proposal there is no such restriction. (With the virtual interface you have that restriction for both proposals.)

c) Similarly, with the virtual-implementation proposal with the static interface, it is impossible for a user to interpolate into one type of coefficients and into another type of coefficients later, and nobody wants to derive their coefficient types from a common virtual base class. Of course, this is probably not done in practice anyway.

In the virtual-implementation proposal, the static interface offers no advantage over the dynamic interface and is thus just dead weight. Only the static-implementation proposal can offer the benefits of both interfaces to the user.
Ease of implementation for maintainers and potential implementors:

I feel much more at home with static polymorphism than I do with dynamic polymorphism.

For the virtual-implementation proposal I see the following advantages:

Ease of implementation for maintainers and potential implementors:

a) Of the current set of people actively involved in Dune, at least Oliver and Carsten seem to feel more at home with the virtual inheritance stuff. (Concerning the people currently involved, this point probably boils down to counting how many people feel more at home with which option).

b) People are usually taught the virtual inheritance stuff, so potential implementors of further local finite elements might feel more at home.
The virtual implementation will probably lead to clearer error messages from the compiler automatically. For the static version you would have to use dune_static_assert liberally.

Further remarks:

The static-implementation proposal and the virtual-implementation proposal can actually be merged. In this joint proposal, some local finite elements can have a static implementation while others have a virtual implementation. The wrapper classes can then provide the other interface.
Benchmarks using Oliver's DUNE_VIRTUAL_SHAPEFUNCTIONS define will not actually tell us very much, since they leave out some levels of indirection for both proposals. And I have the impression that both sides assume a lot about which optimizations will actually be done. I'm not sure that the compiler will always optimize away as much as we assume with the static implementation, for instance. And there is the question of compilers other than gcc, which nobody has raised yet. Which optimizations do they perform, and how much do we care?
Implementing mixed finite elements is possible on top of the virtual interface in both proposals, so this is not really a deciding point.

My current preference is a joint approach, where whoever implements the local finite element decides whether to do it statically or dynamically. Failing that, I would prefer the static implementation.

This preference is not very strong at the moment and some performance numbers could easily change it, if they take the additional indirection of the two proposals into account.

Last year I did a few experiments about how expensive virtual function calls really are for shape functions. I'll attach the test program in a second. Here's what I get with g++ 4.3.3 and -O

Static 1.95612 sec.
With virtual functions, class allocated on the stack 1.94412 sec.
With virtual functions, class allocated on the heap 2.72017 sec.
With virtual functions, class allocated on the heap, access through the base class 2.70417 sec.
--- function call here ---
static, passed by const reference 1.93612 sec.
dynamic, passed by value 1.94412 sec.
dynamic, passed by const reference 2.86018 sec.
dynamic, passed by pointer to the base class 2.84818 sec.

Please see the code for the precise meaning of these numbers. The conclusions I draw from them are the following:

On a very cheap method (evaluate for a q12d element), the overhead is roughly 50%
If you create a class with virtual functions on the stack and use it right away there is no overhead at all
If you pass a class with virtual functions by value, then there is no overhead either.

You may want to play around with the code a little. Try larger methods, or different compiler flags. In particular, it would be interesting to see if you can pass a class with virtual functions by reference without overhead if the method you pass it to gets inlined.
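For readers without access to the attachment, here is a minimal sketch of how such a comparison could be set up. It is not the attached test program: the classes `Q1Static`, `Q1Base`, `Q1Virtual` and the helpers `sumOfEvaluations` and `timeSeconds` are hypothetical, and the timings it produces depend on compiler, flags and machine.

```cpp
#include <cassert>
#include <chrono>
#include <memory>

// Q1 shape functions on the unit square, without virtual functions.
struct Q1Static {
    void evaluate(double x, double y, double* out) const {
        out[0] = (1 - x) * (1 - y);
        out[1] = x * (1 - y);
        out[2] = (1 - x) * y;
        out[3] = x * y;
    }
};

// The same element behind a virtual interface.
struct Q1Base {
    virtual ~Q1Base() = default;
    virtual void evaluate(double x, double y, double* out) const = 0;
};
struct Q1Virtual : Q1Base {
    void evaluate(double x, double y, double* out) const override {
        Q1Static().evaluate(x, y, out);
    }
};

// Evaluates n times and returns a checksum so the loop cannot be dropped entirely.
template <class Basis>
double sumOfEvaluations(const Basis& basis, int n) {
    double sum = 0.0;
    double out[4];
    for (int i = 0; i < n; ++i) {
        basis.evaluate((i % 4) * 0.25, 0.5, out);
        sum += out[0] + out[1] + out[2] + out[3];
    }
    return sum;
}

// Returns wall-clock seconds and stores the checksum.
template <class Basis>
double timeSeconds(const Basis& basis, int n, double& checksum) {
    const auto start = std::chrono::steady_clock::now();
    checksum = sumOfEvaluations(basis, n);
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Usage sketch: static object versus heap object accessed through the base class.
inline void usageSketch() {
    const int n = 1000000;
    double staticSum = 0.0, virtualSum = 0.0;
    Q1Static s;
    std::unique_ptr<Q1Base> v(new Q1Virtual());
    const double tStatic = timeSeconds(s, n, staticSum);
    const double tVirtual = timeSeconds(*v, n, virtualSum);
    assert(staticSum == virtualSum);  // both variants compute the same values
    assert(tStatic >= 0.0 && tVirtual >= 0.0);
}
```

Comparing `tStatic` and `tVirtual` for different flags, and for variants where the object is passed by value or by reference, would reproduce the kind of table shown above.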

The test programs

Attachments

Hello Oli,

I strongly recommend to rerun your tests with the following compiler options:

-Wall -O3 -DNDEBUG -funroll-loops -finline-functions -ffast-math -fomit-frame-pointer -msse3 -mfpmath=sse

Best regards

R

Thanks for the hint. Actually I had a crucial typo in my timings: the options I used were -O3 and not -O. Here's what I get with Robert's options:

Static 2.15213 sec.
With virtual functions, class allocated on the stack 2.13613 sec.
With virtual functions, class allocated on the heap 2.80017 sec.
With virtual functions, class allocated on the heap, access through the base class 2.80417 sec.
--- function call here ---
static, passed by const reference 2.13213 sec.
dynamic, passed by value 2.14813 sec.
dynamic, passed by const reference 2.80017 sec.
dynamic, passed by pointer to the base class 2.80417 sec.

Strangely enough this only results in making the static methods a bit slower...

You could also add: --param max-inline-insns-single=3000 --param large-function-growth=3500 --param inline-unit-growth=3000

These are some parameters for function inlining. Also, for comparing run times I suggest that you create some test problems with a longer run time. A few seconds is not so meaningful.

Regards

R

I also want to thank Jö for his very detailed description of the situation. I support his view on the problem.

We have constructed a test case to measure the speed difference. It assembles the stiffness matrix and a rhs for the Laplace on a Yaspgrid with 1000x1000 elements 10 times. I only measured the pure assembly without building the sparsity pattern with gcc 4.3.3 (-O3 -funroll-loops) on a core2duo T7500.

For this example the average runtime (from 6 runs) was 26.683s without virtual functions and 28.740s with virtual functions. This is about 7.7% more. Obviously this becomes less if you also count the construction of the sparsity pattern, use higher order, and so on.

Surprisingly, the runtime is about 70s for the non-virtual version with Robert's flags added. The same happens if you newly construct the local FE for every element. For the virtual version this does not influence the runtime. We can only guess that the reason might be that more inlining increases the code size for the inner loops and thus reduces the cache available for your data.

Attachments

virtual_sf_speed_test.cc

We have now implemented our version of virtualization in the localfunction module. As a test example we have implemented a pq22d finite element, which works for us on a 2d hybrid ug grid (attached; it's twist-free!).

For each static interface there is a corresponding virtual interface and a wrapper for available static implementations. So one can either derive an implementation directly from the virtual interface or virtualize a static implementation, as done in pq22d.hh.

Attachments

hybrid2_0.dgf

Status changed to closed

Property	Value
Reported by	Carsten Gräser (graeser@math.fu-berlin.de)
Reported at	Jun 18, 2009 21:39
Type	Feature Request
Version	Git (pre2.4) [autotools]
Operating System	Unspecified / All
Last edited by	Oliver Sander (oliver.sander@tu-dresden.de)
Last edited at	Nov 24, 2009 17:45
Closed by	Oliver Sander (oliver.sander@tu-dresden.de)
Closed at	Nov 24, 2009 17:45
Closed in version	Unknown
Resolution	Fixed
Comment	At the meeting it was decided to have a mixture of static and virtual wrappers
