There haven't been any real problems with strict aliasing for quite some time, so it should be safe to enable by now.
Description
If compiled with gcc-4.4 and '-O3 -Wstrict-aliasing' the grid checks test-ug and test-alu2dsimplex complain about violations of the strict aliasing rule.
If the grids really violate the strict aliasing rule assumed by '-O3' the following might be related:
I have one application that produces wrong results with '-O3' and gcc-4.4.1. This does not happen with '-O3 -fno-strict-aliasing' or older gcc versions. The code seems correct and any try to debug this makes the bug vanish by avoiding inlining.
Unfortunately the code uses subgrid and our discretization framework which makes it very complicated to extract a test case.
I'm not talking about your example. I know that this mimics the reinterpret_cast construct. The patch removes the reinterpret_cast. I believe that it is not needed any more.
test-yaspgrid compiles with the above patch applied...
With the patch the warning from the EntityPointer disappears for both tests. For test-ug a warning from operator/= and operator= from FieldVector remains.
This seems to be the first place where the data is accessed after the reinterpret_cast in uggrid.
One problem seems to be the reinterpret_cast of the uggrid intersection iterators to the intersection wrapper class due to missing real intersection classes.
I updated the patch to include a version of ugintersectionit.hh which works without reinterpret_cast, but might be bit more expensive. The proper solution will (obviously) be to implement separate intersection and iterator.
Could you try the current patch with your application? Does this also fix your strange heisen-bugs?
If you also create the Intersection in the constructor the warnings are gone. And this also seems to fix my Heisenbug. The extended patch is attached.
Perhaps we should compile all tests with '-O3 -Wall' to trigger all such bugs since you might no notice the 'wrong' inlining else. Notice that '-Wall' alone does not trigger this if '-fstrict-aliasing' is not enabled.
-Wall is supposed to include -Wstrict-aliasing. From the gcc docs and the migration page it seems that this only tries to find problems.
You can also supply a value -Wstrict-aliasing=n. Default is n=3 which is said to have 'fery few false positive and false negativ results'. However n=1,2 should be quicker give more false positives.
I found statements that for some real problems n=2 does complain while n=3 does not. So we could perhaps also compile (not automatically) with n=1,2 and manually check if there are problems. However it's still not clear to me which reinterpret_cast is OK. (Why does the one for the ALUintersections work?)
As I wrote in yesterday's eMail, I think the placement-new is not an option because it calls some constructor. We cannot safely assume that a default constructor does nothing. The copy constructor is not an option either, because the entity pointer could be implemented as a shared_ptr. Copying would lead to a wrong increase in the reference count and thus a memory leak.
(a) stick to the reinterpret_cast (i.e., don't support gcc-4.4 with strict aliasing)
(b) remove the casts
(c) change the hierarchy EntityPointerImpl contained in EntityPointer -> IteratorImpl contained in Iterator
I think (a) is not an option. The impact of (c) is a heavy decrease in the performance of metagrids (they run into the same problem the facades have now). Moreover, the changes to the implementations would be quite strong. So, as far as I see it, (b) is our only real option. This has some consequences, though:
EntityPointer &ep = iterator is no longer possible. I don't see any use for this anyway
const EntityPointer &ep = iterator is still possible, but will create a copy (warning: references to temporaries). Ths could be solved by making the copy constructor from different EntityPointer explicit.
Basically this means that EntityPointers can only be passed by reference using the following construct:
template< class ItImp >
void foo ( const EntityPointer< Grid, ItImp > &ep )
I could live with that.
Still, I have hopes that someone can come up with another idea...
As Christian said we have discussed the options. Regarding Martins options:
(a) is obviously not an option and (b) as well. It will break at lot of code if EntityPointers& ep=it is not supported. And allowing the temporary object needed if iterators are passed as function arguments is neither clean nor intuitive. I don't really understand what you mean with (c).
Our main idea is to derive ItImp from EP instead of EPImp. Then we could savely return the reference to the base class of ItImp.
The clean solution would then by to abolish the strange EP and to make ItImp a member of It. Notice that this will not change the memory layout of any class.
I also have a small patch to EntityPointer with a quick solution for a transition. It returns
(1) EP if Imp=EPImp
(2) EP::realIterator if Imp is derived from EP
(3) reinterpret_cast<EP&>(EP) else
For the current grids this is the same as before: (1) or (3). For all adapted grids with ItImp deriving from EP this selects (1) or (2) and thus is a clean solution. This allowes for a smooth transition iterator per iterator. I have tested this with a quick for the UGLevelIterator.
We also suggests to have additional new interface classes which could be selected (by the GridTraits) for all grids that completely implement the new behaviour.
Since this only adjusts the inheritance hierarchy and does not introduce any additional members, copies or creation of temporaries there is no performance problem with this.
Let me clarify my problem with solution (c). For "normal" dune grids (i.e., dune grids that don't wrap other dune grids), the solution looks perfect. Once you have to wrap another dune-grid, you get the following problem:
implement an entity pointer wrapping the host entity pointer
implement the iterator by deriving from the (wrapped) entity pointer.
The entity pointer already stores the host entity pointer. Now, you have to store the host iterator inside your iterator wrapper. Hence, the iterator wrapper stores the host entity pointer twice: Once in the entity pointer base class and a second time in the host iterator inside the iterator wrapper.
You might get a better understanding of what I'm talking about by looking into the GeometryGrid's entity pointer / iterator.
In my opinion, any solution to this problem should also solve the problem of meta-grids (like GeometryGrid). The reason is that we claim DUNE to be a negligible layer around any grid implementation, unifying the API. Hence, if DUNE is not able to wrap a DUNE grid with negligible effort, my opinion would be that we fail in this (important) design goal.
PS: I would still like to hear a good reason for keeping the cast operators. If they are only needed for backward compatiblity, then I suggest to simply deprecate them and stick to the reinterpret_cast. If someone wants to use gcc-4.4, we can (in my opinion) expect him to adapt the code to non-deprecated methods.
The dilemma is that you can not avoid the cast for the meta grid if you don't want to store a HostIt and a seperate HostEPEntityPointer. There simply must be an EP that does not know about HostIt. Solution (b) would means:
Break compatibility with existing code.
If you want to write generic code you always have to create the temporary EP which means that any grid and not only GeometryGrid gets slower.
One solution could be to derive ItImp from EP and to require a method ItImp::getEntityPointer() that returns the EP&.
Then you could store the HostIt and the HostEP in the base class but only update the HostEP in this method. If you implement all methods directly in It using the HostIt this method is only called if you request an EP&. If this happens your approach (b) would create a temporary and thus be not more efficient.
The only drawback would be that the EP& does not change if you call It++ afterwards. However it would not be very restrictive to define the behaviour in this situation as undefined.
Regarding the cast: The only reason to keep it is to also provide the old grid implementers interface for one release.
Of course you can do this - but this means you have to change your code. It is furthermore no argument against my suggestion since you can then also do
template
foo(It& it)
and would never need to get the EP& and thus to update the second HostEP. BTW: No one wants to make meta grids slow. Is there any performance argument against my suggestion to update the stored HostEP& only if needed ?
I'm still stupified: It must be possible to convert an iterator to an entity pointer. Hence, the entity pointer must contain sufficient information to construct the entity (e.g., the host entity pointer). So how to you want to get rid of this second copy?
I don't suggest to get rid of it. I suggest to only update this information if you need it. If you don't do 'EP& ep = it;' you will only touch the second copy once during creation of the iterator.
construct, you are right, that is a possible solution. But still it will come as a big surprise to the user. We export the type of the EntityPointer in the Grid, but it is useless, as you can not convert LeafIterator to EntityPointer, which is conceptually the base of LeafIterator. Thus I'd prefer a solution which allows the cast.
I still disagree: You can cast the Iterator to an EntityPointer, you just cannot cast the references. I have to agree, though, that the semantics will slightly change this way.
Moreover, I implemented the changes that I think necessary (see attached patch). The tests in dune-grid and dune-fem seem to run fine with it. I also tried to run make check in dune-pdelab, and it seemed to work fine - but there was some error in Newton-stuff that did not seem to result from this change. So, I repeat my question once more: Can you tell me any piece of code that really needs these casts?
Please note: The patch overdoes it a little and marks the copy constructor explicit. This way, I wanted to find all the methods taking a const EntityPointer &. This is, of course, not strictly needed.
In the interest of a quick solution, I would now ask you to implement your suggestion (also fixing the grids, as necessary). We can then vote on which patch should be applied to the repository.
@Martin: If a user tried to avoid the construction of a separate EntityPointer which is so expensive (following you comment) and used a reference he will have to change this.
In my opionion a solution that avoids such a fundamental change in the interface would be preferable. So I ask again if there are any performance issues with my suggestion.
BTW:Obviously the tests will run fine if you comment things out that don't run.
@Carsten: If the grid tests are the only place where this is needed, then this is not a good argument for implementing it. The tests should contain common features and test them.
@Carsten: Yes, I think they might be. But this does not prove it, nor can any implementation of mine (because I seemingly don't see the right way of doing it). Hence I repeat my request: Implement it (and adapt at least GeoGrid)! Then we can compare both approaches and decide properly. As far as I understand, you want to do this anyway, so there is no extra work, here.
@All: I think we should try to resolve this issue before the end of the week. The decisions in Berlin consider this a major show-stopper for the next release.
I do not think that we should resolve the issue before the end of the week, because:
a) I do not think it is a major show-stopper: you can always compile with -fno-strict-aliasing
b) From reading the discussion it seems that nobody is really sure about the consequences of any of the proposes changes. I would want more facts before voting on this.
We should release before any other stric-aliasing interface changes and tell people about the problem and at the same time propose compilation with -fno-strict-aliasing.
I added some really nasty casting and assignment stuff to the grid check (see attached patch). The second one even fails for YaspGrid. So, even in case we keep the old semantics, they should be clarified. The status quo seems that the non-const cast is undefined right now.
Nice test. Why did no one realize this gap yet? ;-)
We should really only allow 'const EP& ep=it;'. Even if we decide that It derives from EP the base class will in general not contain enough information to allow further incrementing of It - especially in the case of a meta grid.
That is true. I assume nobody realized the gap, because hardly anybody is doing the "stupid" cast and even fewer people will actually change the EP afterwards.
PS: It was not my intention to insult anybody. Indeed, I assumed that nobody would actually write such code, as it is total misuse of the interface. I just wanted to push the interface to its limit.
I didn't take it as an offense and actually I'm quite glad about your test.
I agree with Carsten, we should disable the non-const cast without warning, because might lead to even worse bugs and it should be fairly easy to switch to "const EP&".
The other question is about what to do with the const-cast. Leave or drop it, and how to fix it.
Dropping it implies that people create a temporary if the try to cast it (because it is perfectly possible). And this is a situation I wouldn't like either.
If you take a look at the patch, you will see that I made the copy constructor explicit, which disables the copying. The only way to pass the reference is the deprecated cast operator. I did this, because I wouldn't like implicit copying either.
If we're not out for performance, we can decide to allow for implicit copying and simply remove the cast operator. With respect to code correctness, this would keep the status quo. Moreover, it would well suit the semantics of a pointer.
Unfortunately, copying entity pointers seems to be quite expensive. That is why I think that implicit casting (through the copy constructor) should be forbidden and marked this constructor explicit. As I marked the const-cast deprecated, the user can decide what to do about the implicit casts. He can either call the copy constructor explicitly, or he can find another way to avoid the cast (see above).
Now, as detailed above, there is another solution to this problem, which needs a lot more work on the grid implementation part. We have to ask ourselves, if the trouble is worth it. Moreover, there is the pending question, if this would introduce performance impacts. While I personally think so, other seem to disagree.
For me, this boils down to the question: Could somebody implement this solution, so that we can judge the impact?
I would appreciate it, if we could solve this issue as soon as possible.
I currently only have a hack for the UGLevelIterator. However the questionable part seems to be GeoGrid where one has to switch from EPImp to a single EPImp as base class and implement the 'update base class if needed' idea.
Unfortunately I don't know the GeoGrid internals and currently do not have the time to implement this.
If one can expect a big performance impact it's not worth trying it. However I can't see the performance issue with that solution. Could you explain which problem you expect Martin ?
If we remove the cast completely having an explicit constructor is a must. Otherwise there is not only the slow creation of a temporary but you could also still do 'EP& ep=it;' which WAS OK but will lead to undefined behaviour (segfault if you're lucky).
Are you sure that 'EP &ep = it' if you just have a non-explicit copy constructor? I wrote a small tes program (attached) for this issue. Did I understand you correctly? If not, can you adaptt the piece of code?
Here is one possible solution for yaspgrid and uggrid. (See attachement)
It implements Carstens proposal for yaspgrid and uggrid. I also cleanup the code a bit to keep the changes to the existing code as small as possible.
I have a different idea that would allow fast code also for wrapped grids, but it would require some major changes to the interface. By changing the grids to use consequently Traits, we could change the class hierarchy of an existing grid:
The normal grids woul have something like this:
EPImp <>- EP <- LevelItImp <>- LevelIt
And in the wrapped case it would change to
EPImp <>- EP <- EPImpWrap <>- EP <- LevelItImp <>- LevelIt <- LevelItImpWrap <>- LevelIt
We see the following two problems:
a) It seems that all EPImps need a fixed constructor so that LevelItImp can create EPImpWrap.
b) Will it be possible (easy) to put LevelItImp into the library, if it has an additional template argument (that may not be specialized).
A) What about the patch? Do you consider this a feasible solution?
B) I don't quite understand your concern. EPImpWrap will initialize it's base-class EP from a given EPImp. This might come from an iterator, or what every function.
C) I don't understand you point (b). Not even the sentence. Please, could you be a bit more verbose?
A) it sort of depends on the solution to our questions - it does seem a lot of work for
the grid implementers - let us see
B) The question was: in the wrapper LevelItImp is derived from EP so it has
to construct it - which seems to indicate that LevelItImp has to be able to construct
an EPImpWrap. In the case of geogrid this requires the EPImp (no problem) and the
coordinate function (big problem)?
C) you suggested to move as much as possible into the dune-grid lib a few days ago. If
LevelItImp has an additional template argument EP where Imp is not specifiable (e.g.
Imp=EPImp or EPImpWrap) how can it be moved to the lib
Although some suggested solutions introduce a semantic change to iterators/enitypointers that would be better placed in 2.0 instead of 2.x I suggest to postpone this since there is no easy solution.
IMHO the best solution would be to
clearly state 'must be compiled with -fno-strict-aliasing'
note that 4.4 will produce wrong code without it but pre 4.4 gcc seemed to work fine
automatically add -fno-strict-aliasing for gcc-4.x x>=4
put this all this information also in a warning if compiled without -fno-strict-aliasing
I think we should remove the non-const cast operator before 2.0! If anybody really needs it, he must have misunderstood the semantics (see naughty-grid-check above).
The fact that even the official documentation (and thus at least three core developers) misunderstood the semantics is a clear indicator that this is neither clear nor it should be a last minute decision.
"Even more, you can cast an Iterator refence to a reference pointing to Dune::EntityPointer."
I was not talking about the const-cast, but about the non-const one. This cast allows for very stupid mistakes. It allows, for example, the following stupid code:
As far as I understood this stuff, all developers discouraged this possibility. That's why I brougt the matter up. Personally, I'm fine with any decision (I always have the above patch applied anyway)...
I applied most of the above entity pointer patch to the repository (some time ago). Just the cast operator is not deprecated. So, unless you use that case operator, strict aliasing should be obeyed by now. Just one strange side note:
Iterator it = grid.leafbegin();
EntityPointer ep1 = it; // cast operator used
EntityPointer ep2( it ); // cast operator not used
Iterator it = grid.leafbegin();
EntityPointer ep1 = it; // cast operator used
This is a use of the copy constructor EntityPointer(const EntityPointer&).
Since Iterator is not an EntityPointer, it has to be cast to one first.
EntityPointer ep2( it ); // cast operator not used
This is a call to the EntityPointer constructor with one of the signatures
EntityPointer(Iterator&)
EntityPointer(Iterator)
EntityPointer(const Iterator&)
EntityPointer(const Iterator)
Construction via =... and via (...) are two different things.
Unfortunately we did not dicuss this issue and the suggestion on droping the EntityPointer in favor of a storable Entiy. In view of simplicity I support Martins proposal to remove the cast.
IMHO it would then be a good idea to also remove the EP base class from It. Perhaps this could be done during the transition to the new EntityInterator. However you would also have to change
template void f(const EntityPointer& ep){}
to
template void f(const It& ep){}
On the other hand the first version uses interface types that are not provided by the interface.
Since Casten also suggested deprecating the cast operator, I just did so. If anybody objects, we should very soon discuss and vote on changes for the 2.1 release.
@Carsten: I do not see any need to remove EntityPointer as a base class for EntityIterator. The reasons are as follows:
(a) An iterator still fully implements the interface of an entity pointer, i.e., we would have to duplicate it
(b) Iterators don't cast into EntityPointer anymore
(c) We could not deprecate the cast but would have to remove it
(d) There is the pending discussion of removing the entity pointer alltogether, so it would only be an intermediate change
(e) It might mean changing a lot of code
You're right Martin - we should not remove this until we finally decided if we remove the EP.
If we don't change it's IMHO worth the interface duplication since we get rid of this ugly construction: It derives from EP contains ItImp. Furthermore deriving the It interface from the EP interface gives the impression that It's are EP's which is not true.
From my point of view, Iterators are a model of an entity pointer, i.e., they satisfy the same interface. The key difference from the previous presentation is, that an iterator and the normal entitypointer are two complete different objects (like std::vector and std::map, which are both containers).
The implication is: Whereever you need an entity pointer, you can also use an iterator. But you may not expect an object of type EntityPointer (unlike in dynamic polymorphism).
The new source of confusion might be naming: There is a concept EntityPointer and each grid also provides a type EntityPointer. These are two entirely different things. Each iterator remains a model of the EntityPointer concept, but it no longer associated with the type EntityPointer (as exported by the grid). In view of the upcoming release, we should maybe update the documentation to state this ambiguity explicitly.
For me, the only strict aliasing warning I experienced was caused by the reinterpret_cast in dune/grid/common/entitypopinter.hh. This is now deprecated and no longer used in any of my code.
There still are some potential problem in the generic base function sets in dune-localfunctions. We'll have to take another look, there.
There's still several places where a reinterpret_cast is used. Since you can only use this in rare cases without violating the rule, we should go through all of the uses and check if they are standard conforming. If they are not we should replace them if they are we should add a short documentation why it's OK here for later reference.
For example there seem to be several places where some raw C array is casted to a FieldVector which is in my impression a violation (even if there's no warning with current compilers).
I don't think we can fix this in 2.2, at least not in a complete fashion. I seems to work, but this might be a compiler thing... I would mark this as a 3.0 milestone
This will definitely not be solved for 2.2. Since we still violate the standard we should clearly state this in the release notes and suggest the '-fno-strict-aliasing' flag. Having no compiler warning and no obvious errors does not mean that there's no problem.
Perhaps we should discuss on the next meeting if we really want be standard conforming here.
Hmm, I lost track of what this bug is about precisely. There are no warnings, and nobody reports crashes, still Carsten says we have to keep this issue open. What do we need to do to be able to close it?
We have to go through the standard and the code and look for all cases violating the strict aliasing rules. Not all of them are reported by the compiler. E.g. as fas as I understand the standard it is not allowed to cast double* to FieldVector even if the binary layout is compatible. But we have several places we use such constructs.
@Oliver: That's why I suggest to decide on the meeting.
Finding those could be done by grepping for reinterpret_cast. Almost all uses seem to violate the rule.
But if we decide to accept the violation we should reflect this in the compiler flags - even if currently no one observes a problem.
I finally removed the deprecated cast to the pseudo base class. I've discussed with Oliver that we'd like to backport this to the already created 2.2 release branch.
With the deprecated cast removed the strict aliasing warnings are gone. However we still violate the rule. Hence I propose to keep this task open until we fixed this or decided not to do so.
We just decided (on the dev meeting 2012) that we want to respect the strict aliasing rule. I'll prepare a list of all the places with possible problems in the core modules and place it here.
What is the status here? I have been running the whole PDELab stack with current versions of GCC and Clang since last year and haven't had any aliasing-related warnings or problems.
Can we close the issue? And maybe add to the 2.3 release notes that -fstrict-aliasing should work now?
when I do a complete build (from scratch, fresh sources) of common, geometry, grid, localfunctions, istl, typetree, pdelab, and pdelab-howto using GCC 4.4 with -fstrict-aliasing (in release mode)
Looking at the places that got flagged there, they look very much like false positive (sometimes there's no cast at all, and sometimes it's just a static_cast to a derived class, which is allowed).
Can someone else trigger this problem with GCC 4.4? My GCC 4.4.7 (Macports) stays shtum. It should be sufficient to compile basearraytest in dune-istl/dune/istl/test, that instantiates the code triggering the warning at line 431. To make sure you get the warning, compile with "-O3 -fstrict-aliasing -Wstrict-aliasing=3".
That said, the optimizer in GCC 4.4 has a rather spotty track record, so it might be better to settle on a compromise and tell people to only enable strict aliasing on GCC 4.5 and higher, anyway (or 4.6 - that release was really solid IMHO).
Can we just close this as 'worksforme'? Because nobodoy seems to actually have any problems, besides the warnings. If people come and report random crashes we can still reopen the case.
I was under the wrong impression that we told people not to use strict aliasing in the release notes, but that's not true…
So I agree, let's just close the issue (this should have more than enough testing, as recent versions of GCC enable -fstrict-aliasing by default at -O2 and higher).