Skip to content
Snippets Groups Projects

Feature/float128

Merged Simon Praetorius requested to merge simon.praetorius/dune-common:feature/float128 into master
All threads resolved!

Summary

Add support for __float128 floating point type to dune. Since the gnu quad-precision library provides only a C interface and only very limited support in C++ standard library, some cmath and type_traits utilities are added to allow for a usage of quad precision types in FieldVector and FieldMatrix. Since a discussion in !515 (closed) a wrapper Dune::Float128 around the intrinsic type is provided to have better control about overload resolution.

Motivation

Quad precision floating point types are currently not supported natively by all compilers or even in hardware, but some compilers provide a library implementation that might be sufficient already. The GNU multiprecision library adds this feature and is supported in GCC and clang >= 3.9. Also in intel icc compiler a quad precision type can be enabled.

Intent to merge: 2018-05-25 2018-06-25

Edited by Jö Fahlke

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
299 } // namespace Dune
300
301 namespace std
302 {
303 #ifndef NO_STD_NUMERIC_LIMITS_SPECIALIZATION
304 template <>
305 class numeric_limits<Dune::Impl::Float128>
306 {
307 using Float128 = Dune::Impl::Float128;
308
309 public:
310 static const bool is_specialized = true;
311 static Float128 min() { return FLT128_MIN; }
312 static Float128 max() { return FLT128_MAX; }
313 static Float128 lowest() { return -FLT128_MAX; }
314 static const int digits = FLT128_MANT_DIG;
  • I think that the functions are declared noexcept since C++11: http://en.cppreference.com/w/cpp/header/limits

  • Yes, thanks. I have added also constexpr qualifier wherever possible.

  • Is there any info on whether __float128 is, in fact, usable in a constant expression? Problem is: we're only allowed to mark a function constexpr if there is at least one possible combination of (template-)arguments where the function evaluation is a constant expression. And if __float128 is not usable in a constant expression, I don't think there is any function here (or in Float128) that can be marked constexpr.

  • I'd say yes: http://coliru.stacked-crooked.com/a/466039c24058c3b1 But I don't have any harder evidence.

  • I looked in the standard and could not find anything either. The standard mentions extended integer types (which would qualify for use in constant expressions), but it does not mention extended floating-point types.

    Boost has a wrapper which they say is constexpr, no idea how well they thought that through though.

  • Looking into my /usr/lib/gcc/x86_64-linux-gnu/6/include/quadmath.h (gcc 6.3), all the math functions (e.g. acosq()) are extern, which sort-of implies that they can't possibly be constexpr. Thus, anything exclusively using them can't be constexpr, either.

  • Of course, you could work around all this: Make the wrapper a template class, and templatize it with the wrapped type. Then do a using Float128 = Float128Impl::Wrapper<__float128>; Make sure all the freestanding math functions are templates too. Then you can argue: well, I could instantiate the template with some literal type LT in which case the wrapper would really be a literal type. And for the math functions you can argue: well, I could provide an overload constexpr LT acosq(LT);, thus I can mark the template<class T> Wrapper<T> acos(Wrapper<T>); function template as constexpr. Then whether those things really produce constant expressions depends on whether the compiler considers the underlying expressions as constant. So you could get different results on different compilers.

  • In the end I don't think I care anymore. I myself would just throw the constexpr stuff out where in doubt. But If you want to keep it I'm fine with merging it and seeing whether it causes any problems -- I don't think there is any other viable way to find out whether it is problematic.

  • I have marked only functions constexpr that depend on __float128 or Float128 only and do not call any C functions. So, I think, the class Float128 is a LiteralType and can thus be used as argument in constexpr functions. Maybe it can even be simplified to an aggregate type by removing all the constructors.

  • Please register or sign in to reply
  • added 1 commit

    • c8d57a2b - test of several cmath function for Float128 added

    Compare with previous version

  • I just tried it with a Dumux test that uses the full core modules stack, i.e. also ISTL BlockVectors and BCRSMatrices with Dune::Float128 as a field type. It seems to work nicely, great!

  • Another Dumux test fails for expressions like

    using Scalar = Dune::Float128;
    ...
    Scalar a;
    using std::max;
    max(a, 0.0);

    because 0.0 is interpreted as double and the std::max function isn't triggered. What should we do about this? Enforcing the user to write

    max(a, Scalar(0.0));

    every time or overloading

    Dune::Float128 max(Dune::Float128 a, double b);

    The latter seems more attractive to me.

  • Which fails the fmatrixtest. I'll check...

  • That max and min fail for different floating point types is the correct behavior of std::max and std::min, respectively, since it is defined for exactly the same type. Otherwise, it can not be implemented with return-type a references, since a case into a common_type must be performend in ?: operator.

  • Using double explicitly in the overloads instead of a template parameter for the other argument makes the tests pass again. I updated the MR correspondingly. Do you consider this legitimate or would you rather enforce everyone to write something like Scalar(0.0) everywhere?

  • The question is: Why allow double there explicitly? Why not float or int? We could got for ArithmeticType + Float128 combinations. This, at least, is allowed for pow in the standard.

  • min() and max() don't work for mixed types, not even float mixed with double. Don't bother trying to implement it. I'm afraid users will have to use Scalar(0) as an argument.

  • Mixed type min/max is possible to implement, but then it is different from std::min/std::max, and in several places in Dune std::min/std::max is called explicitly. So, does it make sense to provide overloads for mixed type in Dune::Impl namespace to be found only if std:: is not added to function call?

    For pow it is different, since there, the general signature pow(Arithmetic1,Arithmetic2) is allow for std::pow, since could be implemented also for Float128 and by default it casts the arguments to _float128.

  • pow() is another story. That is useful to have with mixed data types.

    The difference is: math functions like pow() are specified to have overloads for all kinds of combinations and automatically promote their arguments, see here, case (7). min() and max() on the other hand take their arguments by reference and do not promote them. The "taking by reference" part has created all kinds of problems. Yes, we do overload them for some of the simd types, because for those the result can be a mix of both arguments, and thus the default implementation supplied by the standard library is insufficient (and would not even compile). But I'll readily admit that providing overloads for them is questionable

  • As a workaround for the min/max problem: We introduce the cmath functions fmin,fmax and these allow to have mixed types. Arguments are passed by value and return type is the promoted type. So, if you call std::min/std::max you are required to have the same types but in case you do not bother, call fmin/max

  • Actually, all binary cmath functions f allow since c++11 to be implemented as f(Arithmetic1, Arithmetic2)

  • The fmin, fmax isn't very attractive to me, since I'd then have to change max...0... to fmax...0... instead of max...Scalar(0).... But maybe we want to have it for the sake of completeness.

  • I updated the merge request in simon.praetorius, throwing out min/max and generalizing pow to hopefully the right thing.

  • Thanks. Looks better. I would go for the variant to allow Arithmetic types in both arguments. Since this holds for many functions, I have added a macro similar to the existing one for comparison operators. There, I already allow mixed type comparison.

  • added 1 commit

    • 944cbe70 - added mixed type arguments for binary cmath functions

    Compare with previous version

  • I also removed the Intel _Quad type, since I can not find any reference, any documentation about which functions are implemented and not even the compiler flag to enable it. Intel with libstdc++ supports GNU quad-precision type __float128 an thus we are fine with just this type.

  • because 0.0 is interpreted as double and the std::max function isn't triggered. What should we do about this? Enforcing the user to write

    max(a, Scalar(0.0));

    This explicit instatiation is necessary in many cases. It is the recommended way to write generic operators. If you look into ISTL this is one the changes we had to introduce to make other field types work.

    I think this should be OK, as the user seldom writes a new operator (or local operator) and if someone later wants to use it with something like Dune::Float128 it is very straight forword to fix such errors.

    I'm not in favor of additional overloads, as it is not clear which ones you will need. ALso many users will write std::max(a,0.0), so that you will have to change the code anyway.

  • There is also a bug in float_cmp.cc with exactly this problem. I will fix this in a separate MR, since it is not restricted to Float128 type.

  • added 1 commit

    • 0bdd80e4 - cleanup of Float128 class and removed int-assignment test in fvectortest

    Compare with previous version

  • Simon Praetorius resolved all discussions

    resolved all discussions

  • Jö Fahlke
  • Jö Fahlke
  • Simon Praetorius resolved all discussions

    resolved all discussions

  • Jö Fahlke
  • added 1 commit

    • 682a0b11 - marked quadmath functions inline and noexcept

    Compare with previous version

  • Looks fine to me now. Remove the WIP marker when you think you are done, so we can give it a week for comments.

  • Simon Praetorius resolved all discussions

    resolved all discussions

  • Simon Praetorius unmarked as a Work In Progress

    unmarked as a Work In Progress

  • Simon Praetorius changed the description

    changed the description

  • @core Any objections? Otherwise I intent to merge 2018-05-25 2018-06-25.

    (Edit: fixed mistyped date, thanks @markus.blatt)

    Edited by Jö Fahlke
  • Jö Fahlke changed the description

    changed the description

  • assigned to @joe

  • @joe may borrow your time machine sometimes? Might come in handy.

  • Jö Fahlke changed the description

    changed the description

  • merged

  • Jö Fahlke mentioned in commit fbe0ac34

    mentioned in commit fbe0ac34

  • mentioned in issue #124 (closed)

  • Fantastic! Thank you, @simon.praetorius and @joe!

  • Please register or sign in to reply
    Loading