Tuesday 22 May 2012

Values for standard hardware floating-point arithmetic


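| IEEE 754-2008 | Common name           | C++ type      | Base | Precision | Machine epsilon | C++ or Python formula | Value    |
|---------------|-----------------------|---------------|------|-----------|-----------------|-----------------------|----------|
| binary16      | half precision        | not available | 2    | 11        | 2^-11           | pow(2, -11)           | 4.88e-04 |
| binary32      | single precision      | float         | 2    | 24        | 2^-24           | pow(2, -24)           | 5.96e-08 |
| binary64      | double precision      | double        | 2    | 53        | 2^-53           | pow(2, -53)           | 1.11e-16 |
| binary128     | quad(ruple) precision | long double   | 2    | 113       | 2^-113          | pow(2, -113)          | 9.63e-35 |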
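The values in the table can be reproduced with a few lines of Python. This is only a minimal sketch: the names `PRECISION_BITS` and `unit_roundoff` are made up for illustration, and it assumes the table's convention that machine epsilon is base^-precision. That convention gives half the value Python reports as `sys.float_info.epsilon` for binary64, which is 2^-52, the gap between 1.0 and the next larger double.

```python
import sys

# Precision in bits (significand length, counting the implicit leading bit)
# for each IEEE 754-2008 binary format listed in the table.
PRECISION_BITS = {
    "binary16": 11,
    "binary32": 24,
    "binary64": 53,
    "binary128": 113,
}


def unit_roundoff(precision, base=2):
    """Return base**-precision, the table's 'machine epsilon' convention."""
    return float(base) ** -precision


for name, precision in PRECISION_BITS.items():
    print(f"{name:10s} 2^-{precision:<4d} {unit_roundoff(precision):.2e}")

# Python floats are binary64 on common platforms, so the interpreter's own
# epsilon (gap between 1.0 and the next float) is twice the table value.
print(sys.float_info.epsilon == 2 * unit_roundoff(53))
```

All four values fit comfortably in a binary64 float, so an ordinary Python float is enough to print them, even for binary128.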
