Demystifying Floating Points

Floating points are a way of representing numbers based on the idea that the larger the magnitude of a number, the less we care about knowing its precise value. If you’re anything like me (until recently), you use floating points regularly in your code with a rough understanding of what to expect, but don’t understand the specifics of what floats can and can’t represent. For example, what’s the biggest floating point, or the smallest positive floating point? How many times can you add 1.0 to a floating point before something bad happens, and what is that something bad? How many floating point values are there between 0 and 1? What about between 1 and 2?

The answer to these questions is, of course, “it depends”, so let’s talk about a specific floating point standard: the “binary32” type defined in IEEE 754-2008. This is the common single-precision floating point type, which backs the f32 type in Rust and usually backs the float type in C (though technically the C standard leaves this unspecified). From now on in this post, I will refer to this type simply as “float”.
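If you want to confirm that Rust’s f32 really is this binary32 format, a quick sketch using standard-library constants (`f32::RADIX`, `f32::MANTISSA_DIGITS`) and the `f32::to_bits` method is enough. The constant 0x3F800000 used below is the binary32 bit pattern for 1.0.

```rust
fn main() {
    // binary32 occupies exactly 32 bits (4 bytes).
    assert_eq!(std::mem::size_of::<f32>(), 4);

    // Base-2 format with 24 significant bits:
    // 23 stored fraction bits plus an implicit leading 1.
    assert_eq!(f32::RADIX, 2);
    assert_eq!(f32::MANTISSA_DIGITS, 24);

    // The raw bits of 1.0 match the binary32 encoding of 1.0.
    assert_eq!(1.0f32.to_bits(), 0x3F80_0000);

    println!("f32 matches the binary32 layout");
}
```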

Here’s what a float can represent:

The second point is the most important for understanding floating points. Outside the tiny range of values closest to zero, each successive pair of powers of 2 has exactly 2^23 - 1 floats evenly spread out between them. There are 2^23 - 1 floats between 0.125 and 0.25, between 1 and 2, between 1024 and 2048, and between 8,388,608 (2^23) and 16,777,216 (2^24). As the numeric range between consecutive powers of 2 grows, the number of floats between them stays fixed at 2^23 - 1; the floats just get more spread out. This is why values of smaller magnitude can be represented more precisely.
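One way to check these counts, as a sketch that assumes non-negative inputs: for non-negative floats, the raw bit patterns returned by Rust’s `f32::to_bits` increase in the same order as the values, so the number of floats strictly between two of them is the difference of their bit patterns minus one. The helpers `floats_strictly_between` and `gap_above` are illustrative names, not part of any library.

```rust
/// Counts the f32 values strictly between `lo` and `hi`.
/// Assumes `lo` is non-negative (and not -0.0), `lo < hi`, and `hi` is finite.
/// For such inputs the raw bit patterns sort in the same order as the values,
/// so the count is the difference of the bit patterns minus one.
fn floats_strictly_between(lo: f32, hi: f32) -> u32 {
    assert!(lo.is_sign_positive() && lo < hi && hi.is_finite());
    hi.to_bits() - lo.to_bits() - 1
}

/// Distance from `x` to the next larger f32.
/// Assumes `x` is positive, finite, and less than f32::MAX.
fn gap_above(x: f32) -> f32 {
    f32::from_bits(x.to_bits() + 1) - x
}

fn main() {
    let pairs = [
        (0.125f32, 0.25f32),
        (1.0, 2.0),
        (1024.0, 2048.0),
        (8_388_608.0, 16_777_216.0), // 2^23 and 2^24
    ];
    for (lo, hi) in pairs {
        let n = floats_strictly_between(lo, hi);
        println!("floats strictly between {} and {}: {}", lo, hi, n);
        assert_eq!(n, (1 << 23) - 1);
    }

    // Same count per interval, so the gap doubles at each power of 2.
    assert_eq!(gap_above(1.0), f32::EPSILON); // 2^-23
    assert_eq!(gap_above(8_388_608.0), 1.0); // gap just above 2^23
    assert_eq!(gap_above(16_777_216.0), 2.0); // gap just above 2^24

    // Once the gap reaches 2, adding 1.0 can be rounded away entirely.
    let big = 16_777_216.0f32;
    assert_eq!(big + 1.0, big);
}
```

The last assertion touches one of the opening questions: just above 2^24 the gap between neighboring floats is 2, so the exact sum 2^24 + 1 lies halfway between two floats, and the default round-to-nearest-even rule returns 2^24 unchanged.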

Some implications of this:

Here’s how floats are encoded:

diagram: [ Sign (1 bit) | Exponent (8 bits) | Fraction (23 bits) ]
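To make the layout concrete, here is a minimal Rust sketch that pulls the three fields out of a float with shifts and masks, using the standard `f32::to_bits` and `f32::from_bits` conversions. The helpers `fields` and `from_fields` are hypothetical names chosen for this example.

```rust
/// Splits an f32 into its three bit fields: (sign, stored exponent, fraction).
fn fields(x: f32) -> (u32, u32, u32) {
    let bits = x.to_bits();
    let sign = bits >> 31; // top bit
    let exponent = (bits >> 23) & 0xFF; // next 8 bits
    let fraction = bits & 0x7F_FFFF; // low 23 bits
    (sign, exponent, fraction)
}

/// Packs the three fields back into an f32 (the inverse of `fields`).
fn from_fields(sign: u32, exponent: u32, fraction: u32) -> f32 {
    debug_assert!(sign <= 1 && exponent <= 0xFF && fraction <= 0x7F_FFFF);
    f32::from_bits((sign << 31) | (exponent << 23) | fraction)
}

fn main() {
    for x in [1.0f32, -2.5, 0.15625] {
        let (s, e, f) = fields(x);
        println!("{:>8}: sign={} exponent={:08b} fraction={:023b}", x, s, e, f);
        // Repacking the fields must reproduce the exact same bits.
        assert_eq!(from_fields(s, e, f).to_bits(), x.to_bits());
    }
}
```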

So putting all this together, assuming the stored value of the exponent field is between 1 and 254 inclusive, the formula giving the value of a float is:

(-1)^sign × 2^(exponent - 127) × (1 + (fraction / 2^23))
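The formula can be checked directly against real floats. The sketch below uses a hypothetical helper, `value_from_formula`, that evaluates the formula in f64 (which can represent every float value exactly) and compares the result with the float’s own value. It returns None when the stored exponent is 0 or 255, the cases the formula excludes.

```rust
/// Evaluates (-1)^sign × 2^(exponent - 127) × (1 + fraction / 2^23).
/// Only valid when the stored exponent is in 1..=254; returns None for
/// 0 and 255, which the formula does not cover.
fn value_from_formula(x: f32) -> Option<f64> {
    let bits = x.to_bits();
    let sign = bits >> 31;
    let exponent = ((bits >> 23) & 0xFF) as i32;
    let fraction = bits & 0x7F_FFFF;
    if !(1..=254).contains(&exponent) {
        return None;
    }
    let sign_factor: f64 = if sign == 1 { -1.0 } else { 1.0 };
    // Powers of two in this range are exact in f64, as is the significand.
    let significand = 1.0 + fraction as f64 / (1u32 << 23) as f64;
    Some(sign_factor * 2f64.powi(exponent - 127) * significand)
}

fn main() {
    for x in [1.0f32, -2.5, 0.1, f32::MAX, f32::MIN_POSITIVE] {
        let v = value_from_formula(x).expect("stored exponent in 1..=254");
        // Widening f32 -> f64 is exact, so the two must agree exactly.
        assert_eq!(v, x as f64);
    }
    // Encodings with stored exponent 0 or 255 fall outside the formula.
    assert_eq!(value_from_formula(0.0), None);
    assert_eq!(value_from_formula(f32::INFINITY), None);
    assert_eq!(value_from_formula(f32::NAN), None);
}
```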

Breaking down each part: