Floats and Doubles for Dummies: Understanding Floating Points

WIP: Work in progress. Feel free to give comments

We use decimal numbers in everyday life: splitting the bill, calculating an Excel formula, using a calculator, pricing a financial product, solving a physics problem, and so on. Yet not everyone understands how computers work with decimal numbers and how floating point numbers are stored. This series will equip you with the understanding and tools you need to work effectively with floating point numbers.

Part 1: One Does Not Simply Compare Floats & Doubles


What you see is not what you get! Open your Python REPL and let’s start with two comparisons:

assert (0.2 != 0.199999), "Bad!"
assert (0.3 + 0.6 == 0.9), "Bad!"

The first comparison is clearly True and passes. The second one is counterintuitive: it fails, and it would trip up even an A+ high school student. Hint: 0.3, 0.6 and 0.9 are not actually stored in decimal format. Let’s explore those numbers at greater precision:

>>> '%.20f' % 0.3
'0.29999999999999998890'

So all the numbers in this example are rounded. The sum 0.3 + 0.6 comes out slightly smaller than 0.9, so the equality check cannot succeed. The absolute difference is greater than 1e-16:

>>> '%.20f' % (0.3 + 0.6)
'0.89999999999999991118'
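To see the stored values without any formatting limit, a minimal sketch using the standard decimal module is shown here. Decimal(x) converts the binary value held in a float exactly, so nothing is hidden by rounding to 20 places. The variable names are only for illustration.

```python
from decimal import Decimal

# Decimal(float) converts the stored binary value exactly, with no extra rounding.
values = {"0.3": 0.3, "0.6": 0.6, "0.9": 0.9, "0.3 + 0.6": 0.3 + 0.6}
for label, x in values.items():
    print(label, "->", Decimal(x))

# The gap between the literal 0.9 and the computed sum.
diff = 0.9 - (0.3 + 0.6)
print(diff)
print(diff > 1e-16)  # True
```

The stored sum sits just below the stored 0.9, which is why == reports them as different.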

So the sum is not exactly equal, unlike in Maths or at your favourite grocery store. Tell that to the cashier! I hope you take away one important lesson from this section:

One does not simply compare floats and doubles naively with ==.

Floats (32-bit) and doubles (64-bit) have different precision. When using Python, you are actually using doubles (64-bit). Keep this crucial point in mind if you want to reproduce the same behaviour exactly in C/C++.

Naive near equality comparison

Even an experienced software engineer can go nuts with this problem:

def nearEquals(a, b, epsilon):
    return abs(a - b) <= epsilon

I hope you are familiar with the scientific notation 1e-6 which translates to 0.000001. Before jumping to any conclusions, let’s inspect the actual value behind 0.2 and 0.199999:

>>> '%.20f' % 0.2
'0.20000000000000001110'
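For a closer look than 20 decimal places, the standard library can print exactly what is held in memory. The sketch below uses decimal, fractions and float.hex; the variable names are only for illustration.

```python
from decimal import Decimal
from fractions import Fraction

x = 0.2
print(Decimal(x))    # exact decimal expansion of the stored double
print(x.hex())       # hexadecimal significand and power-of-two exponent
print(Fraction(x))   # stored value as an integer over a power of two

# The stored value is close to, but not equal to, the fraction 1/5.
stored_above = Fraction(x) > Fraction(1, 5)
print(stored_above)  # True: the stored 0.2 is slightly above 1/5
```

The denominator printed by Fraction(x) is a power of two, which is the only kind of denominator a binary floating point number can have.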

The truth is that the computer does not store 0.2 and friends in decimal format. It stores them in binary format instead. Binary numbers cannot represent 0.2 and friends exactly, hence the rounding errors. We are only halfway to understanding the test failures. Now let’s compare the difference and the epsilon (the absolute tolerance, to be precise):

>>> 0.2 - 0.199999

Clearly the difference is greater than the epsilon, hence the failure. The same applies to the second comparison:

>>> 0.2 - 0.1999999
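The two failures can be reproduced with the short sketch below. The tolerances 1e-6 and 1e-7 are assumptions inferred from the discussion above, chosen to match the nominal gap of each pair; the cases and results names are only for illustration.

```python
def nearEquals(a, b, epsilon):
    # Absolute-tolerance check, as defined earlier.
    return abs(a - b) <= epsilon

# Assumed tolerances: each one equals the gap you would expect on paper.
cases = [(0.2, 0.199999, 1e-6), (0.2, 0.1999999, 1e-7)]

results = []
for a, b, eps in cases:
    diff = a - b
    ok = nearEquals(a, b, eps)
    results.append(ok)
    print(f"{a} - {b} = {diff!r}, epsilon = {eps!r}, nearEquals = {ok}")
```

Both checks come out False, because each computed difference lands a tiny bit above its tolerance.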

There is more than one way to fix the issue. We will start with the simplest one, which requires no deep understanding of floating points, and slowly move on to more hard-core solutions.

The quick fix: Everything is relative

From now on, let’s use the word tolerance for the allowed difference in a comparison, since epsilon already has a specific meaning for floating points (machine epsilon). The naive comparison function we created has only an absolute tolerance and works well on paper. To make it work on a computer, we should add a relative tolerance as well.

If you haven’t been working with Maths for some time, think of relative tolerance as a percentage (which is also a relative difference). Even when the difference between two numbers falls outside the absolute tolerance, it can still be well within the relative tolerance:

def nearEquals(a, b, absTol, relTol):
    if (a == b) or (abs(a - b) <= absTol):
        return True
    relDiff = 2 * abs(a - b) / (abs(a) + abs(b))
    return relDiff <= relTol
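As a quick check, the sketch below runs the function on the pairs used earlier. The relative tolerances 1e-5 and 1e-9 are hypothetical values picked for illustration. For comparison it also calls math.isclose from the standard library (Python 3.5+), which scales its relative tolerance by the larger of the two magnitudes rather than by their mean.

```python
import math

def nearEquals(a, b, absTol, relTol):
    if (a == b) or (abs(a - b) <= absTol):
        return True
    relDiff = 2 * abs(a - b) / (abs(a) + abs(b))
    return relDiff <= relTol

# Hypothetical tolerances, chosen only for illustration.
r1 = nearEquals(0.2, 0.199999, 1e-6, 1e-5)
r2 = nearEquals(0.2, 0.1999999, 1e-7, 1e-5)
r3 = nearEquals(0.3 + 0.6, 0.9, 0.0, 1e-9)
print(r1, r2, r3)  # True True True

# Standard-library counterpart with its own relative-tolerance convention.
r4 = math.isclose(0.3 + 0.6, 0.9, rel_tol=1e-9, abs_tol=0.0)
print(r4)  # True
```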

With the introduction of relative tolerance, everything is working smoothly now. Note that there is no single way to calculate relative tolerance. Different people support different conventions, all of which are sound:

Pick your favourite relative tolerance

Note that the last relative tolerance is the closest to our idea of a difference in percentage. Now our comparison works as expected, even without advanced Maths or any understanding of floating points. We also added a nice shortcut: a == b. Before moving on to the next part, think about this: how do we handle division by zero?
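As a sketch of what different conventions can look like, here are three common choices of scale for the relative difference: the larger magnitude, the smaller magnitude, or the mean of the two. The function name rel_diff and its mode parameter are made up for illustration. On division by zero: in nearEquals above, the denominator abs(a) + abs(b) is zero only when both values are zero, and in that case the a == b shortcut returns before the division is reached.

```python
def rel_diff(a, b, mode="mean"):
    """Relative difference of a and b under one of three common conventions."""
    diff = abs(a - b)
    if diff == 0.0:
        return 0.0  # identical values, which also covers a == b == 0
    scale = {
        "max": max(abs(a), abs(b)),
        "min": min(abs(a), abs(b)),
        "mean": (abs(a) + abs(b)) / 2,
    }[mode]
    if scale == 0.0:
        # No meaningful scale, for example "min" when one operand is 0.
        return float("inf")
    return diff / scale

for mode in ("max", "min", "mean"):
    print(mode, rel_diff(100.0, 110.0, mode))
```

With mode="mean" this matches the relDiff expression used in nearEquals.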

Again, remember that Python uses 64-bit floating points (doubles). If you want to convert your code to C/C++, be careful with floats and doubles. In C++ you also get doubles by default, and you need to ask for float explicitly. The choice affects precision and comparison results:
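To get a feel for the difference without leaving Python, the sketch below rounds a value to 32 bits by packing it with the struct module's "f" format and unpacking it again. The helper name to_float32 is hypothetical; it only mimics the value a C float would hold, not C's arithmetic rules.

```python
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

d = 0.2              # 64-bit double
f = to_float32(0.2)  # the value a 32-bit float would hold

print('%.20f' % d)
print('%.20f' % f)
print(d == f)        # False: the two formats round 0.2 differently
print(abs(d - f))    # the gap is far larger than double rounding error
```

So a comparison that passes with doubles can fail once one side has gone through a 32-bit float, and a tolerance tuned for doubles may be far too tight for floats.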