Floating Point Data

Last updated on 2026-02-17

Estimated time: 15 minutes

Overview

Questions

  • What are the best practices when working with floating point data?
  • How do you compare objects in libraries like numpy?

Objectives

  • Learn how to test floating point data with tolerances.
  • Learn how to compare objects in libraries like numpy.

Floating Point Data


Real numbers are encountered very frequently in research, but it’s quite likely that they won’t be ‘nice’ numbers like 2.0 or 0.0. Instead, the outcome of our code might be something like 2.34958124890e-31, and we may only be confident in that answer to a certain precision.

Computers typically represent real numbers using a ‘floating point’ representation, which stores each number with a fixed number of significant digits (roughly 15 to 17 decimal digits for the common 64-bit format). Rounding errors in floating point arithmetic can therefore cause noise in the last few digits of a result. The size of this noise can be affected by:

  • Choice of algorithm.
  • Precise order of operations.
  • Inherent randomness in the calculation.

We could therefore test our code using assert result == 2.34958124890e-31, but it’s possible that this test could erroneously fail in future for reasons outside our control. This lesson will teach best practices for handling this type of data.
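The effect is easy to reproduce with a few small numbers. The following is a minimal sketch (the variable names are purely illustrative) showing that two mathematically equal expressions can differ in their last digit, and that changing only the order of the additions changes the result:

```python
import math

# 0.1, 0.2 and 0.3 cannot be stored exactly in binary floating point,
# so the sum picks up a tiny rounding error.
a = 0.1 + 0.2
b = 0.3
print(a)       # 0.30000000000000004
print(a == b)  # False

# The order of operations matters: the same three numbers,
# grouped differently, give results that differ in the last digit.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)              # False
print(math.isclose(left, right))  # True
```

An exact == comparison fails in both cases, even though the values agree to about 16 significant digits.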

Libraries like NumPy, SciPy, and Pandas are commonly used to interact with large quantities of floating point numbers. NumPy provides special functions to assist with testing.

Relative and Absolute Tolerances

Rather than testing that a floating point number is exactly equal to another, it is preferable to test that it is within a certain tolerance. In most cases, it is best to use a relative tolerance:

PYTHON

from math import fabs

def test_float_rtol():
    expected = 7.31926e12  # Reference solution
    actual = my_function()  # my_function stands in for the code under test
    rtol = 1e-3
    # Use fabs to ensure a positive result!
    assert fabs((actual - expected) / expected) < rtol

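In some situations, it is preferable to test within an absolute tolerance instead. The most common case is testing that a number is close to zero without caring exactly how small it is: here the relative check above cannot be used at all, because it divides by the expected value.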

PYTHON

from math import fabs

def test_float_atol():
    expected = 0.0  # Reference solution
    actual = my_function()  # my_function stands in for the code under test
    atol = 1e-5
    # Use fabs to ensure a positive result!
    assert fabs(actual - expected) < atol
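To see why the choice of tolerance matters, the sketch below wraps the two checks above in small helper functions. The names within_rtol and within_atol are hypothetical and exist only for this illustration, and the numbers are made up.

```python
from math import fabs

def within_rtol(actual, expected, rtol):
    """Relative check (hypothetical helper); fails if expected == 0."""
    return fabs((actual - expected) / expected) < rtol

def within_atol(actual, expected, atol):
    """Absolute check (hypothetical helper)."""
    return fabs(actual - expected) < atol

# A large expected value: the results differ by 4e7, which is tiny
# relative to 7e12 but enormous as an absolute difference.
big_expected = 7.31926e12
big_actual = 7.31930e12
print(within_rtol(big_actual, big_expected, rtol=1e-3))  # True
print(within_atol(big_actual, big_expected, atol=1e-5))  # False

# An expected value of zero: the absolute check works, but the
# relative check divides by zero.
small_actual = 3e-9
print(within_atol(small_actual, 0.0, atol=1e-5))  # True
try:
    within_rtol(small_actual, 0.0, rtol=1e-3)
    rtol_failed_on_zero = False
except ZeroDivisionError:
    rtol_failed_on_zero = True
print(rtol_failed_on_zero)  # True
```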

Let’s practice with a function that estimates the value of pi (very inefficiently!).

Challenge

Testing with tolerances

  • Write this function to a file estimate_pi.py:

PYTHON

import random

def estimate_pi(iterations):
    """
    Estimate pi by counting the number of random points
    inside a quarter circle of radius 1
    """
    num_inside = 0
    for _ in range(iterations):
        x = random.random()
        y = random.random()
        if x**2 + y**2 < 1:
            num_inside += 1
    return 4 * num_inside / iterations

  • Add a file test_estimate_pi.py, and include a test for this function using both absolute and relative tolerances.
  • Find an appropriate number of iterations so that the test finishes quickly, but keep in mind that both atol and rtol will need to be modified accordingly!

PYTHON

import random
from math import fabs

from estimate_pi import estimate_pi

def test_estimate_pi():
    random.seed(0)
    expected = 3.141592654
    actual = estimate_pi(iterations=10000)
    # Test absolute tolerance
    atol = 1e-2
    assert fabs(actual - expected) < atol
    # Test relative tolerance
    rtol = 5e-3
    assert fabs((actual - expected) / expected) < rtol

In this case the absolute and relative tolerances should be similar, as the expected result is close in magnitude to 1.0, but in principle they could be very different!
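One way to pick a tolerance for a random estimate like this is to look at how far it typically lands from the true value for a given number of iterations. The sketch below is one possible approach, not part of the exercise: it repeats the function from estimate_pi.py so that it runs on its own, and the helper worst_error and the choice of ten seeds are assumptions made for illustration.

```python
import math
import random

def estimate_pi(iterations):
    """Same Monte Carlo estimate as in estimate_pi.py."""
    num_inside = 0
    for _ in range(iterations):
        x = random.random()
        y = random.random()
        if x**2 + y**2 < 1:
            num_inside += 1
    return 4 * num_inside / iterations

def worst_error(iterations, seeds=range(10)):
    """Largest absolute error over several seeded runs (hypothetical helper)."""
    errors = []
    for seed in seeds:
        random.seed(seed)
        errors.append(abs(estimate_pi(iterations) - math.pi))
    return max(errors)

# Print the worst error seen for a few iteration counts.
for iterations in (100, 1_000, 10_000):
    print(iterations, worst_error(iterations))
```

The statistical error of this kind of estimate shrinks roughly in proportion to 1/sqrt(iterations), so 100 times more iterations buys only about one extra decimal digit. Choosing a tolerance a few times larger than the typical error makes the test less likely to break if the seed changes.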

The standard library function math.isclose can be used to simplify these checks:

PYTHON

assert math.isclose(a, b, rel_tol=rtol, abs_tol=atol)

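Both rel_tol and abs_tol may be provided, and the function returns True if either condition is satisfied. The relative tolerance is measured against the larger of the two magnitudes, so the order of a and b does not matter. Note that abs_tol defaults to 0.0, so it must be set explicitly when comparing against zero.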
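A few concrete calls may make this behaviour clearer. The values below are illustrative. The comments follow the documented rule that the check passes when abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol), with defaults rel_tol=1e-09 and abs_tol=0.0.

```python
import math

# The difference of 0.5 is within 0.1% of 1000.5, so this passes.
close_rel = math.isclose(1000.0, 1000.5, rel_tol=1e-3)
print(close_rel)  # True

# The relative tolerance is too strict here, but the absolute
# tolerance is met, and meeting either one is enough.
close_abs = math.isclose(1000.0, 1000.5, rel_tol=1e-6, abs_tol=1.0)
print(close_abs)  # True

# Neither tolerance is met.
not_close = math.isclose(1000.0, 1000.5, rel_tol=1e-6, abs_tol=0.1)
print(not_close)  # False

# Comparing against zero: with the default abs_tol of 0.0, no
# non-zero number is considered close to zero.
zero_default = math.isclose(1e-10, 0.0)
zero_with_atol = math.isclose(1e-10, 0.0, abs_tol=1e-9)
print(zero_default, zero_with_atol)  # False True
```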

Challenge

Using math.isclose

  • Adapt the test you wrote in the previous challenge to make use of the math.isclose function.

PYTHON

import math
import random

from estimate_pi import estimate_pi

def test_estimate_pi():
    random.seed(0)
    expected = 3.141592654
    actual = estimate_pi(iterations=10000)
    atol = 1e-2
    rtol = 5e-3
    assert math.isclose(actual, expected, abs_tol=atol, rel_tol=rtol)

NumPy

NumPy is a common library used in research. Instead of the usual assert a == b, NumPy has its own testing functions that are more suitable for comparing NumPy arrays. These functions are the ones you are most likely to use:

  • numpy.testing.assert_array_equal is used to compare two NumPy arrays for equality – best used for integer data.
  • numpy.testing.assert_allclose is used to compare two NumPy arrays with a tolerance for floating point numbers.
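The reason a plain assert a == b is unsuitable is that == on NumPy arrays compares element by element and returns an array of booleans, which Python cannot reduce to a single True or False. A minimal sketch (the array values are arbitrary):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 3])

# == gives one boolean per element, not a single answer.
elementwise = a == b
print(elementwise)  # [ True  True  True]

# Using that array directly in an assert raises a ValueError,
# because the truth value of a multi-element array is ambiguous.
try:
    assert a == b
    raised_value_error = False
except ValueError as err:
    raised_value_error = True
    print("ValueError:", err)

# The NumPy testing function handles the whole array at once.
np.testing.assert_array_equal(a, b)
```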

Here are some examples of how to use these functions:

PYTHON


import numpy as np

def test_numpy_arrays():
    """Test that numpy arrays are equal"""
    # Create two numpy arrays
    array1 = np.array([1, 2, 3])
    array2 = np.array([1, 2, 3])
    # Check that the arrays are equal
    np.testing.assert_array_equal(array1, array2)

# Note that np.testing.assert_array_equal even works with multidimensional numpy arrays!

def test_2d_numpy_arrays():
    """Test that 2d numpy arrays are equal"""
    # Create two 2d numpy arrays
    array1 = np.array([[1, 2], [3, 4]])
    array2 = np.array([[1, 2], [3, 4]])
    # Check that the nested arrays are equal
    np.testing.assert_array_equal(array1, array2)

def test_numpy_arrays_with_tolerance():
    """Test that numpy arrays are equal with tolerance"""
    # Create two numpy arrays
    array1 = np.array([1.0, 2.0, 3.0])
    array2 = np.array([1.00009, 2.0005, 3.0001])
    # Check that the arrays are equal with tolerance
    np.testing.assert_allclose(array1, array2, atol=1e-3)

The NumPy testing functions can be used on anything NumPy considers to be ‘array-like’. This includes lists, tuples, and even individual floating point numbers if you choose. They can also be used for other objects in the scientific Python ecosystem, such as Pandas Series/DataFrames.
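As a brief illustration of the array-like inputs mentioned above, the sketch below passes lists, a tuple, and plain floats to numpy.testing.assert_allclose. The values are arbitrary. With no tolerances given, the function uses its defaults of rtol=1e-7 and atol=0.

```python
import numpy as np

# A list and a tuple are both accepted and compared element by element.
np.testing.assert_allclose([1.0, 2.0, 3.0], (1.0, 2.0, 3.0 + 1e-9))

# Individual floating point numbers work too.
np.testing.assert_allclose(0.1 + 0.2, 0.3)

# A difference outside the tolerance raises an AssertionError,
# which pytest reports as a test failure.
try:
    np.testing.assert_allclose([1.0, 2.0], [1.0, 2.1])
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True
print(mismatch_detected)  # True
```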

Callout

The Pandas library also provides its own testing functions:

  • pandas.testing.assert_frame_equal
  • pandas.testing.assert_series_equal

These functions can also take rtol and atol arguments, so they can fulfill the role of both numpy.testing.assert_array_equal and numpy.testing.assert_allclose.
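A short sketch of how this might look for a DataFrame, assuming a version of Pandas recent enough to accept the rtol argument (it was added in Pandas 1.1). The column name and values are invented for illustration.

```python
import pandas as pd

expected = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
actual = pd.DataFrame({"x": [1.00009, 2.0005, 3.0001]})

# Passes: every value agrees to within a relative tolerance of 1e-3.
pd.testing.assert_frame_equal(actual, expected, rtol=1e-3)

# With check_exact=True the comparison is exact, so the same
# data now fails with an AssertionError.
try:
    pd.testing.assert_frame_equal(actual, expected, check_exact=True)
    exact_match = True
except AssertionError:
    exact_match = False
print(exact_match)  # False
```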

Challenge

Checking if NumPy arrays are equal

In statistics/stats.py add this function to calculate the cumulative sum of a NumPy array:

PYTHON

import numpy as np

def calculate_cumulative_sum(array: np.ndarray) -> np.ndarray:
    """Calculate the cumulative sum of a numpy array"""
    
    # Deliberately avoid the built-in np.cumsum so there is something to test
    result = np.zeros(array.shape)
    result[0] = array[0]
    for i in range(1, len(array)):
        result[i] = result[i-1] + array[i]

    return result

Then write a test for this function by comparing NumPy arrays.

PYTHON

import numpy as np
from stats import calculate_cumulative_sum

def test_calculate_cumulative_sum():
    """Test calculate_cumulative_sum function"""
    array = np.array([1, 2, 3, 4, 5])
    expected_result = np.array([1, 3, 6, 10, 15])
    np.testing.assert_array_equal(calculate_cumulative_sum(array), expected_result)
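The test above uses integer-valued data, where an exact comparison is safe. If the input contains non-integer floats, rounding errors accumulate in the running sum and numpy.testing.assert_allclose becomes the better choice. The sketch below repeats the function so that it runs on its own; the input values are chosen only to trigger a rounding error.

```python
import numpy as np

def calculate_cumulative_sum(array):
    """Same function as in statistics/stats.py."""
    result = np.zeros(array.shape)
    result[0] = array[0]
    for i in range(1, len(array)):
        result[i] = result[i-1] + array[i]
    return result

array = np.array([0.1, 0.2, 0.3])
expected = np.array([0.1, 0.3, 0.6])
actual = calculate_cumulative_sum(array)

# An exact comparison fails: 0.1 + 0.2 is stored as 0.30000000000000004.
try:
    np.testing.assert_array_equal(actual, expected)
    exactly_equal = True
except AssertionError:
    exactly_equal = False
print(exactly_equal)  # False

# Comparing with a tolerance passes (default rtol=1e-7).
np.testing.assert_allclose(actual, expected)
```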

Key Points

  • When comparing floating point data, you should use relative/absolute tolerances instead of testing for equality.
  • Comparing NumPy arrays with the == operator gives an element-wise array of booleans rather than a single True or False, so it cannot be used directly in an assert. Instead, use numpy.testing.assert_array_equal and numpy.testing.assert_allclose.