Regression Tests

Last updated on 2026-02-17

Estimated time: 13 minutes

Overview

Questions

  • How can we detect changes in program outputs?
  • How can snapshots make this easier?

Objectives

  • Explain what regression tests are and when they’re useful
  • Write a manual regression test (save output and compare later)
  • Use Snaptol snapshots to simplify output/array regression testing
  • Use tolerances (rtol/atol) to handle numerical outputs safely

1) Introduction


In short, a regression test asks “this test used to produce X, does it still produce X?”. This can help us detect unexpected or unwanted changes in the output of a program.

Regression tests are particularly useful,

  • when beginning to add tests to an existing project,

  • when adding unit tests to all parts of a project is not feasible,

  • to quickly achieve good test coverage,

  • when we cannot easily verify that the output is correct – a regression test only checks that the output does not change.

These types of tests are not a substitute for unit tests, but rather complementary to them.

2) Manual example


Let’s make a regression test in a test.py file. It is going to utilise a “very complex” processing function to simulate the processing of data,

PYTHON

# test.py

def very_complex_processing(data: list):
    return [x ** 2 - 10 * x + 42 for x in data]

Let’s write the basic structure for a test with example input data, but for now we will simply print the output,

PYTHON

# test.py continued

def test_something():
    input_data = [i for i in range(8)]

    processed_data = very_complex_processing(input_data)

    print(processed_data)

Let’s run pytest with reduced verbosity (-q) and with output capturing disabled (-s) so that the print statement is shown,

$ pytest -qs test.py
[42, 33, 26, 21, 18, 17, 18, 21]
.
1 passed in 0.00s

We get a list of output numbers that simulate the result of a complex function in our project. Let’s save this data at the top of our test.py file so that we can assert that it is always equal to the output of the processing function,

PYTHON

# test.py

SNAPSHOT_DATA = [42, 33, 26, 21, 18, 17, 18, 21]

def very_complex_processing(data: list):
    return [x ** 2 - 10 * x + 42 for x in data]

def test_something():
    input_data = [i for i in range(8)]

    processed_data = very_complex_processing(input_data)

    assert SNAPSHOT_DATA == processed_data

We call the saved version of the data a “snapshot”.

We can now be assured that any development of the code that erroneously alters the output of the function will cause the test to fail. For example, suppose we slightly altered the very_complex_processing function,

PYTHON

def very_complex_processing(data: list):
    return [3 * x ** 2 - 10 * x + 42 for x in data]
#           ^^^^ small change

Then, running the test causes it to fail,

$ pytest -q test.py
F
__________________________________ FAILURES _________________________________
_______________________________ test_something ______________________________

    def test_something():
        input_data = [i for i in range(8)]

        processed_data = very_complex_processing(input_data)

>       assert SNAPSHOT_DATA == processed_data
E       assert [42, 33, 26, 21, 18, 17, ...] == [42, 35, 34, 39, 50, 67, ...]
E         At index 1 diff: 33 != 35

test.py:12: AssertionError
1 failed in 0.03s

If the change was intentional, then we could print the output again and update SNAPSHOT_DATA. Otherwise, we would want to investigate the cause of the change and fix it.
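
In this example, accepting the change would mean updating the snapshot to the new output of the altered function,

PYTHON

SNAPSHOT_DATA = [42, 35, 34, 39, 50, 67, 90, 119]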

3) Snaptol


So far, performing a regression test manually has been a bit tedious. Storing the output data at the top of our test file,

  • adds clutter,

  • is laborious,

  • is prone to errors.

We could move the data to a separate file, but once again we would have to handle its contents manually.
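
For illustration, a hand-rolled file-based version might look like the sketch below. The snapshots/ path and the first-run behaviour here are our own arbitrary choices, not part of any tool,

PYTHON

# test_manual_file.py -- a hypothetical manual file-based snapshot test
import json
import os

SNAPSHOT_PATH = "snapshots/test_something.json"  # arbitrary location

def very_complex_processing(data: list):
    return [x ** 2 - 10 * x + 42 for x in data]

def test_something():
    processed_data = very_complex_processing(list(range(8)))

    if not os.path.exists(SNAPSHOT_PATH):
        # First run: save the output as the snapshot, then fail so that
        # the newly created file is reviewed before being trusted.
        os.makedirs(os.path.dirname(SNAPSHOT_PATH), exist_ok=True)
        with open(SNAPSHOT_PATH, "w") as f:
            json.dump(processed_data, f)
        raise AssertionError("Snapshot created; review it and re-run.")

    with open(SNAPSHOT_PATH) as f:
        snapshot = json.load(f)

    assert snapshot == processed_data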

There are tools that can handle this for us; one widely known example is Syrupy. Here we will use a newer tool called Snaptol.

Let’s use the original very_complex_processing function, and introduce the snaptolshot fixture,

PYTHON

# test.py

def very_complex_processing(data: list):
    return [x ** 2 - 10 * x + 42 for x in data]

def test_something(snaptolshot):
    input_data = [i for i in range(8)]

    processed_data = very_complex_processing(input_data)

    assert snaptolshot == processed_data

Notice that we have replaced the SNAPSHOT_DATA variable with snaptolshot, an object provided by Snaptol that handles the snapshot file management, amongst other smart features, for us.

When we run the test for the first time, we will be met with a FileNotFoundError,

$ pytest -q test.py
F
================================== FAILURES =================================
_______________________________ test_something ______________________________

    def test_something(snaptolshot):
        input_data = [i for i in range(8)]

        processed_data = very_complex_processing(input_data)

>       assert snaptolshot == processed_data
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

test.py:10:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.../snapshot.py:167: FileNotFoundError
========================== short test summary info ==========================
FAILED test.py::test_something - FileNotFoundError: Snapshot file not found.
1 failed in 0.03s

This is because we have not yet created the snapshot file. Let’s run pytest in Snaptol’s update mode so that it knows to create the snapshot file for us. This is similar to the print, copy and paste step in the manual approach above,

$ pytest -q test.py --snaptol-update
.
1 passed in 0.00s

This tells us that the test passed, and, because we were in update mode, an associated snapshot file was created with the name format <test_file>.<test_name>.json in a dedicated directory,

$ tree
.
├── __snapshots__
│   └── test.test_something.json
└── test.py

The contents of the JSON file are the same as in the manual example,

JSON

[
  42,
  33,
  26,
  21,
  18,
  17,
  18,
  21
]

As the data is saved in JSON format, almost any Python object that can be serialised to JSON – not just integers and lists – can be used in a snapshot test.
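
For example, a test snapshotting a dictionary of results might look like the sketch below. The test name and dictionary keys are our own invention, and we assume the fixture serialises nested dictionaries and lists in the same way,

PYTHON

# test.py continued -- a hypothetical test snapshotting a nested structure
def test_summary(snaptolshot):
    input_data = [i for i in range(8)]

    summary = {
        "inputs": input_data,
        "outputs": very_complex_processing(input_data),
        "count": len(input_data),
    }

    # Compared against __snapshots__/test.test_summary.json once created.
    assert snaptolshot == summary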

Just as previously, if we alter the function then the test will fail. We can similarly update the snapshot file with the new output with the --snaptol-update flag as above.

Callout

Note: --snaptol-update will only update snapshot files for tests that failed in the previous run of pytest. This is because the expected workflow is: 1) run pytest, 2) observe a test failure, 3) if happy with the change, run pytest again with --snaptol-update. This prevents unnecessary rewriting of snapshot files for tests that pass – which is particularly important once we allow for tolerances, as explained in the next section.

Floating point numbers

Consider a simulation code that uses algorithms that depend on convergence – perhaps a complicated equation that does not have an exact answer but can be approximated numerically within a given tolerance. This, along with the common use of controlled randomised initial conditions, can lead to results that differ slightly between runs.

In the example below, we use the estimate_pi function from the “Floating Point Data” module. It relies on the use of randomised input and as a result the determined value will vary slightly between runs.

PYTHON

# test_tol.py
import random

def estimate_pi(iterations):
    num_inside = 0
    for _ in range(iterations):
        x = random.random()
        y = random.random()
        if x**2 + y**2 < 1:
            num_inside += 1
    return 4 * num_inside / iterations

def test_something(snaptolshot):
    result = estimate_pi(10000000)

    print(result)

    snaptolshot.assert_allclose(result, rtol=1e-03, atol=0.0)

Notice that here we use a method of the snaptolshot object called assert_allclose. This is a wrapper around the numpy.testing.assert_allclose function, as discussed in the “Floating Point Data” module, and allows us to specify tolerances for the comparison rather than asserting an exact equality.
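
As a reminder, that comparison passes when \(|actual - desired| \leq atol + rtol \times |desired|\). A standalone sketch of the rule with made-up numbers, using plain NumPy and independent of Snaptol,

PYTHON

import numpy as np

# diff = 0.05, allowance = 0.0 + 1e-03 * 100.0 = 0.1, so this passes
np.testing.assert_allclose(100.05, 100.0, rtol=1e-03, atol=0.0)

# With rtol=1e-04 the allowance would shrink to 0.01 and the same
# comparison would raise an AssertionError.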

Let’s run the test and create the snapshot file straight away by running in update-all mode, which, as the flag name suggests, writes snapshots without requiring a previous failed run,

$ pytest -qs test_tol.py --snaptol-update-all
3.1423884
.
1 passed in 0.30s

Even with ten million data points, the approximation of pi, 3.1423884, isn’t great!

Callout

Note: remember that the specific value produced is not the important part of a regression test; what matters is whether our code reproduces that value in future runs – in this case within a given tolerance to account for the randomness.

In the test above, we supplied the rtol and atol arguments in the assertion. These control the tolerance of the comparison between the snapshot and the actual output: on future runs of the test, the computed value is not required to match the snapshot exactly, only to fall within the given tolerance. Remember,

  • rtol is the relative tolerance, useful for handling large numbers (e.g. magnitude much greater than 1),
  • atol is the absolute tolerance, useful for numbers “near zero” (e.g. magnitude much less than 1).
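
Concretely, for the pi test above, with atol=0.0 and rtol=1e-03, a future run passes whenever \(|result - 3.1423884| \leq 0.0 + 10^{-3} \times 3.1423884 \approx 0.0031\).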

If we run the test again, we see the printed output is different to that saved to file, but the test still passes,

$ pytest -qs test_tol.py
3.1408724
.
1 passed in 0.24s
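
Indeed, \(|3.1408724 - 3.1423884| \approx 0.0015\), comfortably within the \(\approx 0.0031\) allowance.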

Exercises


Challenge

Create your own regression test

  • Add the code below to a new file and fill in the ... sections with your own code.

  • On the first run, capture the output of your implemented very_complex_processing function and store it appropriately.

  • Afterwards, ensure the test compares the stored data to the result and passes successfully. Avoid using floats for now.

PYTHON

def very_complex_processing(data):
    return ...

def test_something():
    input_data = ...

    processed_data = very_complex_processing(input_data)

    assert ...

Solution

PYTHON

SNAPSHOT_DATA = [42, 33, 26, 21, 18, 17, 18, 21]

def very_complex_processing(data: list):
    return [x ** 2 - 10 * x + 42 for x in data]

def test_something():
    input_data = [i for i in range(8)]

    processed_data = very_complex_processing(input_data)

    assert SNAPSHOT_DATA == processed_data

Challenge

Implement a regression test with Snaptol

  • Using the estimate_pi function above, implement a regression test using the snaptolshot object.

  • Be sure to use the assert_allclose method to compare the result to the snapshot.

  • On the first run, check that it fails with a FileNotFoundError.

  • Run in update mode to save the snapshot, and ensure the test passes successfully on future runs.

Solution

PYTHON

import random

def estimate_pi(iterations):
    num_inside = 0
    for _ in range(iterations):
        x = random.random()
        y = random.random()
        if x**2 + y**2 < 1:
            num_inside += 1
    return 4 * num_inside / iterations

def test_something(snaptolshot):
    result = estimate_pi(10000000)

    snaptolshot.assert_allclose(result, rtol=1e-03, atol=0.0)

Challenge

More complex regression tests

  • Create two separate tests that both utilise the estimate_pi function as a fixture.

  • Using different tolerances for each test, assert that the first passes successfully, and assert that the second raises an AssertionError. Hints: 1) remember to look back at the “Testing for Exceptions” and “Fixtures” modules, 2) the error in the pi calculation algorithm is \(\frac{1}{\sqrt{N}}\) where \(N\) is the number of points used.

Solution

PYTHON

import random
import pytest

@pytest.fixture
def estimate_pi():
    iterations = 10000000
    num_inside = 0
    for _ in range(iterations):
        x = random.random()
        y = random.random()
        if x**2 + y**2 < 1:
            num_inside += 1
    return 4 * num_inside / iterations

def test_pi_passes(snaptolshot, estimate_pi):
    # Passes: rtol=1e-03 is looser than the expected statistical error
    # of roughly 1/sqrt(N) ≈ 3.2e-4 for N = 10000000.
    snaptolshot.assert_allclose(estimate_pi, rtol=1e-03, atol=0.0)

def test_pi_fails(snaptolshot, estimate_pi):
    # Fails (with high probability): rtol=1e-04 is tighter than the
    # ~3.2e-4 statistical error, so the comparison raises AssertionError.
    with pytest.raises(AssertionError):
        snaptolshot.assert_allclose(estimate_pi, rtol=1e-04, atol=0.0)

Key Points
  • Regression testing ensures that the output of a function remains consistent between test runs.
  • The snaptol pytest plugin can simplify this process and cater for floating point outputs that may need tolerances on assertion checks.