Snapshot discovery and reading takes quadratic time
Describe the bug
We’re using syrupy, and it works well. Thank you!
Unfortunately we have nearly 500 snapshots, and our test runs are starting to get quite slow. It seems that syrupy makes testing take quadratic time with respect to the number of snapshots.
To reproduce
Create this file
```python
# test_performance.py
import os

import pytest

SIZE = int(os.environ.get("SIZE", 1000))


@pytest.mark.parametrize("x", range(SIZE))
def test_performance(x, snapshot):
    assert x == snapshot
    # assert x == x
```
Run, for instance:
```sh
for s in 100 500 1000 2000; do
    echo "size = $s"
    # create the snapshots
    SIZE=$s pytest test_performance.py --snapshot-update
    # just check them
    SIZE=$s pytest test_performance.py
done
```
The times reported by pytest scale quadratically with the number of tests/snapshots (O(size**2)). I think this is because the number of `read_file`/`_read_snapshot_fossil` calls and `discover_snapshots` calls, as reported by `python -m cProfile -m py.test test_performance.py`, scales linearly (O(size)) with the number of tests/snapshots, and the work required for each call also scales linearly (because the snapshot files contain O(size) data).
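For reference, a rough sketch of how the call counts below can be pulled out of the profiler output with the standard-library `pstats` module (the function names filtered on are just the ones that showed up in my profile and may differ between syrupy versions):

```python
# count_calls.py -- sketch of extracting the call counts from the profile.
# Assumes the profile was saved first with something like:
#   python -m cProfile -o profile.out -m pytest test_performance.py
import pstats

stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative")
# print_stats() accepts regex filters; only matching rows are printed
stats.print_stats("discover_snapshots|read_file|_read_snapshot_fossil")
```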
The times and numbers of calls (this is just for the invocation that checks the snapshots) are something like:
| size | time (seconds) | `discover_snapshots` calls | `read_file` calls |
|---|---|---|---|
| 100 | 0.15 | 200 | 300 |
| 500 | 1.73 | 1000 | 1500 |
| 1000 | 6.24 | 2000 | 3000 |
| 2000 | 21.84 | 4000 | 6000 |
Things of note:
- each doubling from 500 -> 1000 -> 2000 multiplies the time by approximately 4, a classic marker of quadratic performance (see the quick check below)
- the number of calls seems rather large: 2 discovery calls and 3 read-file calls per test/snapshot
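A quick check of those ratios, just arithmetic on the numbers in the table above:

```python
# For quadratic behaviour, doubling the size should roughly quadruple the time.
times = {100: 0.15, 500: 1.73, 1000: 6.24, 2000: 21.84}

for small, big in [(500, 1000), (1000, 2000)]:
    print(f"{small} -> {big}: time ratio = {times[big] / times[small]:.2f} (expect ~4)")
```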
Expected behavior
The test runs should be linear in the number of tests/snapshots. For instance, if the `assert x == x` line is used (and the `snapshot` fixture removed) instead of `assert x == snapshot`, the test run is linear: even `SIZE=10000` finishes in < 4s on my machine.
It seems like this could be handled by discovering the snapshots once (or once per file) and reading each snapshot file once too.
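For illustration, a minimal sketch of the kind of per-file caching I have in mind; this is not syrupy's actual code, and the one-entry-per-line "format" parsed here is made up purely to keep the example self-contained:

```python
from functools import lru_cache


def _parse(text: str) -> dict:
    # Made-up format for illustration: one "name: value" pair per line.
    snapshots = {}
    for line in text.splitlines():
        name, _, value = line.partition(": ")
        snapshots[name] = value
    return snapshots


@lru_cache(maxsize=None)
def read_snapshot_file(path: str) -> dict:
    # Read and parse each snapshot file at most once per test session.
    with open(path) as f:
        return _parse(f.read())


def lookup_snapshot(path: str, name: str):
    # Each assertion becomes an O(1) dict lookup after the first read,
    # so total work per file is O(file size) instead of O(assertions * file size).
    return read_snapshot_file(path).get(name)
```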
Screenshots
Environment (please complete the following information):
- OS: macOS
- Syrupy Version: 1.4
- Python Version: 3.8
Additional context
Top GitHub Comments
Based on your metrics, it seems performance is under control now, or at a minimum it’s no longer quadratic, so I’ll close this issue. If you have other ideas/requests, we’re always open to contributors.
Yeah, unfortunately each file is generally O(number of assertions) in size, because it stores info for each assertion, so when syrupy reads the whole file again it's doing O(number of assertions) work (since it has to at least touch every byte in the file to find the names). That is, O(number of assertions) assertions each doing O(number of assertions) parsing work, leading to O(number of assertions**2) quadratic behaviour for each file.
The simple test in the issue is an extreme example, with up to 2000 assertions in a single file, but it has an impact even for our real-world ~450 snapshots spread across 25 files.
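A toy demonstration of that effect (not syrupy's code, just the re-parse-per-assertion pattern versus parse-once):

```python
import time

def run(n, reparse_every_assertion):
    # `entries` stands in for the contents of a snapshot file with n entries.
    entries = [f"entry {i}" for i in range(n)]
    parsed = None
    start = time.perf_counter()
    for i in range(n):
        if reparse_every_assertion or parsed is None:
            parsed = {e: e for e in entries}  # stands in for parsing the file
        assert parsed[f"entry {i}"] == f"entry {i}"
    return time.perf_counter() - start

# Re-parsing on every assertion scales quadratically; parsing once is linear.
for n in (1000, 2000, 4000):
    print(n, f"re-parse: {run(n, True):.3f}s", f"parse once: {run(n, False):.3f}s")
```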
Here’s the test with `SIZE=10000`:

(syrupy 1.4.3 seems to take approximately 5² = 25 × the time for `SIZE=2000` (8.24s), matching expectations for quadratic behaviour. The no-syrupy version is the one described in the issue, removing the `snapshot` fixture and changing to `assert x == x`.)

Sorry, I’d prefer not to do so for now, but thanks for the invitation! 😄