
Snapshot discovery and reading takes quadratic time


Describe the bug

We’re using syrupy, and it works well. Thank you!

Unfortunately, we have nearly 500 snapshots and our test runs are starting to get quite slow. It seems that syrupy makes the test run take quadratic time with respect to the number of snapshots.

To reproduce

Create this file:

# test_performance.py
import pytest
import os

SIZE = int(os.environ.get("SIZE", 1000))

@pytest.mark.parametrize("x", range(SIZE))
def test_performance(x, snapshot):
    assert x == snapshot
    # assert x == x  # non-syrupy baseline: swap in this line and drop the snapshot fixture

Run, for instance:

for s in 100 500 1000 2000; do
    echo "size = $s"
    # create the snapshots
    SIZE=$s pytest test_performance.py --snapshot-update
    # just check them
    SIZE=$s pytest test_performance.py
done

The times reported by pytest scale quadratically with the number of tests/snapshots (O(size**2)). I think this is because the number of read_file/_read_snapshot_fossil calls and discover_snapshots calls, as reported by python -m cProfile -m py.test test_performance.py, scales linearly (O(size)) with the number of tests/snapshots, and the work required for each call also scales linearly, because each snapshot file contains O(size) data.
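
For anyone who wants to reproduce those call counts, here is a minimal sketch (not part of the original report) that records a profile and then filters it down to the functions named above; it uses only the standard library, and the function names in the filter are just the ones cProfile reports for syrupy:

# 1. Record a profile of the test run (writes snapshot.prof):
#      python -m cProfile -o snapshot.prof -m pytest test_performance.py
# 2. Summarise only the snapshot-related calls:
import pstats

stats = pstats.Stats("snapshot.prof")
# print_stats accepts regex filters matched against "file:lineno(function)",
# so one alternation covers all three functions of interest.
stats.sort_stats("cumulative").print_stats(
    "discover_snapshots|_read_snapshot_fossil|read_file"
)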

The times and number of calls (for the invocation that only checks the snapshots, not the one that updates them) are roughly:

size   time (seconds)   discover_snapshots calls   read_file calls
100    0.15             200                        300
500    1.73             1000                       1500
1000   6.24             2000                       3000
2000   21.84            4000                       6000

Things of note:

  • each doubling from 500 -> 1000 -> 2000 multiplies the time by roughly 4, a classic marker of quadratic behaviour (see the quick ratio check just after this list)
  • the number of calls seems rather large: 2 discover_snapshots calls and 3 read_file calls per test/snapshot
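
As a quick ratio check of that first point, using only the timings already listed in the table (nothing re-measured):

# Timings in seconds from the table above, keyed by SIZE.
times = {100: 0.15, 500: 1.73, 1000: 6.24, 2000: 21.84}
print(times[1000] / times[500])   # ~3.6
print(times[2000] / times[1000])  # ~3.5 -- a perfectly quadratic run would give 4.0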

Expected behavior

The test runs should be linear in the number of tests/snapshots. For instance, if the assert x == x line is used (and the snapshot fixture removed) instead of assert x == snapshot, the test run is linear: even SIZE=10000 finishes in < 4s on my machine.

It seems like this could be handled by discovering the snapshots once (or once per file) and reading each snapshot file once too.
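
Here is a minimal sketch of that idea, using hypothetical helper names rather than syrupy's actual internals: memoise the per-file parse so that N assertions against the same snapshot file cost one read plus N dictionary lookups, instead of N full re-reads.

from functools import lru_cache
from pathlib import Path
from typing import Dict

@lru_cache(maxsize=None)
def parse_snapshot_file(path: str) -> Dict[str, str]:
    """Parse a snapshot file once and cache the result by path.

    The exact file format doesn't matter for the argument; what matters is
    that parsing touches every byte, i.e. costs O(file size). The "---"
    separator used here is purely illustrative.
    """
    snapshots: Dict[str, str] = {}
    for block in Path(path).read_text().split("\n---\n"):
        if not block.strip():
            continue
        name, _, body = block.partition("\n")
        snapshots[name.strip()] = body
    return snapshots

def read_snapshot(path: str, name: str) -> str:
    # First call per file: O(file size). Every later assertion: O(1) lookup.
    return parse_snapshot_file(path)[name]

A real implementation would also need to invalidate the cache when snapshots are written (for example during --snapshot-update), which this sketch ignores.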

Screenshots

Environment (please complete the following information):

  • OS: macOS
  • Syrupy Version: 1.4
  • Python Version: 3.8

Additional context

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
noahnu commented, Aug 20, 2021

Based on your metrics, it seems performance is under control now, or at a minimum it’s no longer quadratic, so I’ll close this issue. If you have other ideas/requests, we’re always open to contributors.

1 reaction
huonw commented, Aug 20, 2021

Yeah, unfortunately each file is generally O(number of assertions) in size, because it stores information for each assertion, so when syrupy reads the whole file again it does O(number of assertions) work (it has to at least touch every byte in the file to find the names). That is, O(number of assertions) assertions each doing O(number of assertions) parsing work leads to O(number of assertions**2), i.e. quadratic, behaviour per file.

The simple test in the issue is an extreme example, with up to 2000 assertions in a single file, but it has an impact even for our real-world ~450 snapshots spread across 25 files.
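
To make the arithmetic concrete, here is a toy cost model (not syrupy code): if every assertion re-reads a file whose size is proportional to the number of assertions, the total number of lines scanned grows quadratically, whereas reading each file once keeps it linear.

def lines_scanned(num_assertions: int, cached: bool) -> int:
    """Toy model: the snapshot file holds one entry per assertion."""
    file_size = num_assertions
    reads = 1 if cached else num_assertions
    return reads * file_size

for n in (100, 500, 1000, 2000):
    print(n, lines_scanned(n, cached=False), lines_scanned(n, cached=True))
# 100   10000     100
# 500   250000    500
# 1000  1000000   1000
# 2000  4000000   2000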

> Have you tried with substantially more test cases? With and without syrupy?

Here’s the test with SIZE=10000:

version          time (s)
syrupy==1.4.3    201
syrupy #543      4.64
no syrupy        3.57

(syrupy 1.4.3 seems to take approximately 5² × the time for SIZE=2000 (8.24s), matching expectations for quadratic behaviour. The "no syrupy" version is the one described in the issue, removing the snapshot fixture and changing to assert x == x.)

> Also feel free to join our discord: https://discord.gg/kZYy8agD

Sorry, I’d prefer not to do so for now, but thanks for the invitation! 😄
