
scan_history does not efficiently retrieve sparse metrics


Describe the bug

Calling scan_history returns the very first data point but doesn’t seem to make any subsequent progress. The script appears to be stuck in wandb networking code; sample stack trace:

Thread 1494 (idle): "MainThread"
    read (ssl.py:911)
    recv_into (ssl.py:1052)
    readinto (socket.py:589)
    _read_status (http/client.py:257)
    begin (http/client.py:296)
    getresponse (http/client.py:1321)
    _make_request (urllib3/connectionpool.py:383)
    urlopen (urllib3/connectionpool.py:603)
    send (requests/adapters.py:449)
    send (requests/sessions.py:646)
    request (requests/sessions.py:533)
    request (requests/api.py:60)
    post (requests/api.py:116)
    execute (gql/transport/requests.py:38)
    _get_result (gql/client.py:60)
    execute (gql/client.py:52)
    execute (wandb/apis/public.py:178)
    __call__ (wandb/old/retry.py:96)
    wrapped_fn (wandb/old/retry.py:132)
    _load_next (wandb/apis/public.py:2002)
    __call__ (wandb/old/retry.py:96)
    wrapped_fn (wandb/old/retry.py:132)
    wrapper (wandb/apis/normalize.py:24)
    __next__ (wandb/apis/public.py:1975)
    fetch_run_data (plot_results.py:15)
    <module> (plot_results.py:23)

The run I’m trying to download has a very large number of data points, but the requested metric is present in only 26 rows, and the run page has no trouble loading it.

To Reproduce

My code:

import numpy as np
import wandb
from typing import Tuple

def fetch_run_data(descriptor: str, metric: str) -> Tuple[np.ndarray, np.ndarray]:
    api = wandb.Api()
    runs = api.runs("cswinter/deep-codecraft-vs", {"config.descriptor": descriptor})

    run = runs[0]
    step = []
    value = []
    vals = run.scan_history(keys=[metric, '_step'], page_size=1000, min_step=None, max_step=None)
    for entry in vals:
        if metric in entry:
            print('yay')
            print(entry)
            step.append(entry['_step'])
            value.append(entry[metric])
    return np.array(step), np.array(value)

step, value = fetch_run_data("f2034f-hpsetstandard", "eval_mean_score")

# plot data...

It just outputs this and then hangs for at least several minutes:

yay
{'eval_mean_score': -0.9901801347732544, '_step': 0}

Expected behavior

I expect the full set of data points to be retrieved within seconds.

Operating System

WSL 1, wandb==0.10.12

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

cswinter commented, Jan 9, 2021 (1 reaction)

Hmm ok, history does seem to do what I want when setting the right keys. Is it guaranteed to return the full underlying dataset without any sampling/interpolation when setting the samples param higher than the number of data points?
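For reference, a minimal sketch of that approach, assuming the run.history(keys=..., samples=..., pandas=...) parameters of the wandb 0.10.x public API; the samples value and helper name are illustrative, not from the issue:

import numpy as np
import wandb

def fetch_sparse_metric(descriptor: str, metric: str):
    # Same run lookup as in the reproduction above.
    api = wandb.Api()
    run = api.runs("cswinter/deep-codecraft-vs", {"config.descriptor": descriptor})[0]
    # With keys set, history() returns only rows containing those keys.
    # Setting samples well above the number of logged rows is intended to
    # avoid downsampling, though (per the question above) full fidelity is
    # not explicitly guaranteed.
    rows = run.history(keys=[metric, '_step'], samples=100000, pandas=False)
    step = [row['_step'] for row in rows]
    value = [row[metric] for row in rows]
    return np.array(step), np.array(value)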

vanpelt commented, Jan 9, 2021 (0 reactions)

Gotcha, the pagination logic for scanning the entire history needs to assume there could be a data point at every step, so it has to page across the entire 125 million steps, which is what makes it so slow. We’re working on a big overhaul of our metrics backend, so hopefully we can address this case better in the future.
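If scan_history has to be used despite that, the scan range can at least be narrowed so fewer pages are fetched. A hedged sketch using the min_step/max_step/page_size parameters already shown in the reproduction; the step bounds and page size here are arbitrary illustrations and do not change the per-page cost model:

import wandb

api = wandb.Api()
run = api.runs("cswinter/deep-codecraft-vs",
               {"config.descriptor": "f2034f-hpsetstandard"})[0]

# Restrict the scan to a known step range and use a larger page size so
# fewer round trips are needed to cover it.
vals = run.scan_history(keys=['eval_mean_score', '_step'],
                        page_size=10000,
                        min_step=0,
                        max_step=1_000_000)
for entry in vals:
    print(entry)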
