scan_history does not efficiently retrieve sparse metrics
Describe the bug
Calling `scan_history` returns the very first data point but doesn't seem to make any subsequent progress. The script appears to be stuck in wandb networking code; sample stack trace:
Thread 1494 (idle): "MainThread"
read (ssl.py:911)
recv_into (ssl.py:1052)
readinto (socket.py:589)
_read_status (http/client.py:257)
begin (http/client.py:296)
getresponse (http/client.py:1321)
_make_request (urllib3/connectionpool.py:383)
urlopen (urllib3/connectionpool.py:603)
send (requests/adapters.py:449)
send (requests/sessions.py:646)
request (requests/sessions.py:533)
request (requests/api.py:60)
post (requests/api.py:116)
execute (gql/transport/requests.py:38)
_get_result (gql/client.py:60)
execute (gql/client.py:52)
execute (wandb/apis/public.py:178)
__call__ (wandb/old/retry.py:96)
wrapped_fn (wandb/old/retry.py:132)
_load_next (wandb/apis/public.py:2002)
__call__ (wandb/old/retry.py:96)
wrapped_fn (wandb/old/retry.py:132)
wrapper (wandb/apis/normalize.py:24)
__next__ (wandb/apis/public.py:1975)
fetch_run_data (plot_results.py:15)
<module> (plot_results.py:23)
The run I'm trying to download has a very large number of data points, but the requested metric is present in only 26 rows, and the run page now has trouble loading it.
To Reproduce
My code:
import wandb
import numpy as np
from typing import Tuple

def fetch_run_data(descriptor: str, metric: str) -> Tuple[np.ndarray, np.ndarray]:
    api = wandb.Api()
    runs = api.runs("cswinter/deep-codecraft-vs", {"config.descriptor": descriptor})
    run = runs[0]

    step = []
    value = []
    vals = run.scan_history(keys=[metric, '_step'], page_size=1000, min_step=None, max_step=None)
    for entry in vals:
        if metric in entry:
            print('yay')
            print(entry)
            step.append(entry['_step'])
            value.append(entry[metric])
    return np.array(step), np.array(value)

frames, step = fetch_run_data("f2034f-hpsetstandard", "eval_mean_score")
# plot data...
It just outputs this and then hangs for at least several minutes:
yay
{'eval_mean_score': -0.9901801347732544, '_step': 0}
Expected behavior
I expect the full set of data points to be retrieved within seconds.
Operating System: WSL 1
wandb version: 0.10.12
Issue Analytics
- State:
- Created: 3 years ago
- Comments: 7 (3 by maintainers)
Top GitHub Comments
Hmm ok, `history` does seem to do what I want when setting the right `keys`. Is it guaranteed to return the full underlying dataset without any sampling/interpolation when setting the `samples` param higher than the number of data points?

Gotcha, the pagination logic for scanning the entire history needs to assume there could be a data point at every step, so it needs to page across the entire 125 million steps, which is what makes it so slow. We're working on a big overhaul of our metrics backend, so hopefully we can address this case better in the future.
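Following the workaround discussed above, one can fetch the sparse metric with `run.history` (setting `samples` above the number of logged steps) and then drop the rows where the metric is absent. A minimal sketch, assuming the project path and metric name from this issue; the `extract_sparse_metric` helper is hypothetical, and the live `wandb` calls are left in comments since they require network access and credentials:

```python
import math
import numpy as np

def extract_sparse_metric(rows, metric):
    """Keep only the rows that actually contain the sparse metric.

    `rows` is a list of dicts such as wandb's history APIs return;
    rows missing the metric (or holding NaN) are skipped.
    """
    steps, values = [], []
    for row in rows:
        v = row.get(metric)
        if v is None or (isinstance(v, float) and math.isnan(v)):
            continue
        steps.append(row["_step"])
        values.append(v)
    return np.array(steps), np.array(values)

# Hypothetical usage against the run from this issue:
# api = wandb.Api()
# run = api.runs("cswinter/deep-codecraft-vs",
#                {"config.descriptor": "f2034f-hpsetstandard"})[0]
# rows = run.history(keys=["eval_mean_score"], samples=100_000, pandas=False)
# step, value = extract_sparse_metric(rows, "eval_mean_score")
```

Whether `history` with a large `samples` value truly bypasses sampling is exactly the open question in the comment above, so the result should still be sanity-checked against the expected row count (26 in this case).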