bestmodel.pth size shrank while it was being uploaded


wandb --version && python --version && uname

  • Weights and Biases version: 0.8.25
  • Python version: 3.7.6
  • Operating System: Linux

Description

When training a model in a Jupyter notebook with the fastai WandbCallback, using something like:

learn_clas.fit_one_cycle(10, slice(2e-3/decay_factor,2e-3), moms=moms, callbacks=WandbCallback(learn_clas))
#... do some other stuff
learn_clas.fit_one_cycle(10, slice(3e-5/decay_factor,3e-5), moms=moms, callbacks=WandbCallback(learn_clas))

In the output written to the notebook I often see errors such as the following (text copied from the error message):

wandb: ERROR Error uploading "bestmodel.pth": CommError, File /home/fastai/keyword-extraction/notebooks/wandb/run-20200206_091629-h6xl4gbi/bestmodel.pth size shrank from 239376385 to 232153088 while it was being uploaded.
wandb: ERROR Error uploading "bestmodel.pth": CommError, File /home/fastai/keyword-extraction/notebooks/wandb/run-20200206_091629-h6xl4gbi/bestmodel.pth size shrank from 239376385 to 80355328 while it was being uploaded.

It appears that either there is some kind of race condition, or the file is being overwritten by the second training loop while the upload started by the first is still in progress.
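The overwrite hypothesis above can be illustrated with a small standalone sketch. It does not use wandb or fastai, and the helper names (`overwrite_in_place`, `atomic_write`, `demo`) are hypothetical. It shows that reopening a file in "wb" mode truncates it immediately, so anything reading it at that moment sees a smaller file, whereas writing to a temporary file and renaming it into place never exposes a partial file. Whether this would avoid the wandb error depends on how the uploader checks file size, which is not confirmed here.

```python
import os
import tempfile


def overwrite_in_place(path: str, payload: bytes) -> int:
    """Rewrite `path` in place; return the size observed right after opening."""
    with open(path, "wb") as f:  # "wb" truncates the file to 0 bytes on open
        size_mid_write = os.path.getsize(path)
        f.write(payload)
    return size_mid_write


def atomic_write(path: str, payload: bytes) -> None:
    """Write to a temp file in the same directory, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
        # The rename is atomic when source and target share a filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def demo() -> tuple:
    """Return (size before, size seen mid-overwrite, size after atomic write)."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "bestmodel.pth")  # illustrative name only
        atomic_write(path, b"x" * 1000)
        size_before = os.path.getsize(path)
        size_mid = overwrite_in_place(path, b"y" * 1000)
        atomic_write(path, b"z" * 500)
        size_after = os.path.getsize(path)
        return size_before, size_mid, size_after


if __name__ == "__main__":
    print(demo())
```

A reader that opened the old file before an atomic replace keeps reading the complete old contents, which is the property an in-place rewrite lacks.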

Update:

It’s also happening with lots of other files, e.g.:

wandb: ERROR Error uploading "___batch_archive_1.tgz": CommError, File /tmp/tmpnn7k_ppewandb/___batch_archive_1.tgz size shrank from 416086 to 301398 while it was being uploaded.
Exception in thread Thread-78:
Traceback (most recent call last):
  File "/home/fastai/anaconda3/envs/keyword-extraction/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/home/fastai/anaconda3/envs/keyword-extraction/lib/python3.7/site-packages/wandb/file_pusher.py", line 85, in run
    self.cleanup_file()
  File "/home/fastai/anaconda3/envs/keyword-extraction/lib/python3.7/site-packages/wandb/file_pusher.py", line 154, in cleanup_file
    os.unlink(self.tgz_path)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpnn7k_ppewandb/batch-1.tgz'
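The traceback above ends in `os.unlink` raising `FileNotFoundError` because the archive was already gone when cleanup ran. As a minimal sketch of how a cleanup step can tolerate that situation (this is an assumption for illustration, not wandb's actual code or fix; `cleanup_file` here is a hypothetical standalone helper):

```python
import os
import tempfile


def cleanup_file(path: str) -> bool:
    """Delete `path`; return False instead of raising if it is already gone."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        # Another thread or an earlier cleanup already removed the file.
        return False
    return True


if __name__ == "__main__":
    fd, tmp_path = tempfile.mkstemp(suffix=".tgz")
    os.close(fd)
    print(cleanup_file(tmp_path))  # first call removes the file
    print(cleanup_file(tmp_path))  # second call finds nothing to remove
```

Catching the exception rather than checking `os.path.exists` first avoids a second race between the check and the delete.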

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 15 (7 by maintainers)

Top GitHub Comments

maxzzze commented, Nov 12, 2020 (1 reaction)

@borisdayma I just now switched from using optuna to ray and I am not experiencing any issues like this one anymore.

I’m tuning transformers 3.5.0 now, but I was experiencing the above issue before while using transformers 3.4.0 (with optuna again).

vanpelt commented, Aug 28, 2020 (1 reaction)

Hey guys, can you try our new client library? We’re about a week away from official release, but the errors you’re seeing should be fixed. You can install with: pip install wandb -U --pre
