Bug due to multiple writes?

I’m encountering this bug when trying to run on a 4-GPU system:

Traceback (most recent call last):
  File "plot_surface.py", line 291, in <module>
    crunch(surf_file, net, w, s, d, trainloader, 'train_loss', 'train_acc', comm, rank, args)
  File "plot_surface.py", line 82, in crunch
    f = h5py.File(surf_file, 'r+' if rank == 0 else 'r')
  File "/home/ubuntu/.local/lib/python2.7/site-packages/h5py/_hl/files.py", line 312, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/h5py/_hl/files.py", line 144, in make_fid
    fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
IOError: Unable to open file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')

The command I used is:

mpirun -n 4 python plot_surface.py --mpi --cuda --model resnet56 --x=-1:1:51 --y=-1:1:51 \
--model_file cifar10/trained_nets/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7 \
--dir_type weights --xnorm filter --xignore biasbn --ynorm filter --yignore biasbn  --plot

What could be the issue, given that the code checks for rank 0 before opening the file for writing?
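
One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the failing call is the read-write open (`h5f.ACC_RDWR`), and errno 11 is `EAGAIN` on Linux, which is what a non-blocking file lock reports when another holder already has a conflicting lock. HDF5 1.10 and later take such locks when opening a file, so rank 0 opening with 'r+' can collide with the other ranks that already hold the file open with 'r'. The sketch below reproduces only the locking conflict, using the standard library on a temporary file. It does not use h5py or MPI, and the helper names `try_lock` and `demo` are hypothetical.

```python
import errno
import fcntl
import os
import tempfile


def try_lock(fd, exclusive):
    """Attempt a non-blocking flock; return None on success, else the errno."""
    flags = (fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH) | fcntl.LOCK_NB
    try:
        fcntl.flock(fd, flags)
        return None
    except OSError as exc:
        return exc.errno


def demo():
    """Take a shared lock as a 'reader', then try an exclusive lock as a 'writer'."""
    handle, path = tempfile.mkstemp()
    os.close(handle)
    reader = os.open(path, os.O_RDONLY)  # stands in for a rank > 0 opening with 'r'
    writer = os.open(path, os.O_RDWR)    # stands in for rank 0 opening with 'r+'
    try:
        reader_result = try_lock(reader, exclusive=False)
        writer_result = try_lock(writer, exclusive=True)
    finally:
        os.close(writer)
        os.close(reader)
        os.remove(path)
    return reader_result, writer_result


if __name__ == "__main__":
    reader_result, writer_result = demo()
    print("reader lock errno:", reader_result)
    print("writer lock errno:", writer_result)
```

If this reading is right, the rank check controls which process writes, but not whether the open calls themselves conflict, which would explain why the error appears before any data is written.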

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 12 (5 by maintainers)

Top GitHub Comments

ljk628 commented, Dec 13, 2018 (3 reactions)

Sorry for the confusion! I changed the h5py requirement to 2.7.0 in https://github.com/tomgoldstein/loss-landscape/commit/75caf64979cc1d6238672d708decc5ecbf9695f9.
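
The comment does not say why pinning h5py helps. One plausible explanation, stated here as an assumption, is that the pinned release is built against an HDF5 1.8.x library, which predates the file locking introduced in HDF5 1.10. A minimal sketch for checking which side of that line an installation falls on follows; the helper `hdf5_has_file_locking` is hypothetical, while `h5py.version.hdf5_version` is the attribute h5py exposes for the HDF5 version it was built against.

```python
def hdf5_has_file_locking(hdf5_version):
    """Return True if the HDF5 version string is 1.10 or newer.

    Assumption: file locking on open arrived with the HDF5 1.10 series.
    """
    major, minor = (int(part) for part in hdf5_version.split(".")[:2])
    return (major, minor) >= (1, 10)


# Usage, assuming h5py is installed:
#
#     import h5py
#     print(h5py.version.version, h5py.version.hdf5_version)
#     print(hdf5_has_file_locking(h5py.version.hdf5_version))
```

Comparing the two version strings before and after applying the pinned requirement would show whether the HDF5 library actually changed.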

KaleabTessera commented, Jun 6, 2020 (0 reactions)
