Bug due to multiple writes?
I'm encountering this bug when trying to run on a 4-GPU system:
Traceback (most recent call last):
  File "plot_surface.py", line 291, in <module>
    crunch(surf_file, net, w, s, d, trainloader, 'train_loss', 'train_acc', comm, rank, args)
  File "plot_surface.py", line 82, in crunch
    f = h5py.File(surf_file, 'r+' if rank == 0 else 'r')
  File "/home/ubuntu/.local/lib/python2.7/site-packages/h5py/_hl/files.py", line 312, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/home/ubuntu/.local/lib/python2.7/site-packages/h5py/_hl/files.py", line 144, in make_fid
    fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
IOError: Unable to open file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')
The command I used is:
mpirun -n 4 python plot_surface.py --mpi --cuda --model resnet56 --x=-1:1:51 --y=-1:1:51 \
--model_file cifar10/trained_nets/resnet56_sgd_lr=0.1_bs=128_wd=0.0005/model_300.t7 \
--dir_type weights --xnorm filter --xignore biasbn --ynorm filter --yignore biasbn --plot
What could be causing this, given that the code checks for rank 0 before writing?
Issue Analytics
- Created 5 years ago
- Comments: 12 (5 by maintainers)

Sorry for the confusion! I changed the h5py requirement to 2.7.0 in https://github.com/tomgoldstein/loss-landscape/commit/75caf64979cc1d6238672d708decc5ecbf9695f9.
@ascenoputing This PR https://github.com/tomgoldstein/loss-landscape/pull/28 should fix your issue.
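
For anyone wondering why the rank 0 check is not enough: the traceback fails while opening the file, not while writing to it. Line 82 opens the same file on every rank, 'r+' on rank 0 and 'r' elsewhere, and the rank 0 open is the one that cannot get its lock. Here is a minimal sketch of that clash using only the Python 3 standard library on a POSIX system. It assumes the HDF5 lock behaves like a non-blocking advisory flock (shared for readers, exclusive for the writer), which is my reading of the error and not something confirmed in this thread. The helper names `try_lock` and `demo` and the temporary file are hypothetical stand-ins, and no h5py or MPI is involved.

```python
import errno
import fcntl
import os
import tempfile


def try_lock(fd, exclusive):
    """Try a non-blocking advisory lock; return None on success, else the errno."""
    flags = (fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH) | fcntl.LOCK_NB
    try:
        fcntl.flock(fd, flags)
        return None
    except OSError as exc:
        return exc.errno


def demo():
    # Hypothetical stand-in for surf_file; the file content does not matter here.
    tmp = tempfile.NamedTemporaryFile(suffix=".h5", delete=False)
    tmp.close()
    path = tmp.name

    reader_fd = os.open(path, os.O_RDONLY)  # plays a non-zero rank opening with 'r'
    writer_fd = os.open(path, os.O_RDWR)    # plays rank 0 opening with 'r+'
    try:
        # A reader gets its shared lock first.
        reader_result = try_lock(reader_fd, exclusive=False)
        # The writer now asks for an exclusive lock and is refused immediately.
        writer_result = try_lock(writer_fd, exclusive=True)
        # Once the reader lets go, the same request succeeds.
        fcntl.flock(reader_fd, fcntl.LOCK_UN)
        writer_after_release = try_lock(writer_fd, exclusive=True)
    finally:
        os.close(reader_fd)
        os.close(writer_fd)
        os.unlink(path)
    return reader_result, writer_result, writer_after_release


if __name__ == "__main__":
    reader_result, writer_result, writer_after_release = demo()
    print("reader lock errno:", reader_result)
    print("writer lock errno while reader holds the file:", writer_result)
    print("matches EAGAIN:", writer_result == errno.EAGAIN)
    print("writer lock errno after reader releases:", writer_after_release)
```

On Linux, EAGAIN is errno 11, the same number reported in the traceback. If the assumption holds, the practical consequence is that ranks which only read should not keep the file open while rank 0 needs it in 'r+' mode.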