Seg fault issue caused by GPU memory
Hello,
I'm using a single GTX 1070 with 8 GB GDDR5 and trying to train deepspeech.pytorch with the TEDLIUM corpus. However, a few trials failed with a seg fault, which I guess originates from an OOM issue. I've tried reducing the batch size, but now I found another parameter, --num_workers. I wonder which parameter is more effective for managing the OOM issue. Could you give me any guidance on this?
```
Epoch: [1][10823/11373]  Time 0.504 (0.243)  Data 0.011 (0.021)  Loss 218.2781 (164.8313)
Epoch: [1][10824/11373]  Time 0.512 (0.244)  Data 0.011 (0.021)  Loss 244.8923 (164.8387)
Epoch: [1][10825/11373]  Time 0.503 (0.244)  Data 0.012 (0.021)  Loss 233.4698 (164.8451)
./train.sh: line 12: 35528 Segmentation fault (core dumped) python train.py --train_manifest data/ted/ted_train_manifest.csv --val data/ted/ted_val_manifest.csv --sample_rate 8000 --augment --batch_size 8 --epochs 100 --cuda --checkpoint --save_folder models/20170823
```
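For context on which knob actually touches GPU memory, here is a minimal, hypothetical sketch of the standard PyTorch DataLoader pattern (the dataset, shapes, and numbers below are placeholders, not the deepspeech.pytorch code): --batch_size determines how many utterances are stacked into the tensor that is moved to the GPU, so it is the parameter that directly controls GPU memory, while --num_workers only sets how many CPU processes prefetch and collate batches, so it affects host RAM and loading throughput rather than GPU memory.

```python
# Hypothetical sketch (not the actual deepspeech.pytorch code): where
# batch_size and num_workers act in a standard PyTorch training loop.
import torch
from torch.utils.data import DataLoader, TensorDataset


def main():
    # Dummy data standing in for TED-LIUM spectrograms and transcripts.
    features = torch.randn(256, 161, 300)       # (utterances, freq bins, frames)
    targets = torch.randint(0, 29, (256, 50))   # (utterances, label length)
    dataset = TensorDataset(features, targets)

    loader = DataLoader(
        dataset,
        batch_size=4,    # drives GPU memory: this many utterances are stacked
                         # into the single tensor that is sent to the GPU
        num_workers=2,   # drives CPU/host RAM only: background processes that
                         # load and collate batches; no effect on GPU memory
        pin_memory=True,
    )

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    for inputs, labels in loader:
        inputs = inputs.to(device)   # the batch occupies GPU memory only here
        # ... forward / backward pass would go here ...
        break


if __name__ == "__main__":
    main()
```

So, on an 8 GB card, lowering --batch_size is the lever for GPU OOM; --num_workers mostly trades host memory and worker processes against data-loading throughput.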
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
If you activate the warnings in PyTorch 0.2.0, you receive something like this:
The warning indicates that our code works, but that it is not optimal from a memory-allocation point of view.
I guess that if we change the code to follow the recommendation above, we will probably solve a lot of the out-of-memory-related problems.
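As an aside, a minimal way to surface such warnings during training (the exact warning text referred to above is not reproduced here) is to loosen Python's warning filter; this is a generic Python sketch, not a change to the project's code:

```python
# Generic sketch: make every warning PyTorch emits visible during training.
import warnings

warnings.simplefilter("always")   # print each warning every time it fires

# Equivalent from the command line, without touching the code:
#   python -W always train.py ...
```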
Well, I have no reference point for noticing any slowdown, since all previous training runs failed with a seg fault. My data is large enough that a single epoch takes almost half a day, and I don't feel that it is particularly slower than before.