Segmentation fault
The model fails to do a forward pass in the train step. The only error reported is “Segmentation fault”:
dataset cityscapes_train
batch_size 1
data_dir ./dataset/cityscapes
data_list ./dataset/list/cityscapes/train.lst
ignore_label 255
input_size 769,769
is_training False
learning_rate 0.01
momentum 0.9
not_restore_last False
num_classes 19
start_iters 0
num_steps 40000
power 0.9
random_mirror True
random_scale True
random_seed 304
restore_from ./pretrained_model/resnet101-imagenet.pth
save_num_images 2
save_pred_every 5000
snapshot_dir checkpoint/snapshots_resnet101_asp_oc_dsn_1e-2_5e-4_8_40000/
weight_decay 0.0005
gpu 0,3,4
ohem_thres 0.7
ohem_thres1 0.8
ohem_thres2 0.5
use_weight True
use_val False
use_extra False
ohem False
ohem_keep 0
network resnet101
method asp_oc_dsn
reduce True
ohem_single False
use_parallel False
dsn_weight 0.4
pair_weight 1
seed 304
output_path ./seg_output_eval_set
store_output False
use_flip False
use_ms False
predict_choice whole
whole_scale 1
start_epochs 0
end_epochs 120
save_epoch 20
criterion ce
eval False
fix_lr False
log_file
use_normalize_transform False
/data/graphics/toyota-pytorch/OCNet/network/../oc_module/base_oc_block.py:69: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
nn.init.constant(self.W.weight, 0)
/data/graphics/toyota-pytorch/OCNet/network/../oc_module/base_oc_block.py:70: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
nn.init.constant(self.W.bias, 0)
/afs/csail.mit.edu/u/s/smadan/miniconda3/envs/py_36_tens_gpu/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py:24: UserWarning:
There is an imbalance between your GPUs. You may want to exclude GPU 3 which
has less than 75% of the memory or cores of GPU 1. You can do so by setting
the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
environment variable.
warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))
w/ class balance
41650 images are loaded!
learning_rate: 0.01
torch.Size([1, 3, 769, 769])
Segmentation fault
I added a bunch of print statements and saw that the error happens at this step:
preds = model(images)
I checked the GPU usage: there was over 11 GB of GPU memory free when the error occurred, so it’s not a memory issue. Also, when I first ran the .sh file, it reported errors because the log/log_train and log_test directories had not been created. I created them manually, and that error was resolved. But now the forward pass fails in the very first iteration. Any leads?
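A bare “Segmentation fault” gives no Python stack, so one way to narrow down which call crashes is the standard-library faulthandler module. The following is a minimal sketch, not part of the OCNet code: run_forward is a hypothetical wrapper name, and model and images stand for the objects used in the training script.

```python
import faulthandler
import sys

# Ask CPython to dump the Python traceback of all threads to stderr
# when the process receives a fatal signal such as SIGSEGV.
faulthandler.enable(file=sys.stderr, all_threads=True)


def run_forward(model, images):
    """Hypothetical wrapper around the failing call `preds = model(images)`."""
    # Flush a marker first so it is visible even if the process dies right after.
    print("entering forward pass", file=sys.stderr, flush=True)
    return model(images)
```

The same handler can be switched on without editing the code by starting the interpreter with python -X faulthandler or by setting the PYTHONFAULTHANDLER environment variable. The traceback shows the Python line that was executing when the crash occurred; if it points into a compiled extension, the fault is most likely in that native code and not in the Python script.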
Issue Analytics
- State:
- Created 5 years ago
- Comments:25
Top GitHub Comments
@Spandan-Madan Hi, I solved the problem by replacing the ‘inplace_abn’ file with the inplace-abn module from https://github.com/liutinglt/CE2P.
Best,
@ackness, I tried to run it according to your method, but there are still some problems. Could you send me a working version? qq:2232661644. Many thanks!