InvalidArgumentError (see above for traceback): Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
I followed your instructions and got this error. Can you please suggest solutions?
mona@pascal:~/computer_vision/tf-faster-rcnn$ GPU_ID=0
mona@pascal:~/computer_vision/tf-faster-rcnn$ ./experiments/scripts/vgg16.sh $GPU_ID pascal_voc
+ set -e
+ export PYTHONUNBUFFERED=True
+ PYTHONUNBUFFERED=True
+ GPU_ID=0
+ DATASET=pascal_voc
+ array=($@)
+ len=2
+ EXTRA_ARGS=
+ EXTRA_ARGS_SLUG=
+ case ${DATASET} in
+ TRAIN_IMDB=voc_2007_trainval
+ TEST_IMDB=voc_2007_test
+ STEPSIZE=50000
+ ITERS=70000
++ date +%Y-%m-%d_%H-%M-%S
+ LOG=experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-02-14_22-08-43
+ exec
++ tee -a experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-02-14_22-08-43
tee: experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-02-14_22-08-43: No such file or directory
+ echo Logging output to experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-02-14_22-08-43
Logging output to experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-02-14_22-08-43
+ set +x
+ '[' '!' -f output/vgg16/voc_2007_trainval/default/vgg16_faster_rcnn_iter_70000.ckpt.index ']'
+ [[ ! -z '' ]]
+ CUDA_VISIBLE_DEVICES=0
+ time python ./tools/trainval_vgg16_net.py --weight data/imagenet_weights/vgg16.weights --imdb voc_2007_trainval --imdbval voc_2007_test --iters 70000 --cfg experiments/cfgs/vgg16.yml --set TRAIN.STEPSIZE 50000
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Called with args:
Namespace(cfg_file='experiments/cfgs/vgg16.yml', imdb_name='voc_2007_trainval', imdbval_name='voc_2007_test', max_iters=70000, set_cfgs=['TRAIN.STEPSIZE', '50000'], tag=None, weight='data/imagenet_weights/vgg16.weights')
Using config:
{'DATA_DIR': '/home/mona/computer_vision/tf-faster-rcnn/data',
'DEDUP_BOXES': 0.0625,
'EPS': 1e-14,
'EXP_DIR': 'vgg16',
'GPU_ID': 0,
'MATLAB': 'matlab',
'PIXEL_MEANS': array([[[ 102.9801, 115.9465, 122.7717]]]),
'POOLING_MODE': 'crop',
'RNG_SEED': 3,
'ROOT_DIR': '/home/mona/computer_vision/tf-faster-rcnn',
'TEST': {'BBOX_REG': True,
'HAS_RPN': True,
'MAX_SIZE': 1000,
'MODE': 'nms',
'NMS': 0.3,
'PROPOSAL_METHOD': 'selective_search',
'RPN_NMS_THRESH': 0.7,
'RPN_POST_NMS_TOP_N': 300,
'RPN_PRE_NMS_TOP_N': 6000,
'RPN_TOP_N': 5000,
'SCALES': [600],
'SVM': False},
'TRAIN': {'ASPECT_GROUPING': False,
'BATCH_SIZE': 256,
'BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
'BBOX_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0],
'BBOX_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2],
'BBOX_NORMALIZE_TARGETS': True,
'BBOX_NORMALIZE_TARGETS_PRECOMPUTED': True,
'BBOX_REG': True,
'BBOX_THRESH': 0.5,
'BG_THRESH_HI': 0.5,
'BG_THRESH_LO': 0.0,
'BIAS_DECAY': False,
'DISPLAY': 20,
'DOUBLE_BIAS': True,
'FG_FRACTION': 0.25,
'FG_THRESH': 0.5,
'GAMMA': 0.1,
'HAS_RPN': True,
'IMS_PER_BATCH': 1,
'LEARNING_RATE': 0.001,
'MAX_SIZE': 1000,
'MOMENTUM': 0.9,
'PROPOSAL_METHOD': 'gt',
'RPN_BATCHSIZE': 256,
'RPN_BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
'RPN_CLOBBER_POSITIVES': False,
'RPN_FG_FRACTION': 0.5,
'RPN_NEGATIVE_OVERLAP': 0.3,
'RPN_NMS_THRESH': 0.7,
'RPN_POSITIVE_OVERLAP': 0.7,
'RPN_POSITIVE_WEIGHT': -1.0,
'RPN_POST_NMS_TOP_N': 2000,
'RPN_PRE_NMS_TOP_N': 12000,
'SCALES': [600],
'SNAPSHOT_ITERS': 5000,
'SNAPSHOT_KEPT': 3,
'SNAPSHOT_PREFIX': 'vgg16_faster_rcnn',
'STEPSIZE': 50000,
'SUMMARY_INTERVAL': 180,
'TRUNCATED': False,
'USE_FLIPPED': True,
'USE_GT': False,
'WEIGHT_DECAY': 0.0005},
'USE_GPU_NMS': True}
Loaded dataset `voc_2007_trainval` for training
Set proposal method: gt
Appending horizontally-flipped training examples...
voc_2007_trainval gt roidb loaded from /home/mona/computer_vision/tf-faster-rcnn/data/cache/voc_2007_trainval_gt_roidb.pkl
done
Preparing training data...
done
10022 roidb entries
Output will be saved to `/home/mona/computer_vision/tf-faster-rcnn/output/vgg16/voc_2007_trainval/default`
TensorFlow summaries will be saved to `/home/mona/computer_vision/tf-faster-rcnn/tensorboard/vgg16/voc_2007_trainval/default`
Loaded dataset `voc_2007_test` for training
Set proposal method: gt
Preparing training data...
voc_2007_test gt roidb loaded from /home/mona/computer_vision/tf-faster-rcnn/data/cache/voc_2007_test_gt_roidb.pkl
done
4952 validation roidb entries
Filtered 0 roidb entries: 10022 -> 10022
Filtered 0 roidb entries: 4952 -> 4952
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:03:00.0
Total memory: 11.92GiB
Free memory: 11.85GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:03:00.0)
Solving...
Loading caffe weights...
Done!
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py:91: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Loading initial model weights from data/imagenet_weights/vgg16.weights
Loaded.
iter: 20 / 70000, total loss: 0.443026
>>> rpn_loss_cls: 0.345992
>>> rpn_loss_box: 0.097034
>>> loss_cls: 0.000000
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.749s / iter
iter: 40 / 70000, total loss: 0.516920
>>> rpn_loss_cls: 0.399234
>>> rpn_loss_box: 0.117686
>>> loss_cls: 0.000000
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.760s / iter
iter: 60 / 70000, total loss: 0.393830
>>> rpn_loss_cls: 0.353334
>>> rpn_loss_box: 0.040496
>>> loss_cls: 0.000000
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.668s / iter
iter: 80 / 70000, total loss: 0.217178
>>> rpn_loss_cls: 0.146591
>>> rpn_loss_box: 0.070533
>>> loss_cls: 0.000053
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.536s / iter
iter: 100 / 70000, total loss: 0.390607
>>> rpn_loss_cls: 0.277706
>>> rpn_loss_box: 0.030601
>>> loss_cls: 0.075361
>>> loss_box: 0.006940
>>> lr: 0.001000
speed: 1.495s / iter
iter: 120 / 70000, total loss: 0.882707
>>> rpn_loss_cls: 0.566185
>>> rpn_loss_box: 0.227990
>>> loss_cls: 0.083081
>>> loss_box: 0.005452
>>> lr: 0.001000
speed: 1.570s / iter
iter: 140 / 70000, total loss: 0.223789
>>> rpn_loss_cls: 0.113045
>>> rpn_loss_box: 0.049687
>>> loss_cls: 0.052417
>>> loss_box: 0.008640
>>> lr: 0.001000
speed: 1.510s / iter
iter: 160 / 70000, total loss: 0.219555
>>> rpn_loss_cls: 0.187197
>>> rpn_loss_box: 0.032358
>>> loss_cls: 0.000000
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.494s / iter
iter: 180 / 70000, total loss: 2.256282
>>> rpn_loss_cls: 1.965876
>>> rpn_loss_box: 0.290406
>>> loss_cls: 0.000000
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.475s / iter
iter: 200 / 70000, total loss: 1.727870
>>> rpn_loss_cls: 1.226427
>>> rpn_loss_box: 0.501443
>>> loss_cls: 0.000000
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.463s / iter
iter: 220 / 70000, total loss: 0.353863
>>> rpn_loss_cls: 0.298823
>>> rpn_loss_box: 0.055040
>>> loss_cls: 0.000000
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.461s / iter
iter: 240 / 70000, total loss: 0.147688
>>> rpn_loss_cls: 0.039554
>>> rpn_loss_box: 0.108122
>>> loss_cls: 0.000012
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.450s / iter
iter: 260 / 70000, total loss: 0.485889
>>> rpn_loss_cls: 0.416970
>>> rpn_loss_box: 0.068911
>>> loss_cls: 0.000009
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.428s / iter
iter: 280 / 70000, total loss: 0.153297
>>> rpn_loss_cls: 0.108915
>>> rpn_loss_box: 0.044243
>>> loss_cls: 0.000139
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.440s / iter
iter: 300 / 70000, total loss: 0.374053
>>> rpn_loss_cls: 0.310106
>>> rpn_loss_box: 0.063945
>>> loss_cls: 0.000001
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.397s / iter
iter: 320 / 70000, total loss: 1.169239
>>> rpn_loss_cls: 1.099040
>>> rpn_loss_box: 0.070199
>>> loss_cls: 0.000000
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.385s / iter
iter: 340 / 70000, total loss: 0.243177
>>> rpn_loss_cls: 0.193078
>>> rpn_loss_box: 0.049057
>>> loss_cls: 0.001042
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.370s / iter
iter: 360 / 70000, total loss: 0.387752
>>> rpn_loss_cls: 0.375503
>>> rpn_loss_box: 0.012084
>>> loss_cls: 0.000166
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.353s / iter
iter: 380 / 70000, total loss: 0.494936
>>> rpn_loss_cls: 0.312221
>>> rpn_loss_box: 0.045870
>>> loss_cls: 0.136845
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.336s / iter
/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:48: RuntimeWarning: overflow encountered in exp
pred_w = np.exp(dw) * widths[:, np.newaxis]
/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:48: RuntimeWarning: overflow encountered in multiply
pred_w = np.exp(dw) * widths[:, np.newaxis]
/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:49: RuntimeWarning: overflow encountered in exp
pred_h = np.exp(dh) * heights[:, np.newaxis]
/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:49: RuntimeWarning: overflow encountered in multiply
pred_h = np.exp(dh) * heights[:, np.newaxis]
/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:55: RuntimeWarning: invalid value encountered in subtract
pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
iter: 400 / 70000, total loss: nan
>>> rpn_loss_cls: nan
>>> rpn_loss_box: nan
>>> loss_cls: 3.037189
>>> loss_box: 0.000000
>>> lr: 0.001000
speed: 1.321s / iter
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
Traceback (most recent call last):
File "./tools/trainval_vgg16_net.py", line 117, in <module>
max_iters=args.max_iters)
File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/train_val.py", line 304, in train_net
sw.train_model(sess, max_iters)
File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/train_val.py", line 197, in train_model
self.net.train_step_with_summary(sess, blobs, train_op)
File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/nets/vgg16.py", line 561, in train_step_with_summary
feed_dict=feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
Caused by op u'TRAIN/vgg16_default/conv3_1/weight', defined at:
File "./tools/trainval_vgg16_net.py", line 117, in <module>
max_iters=args.max_iters)
File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/train_val.py", line 304, in train_net
sw.train_model(sess, max_iters)
File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/train_val.py", line 91, in train_model
tag='default', anchor_scales=anchors)
File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/nets/vgg16.py", line 507, in create_architecture
self._add_train_summary(var)
File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/nets/vgg16.py", line 48, in _add_train_summary
tf.summary.histogram('TRAIN/' + var.op.name, var)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/summary/summary.py", line 205, in histogram
tag=scope.rstrip('/'), values=values, name=scope)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_logging_ops.py", line 139, in _histogram_summary
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
[[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:652] Deallocating stream with pending work
Command exited with non-zero status 1
435.97user 110.56system 9:22.01elapsed 97%CPU (0avgtext+0avgdata 2976644maxresident)k
60224inputs+2752outputs (4major+2126190minor)pagefaults 0swaps
mona@pascal:~/computer_vision/tf-faster-rcnn$
Issue Analytics
- State:
- Created 7 years ago
- Comments:13 (3 by maintainers)
Top GitHub Comments
@monajalal, @zdm123, @amirhfarzaneh, @yidan216home, I hit the same problem when training on my own data: rpn_loss_box was NaN. After some research, the cause is in the file pascal_voc.py, where the function _load_pascal_annotation makes pixel indexes 0-based. The code is:
x1 = float(bbox.find('xmin').text) - 1
y1 = float(bbox.find('ymin').text) - 1
x2 = float(bbox.find('xmax').text) - 1
y2 = float(bbox.find('ymax').text) - 1
If your annotations are not 1-based (mine are 0-based), a coordinate of 0 becomes -1 in the data. You can try deleting the "- 1" operation. Hope this helps!
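A minimal sketch of the check described above, assuming VOC-style XML annotations. The helper names `load_boxes` and `find_invalid` and the `one_based` flag are hypothetical and not part of tf-faster-rcnn; the idea is only to subtract 1 when the annotations really are 1-based, and to flag any box that ends up with a negative or inverted coordinate.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample annotation whose coordinates are already 0-based.
SAMPLE = """<annotation><object><name>cat</name><bndbox>
<xmin>0</xmin><ymin>10</ymin><xmax>40</xmax><ymax>60</ymax>
</bndbox></object></annotation>"""


def load_boxes(xml_text, one_based=True):
    """Parse VOC-style boxes, subtracting 1 only for 1-based annotations."""
    offset = 1.0 if one_based else 0.0
    boxes = []
    for obj in ET.fromstring(xml_text).findall("object"):
        bbox = obj.find("bndbox")
        x1 = float(bbox.find("xmin").text) - offset
        y1 = float(bbox.find("ymin").text) - offset
        x2 = float(bbox.find("xmax").text) - offset
        y2 = float(bbox.find("ymax").text) - offset
        boxes.append((x1, y1, x2, y2))
    return boxes


def find_invalid(boxes):
    """Return boxes with a negative coordinate or with max < min."""
    return [b for b in boxes if min(b) < 0 or b[2] < b[0] or b[3] < b[1]]


# Treating 0-based data as 1-based pushes xmin to -1.
shifted = load_boxes(SAMPLE, one_based=True)
unshifted = load_boxes(SAMPLE, one_based=False)
print(find_invalid(shifted))    # the box is flagged
print(find_invalid(unshifted))  # nothing is flagged
```

Running a check like this over the whole annotation set before training may show whether the "- 1" offset applies to your data.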
You may need to adjust the hyperparameters (e.g. the learning rate) if you are training on another dataset.
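The RuntimeWarnings from bbox_transform.py in the log above show `np.exp(dw)` overflowing shortly before the losses became NaN, which is consistent with the regression outputs diverging. The following is an illustrative sketch, not the repository's code: `decode_size` and the `max_delta` bound of 4.0 are assumptions chosen only to show how an unbounded delta turns into `inf` and then `nan`, and how a clamp keeps the decoded size finite.

```python
import numpy as np


def decode_size(deltas, sizes, max_delta=None):
    """Decode log-space size deltas into box sizes, optionally clamping first."""
    deltas = np.asarray(deltas, dtype=np.float32)
    sizes = np.asarray(sizes, dtype=np.float32)
    if max_delta is not None:
        # Hypothetical safeguard: bound the delta before exponentiating.
        deltas = np.minimum(deltas, max_delta)
    with np.errstate(over="ignore"):
        return np.exp(deltas) * sizes


dw = [0.1, 200.0]        # the second delta stands in for a diverged prediction
widths = [50.0, 50.0]

unclamped = decode_size(dw, widths)
clamped = decode_size(dw, widths, max_delta=4.0)

# Subtracting one infinity from another gives NaN, which matches the
# "invalid value encountered in subtract" warning in the log.
with np.errstate(invalid="ignore"):
    edge = np.float32(np.inf) - np.float32(0.5) * unclamped[1]

print(unclamped, clamped, edge)
```

A clamp like this only hides the symptom; if the deltas are diverging, lowering the learning rate or checking the annotations is still the first thing to try.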