InvalidArgumentError (see above for traceback): Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight

I followed your instructions and got this error. Can you please suggest a solution?

mona@pascal:~/computer_vision/tf-faster-rcnn$ GPU_ID=0
mona@pascal:~/computer_vision/tf-faster-rcnn$ ./experiments/scripts/vgg16.sh $GPU_ID pascal_voc
+ set -e
+ export PYTHONUNBUFFERED=True
+ PYTHONUNBUFFERED=True
+ GPU_ID=0
+ DATASET=pascal_voc
+ array=($@)
+ len=2
+ EXTRA_ARGS=
+ EXTRA_ARGS_SLUG=
+ case ${DATASET} in
+ TRAIN_IMDB=voc_2007_trainval
+ TEST_IMDB=voc_2007_test
+ STEPSIZE=50000
+ ITERS=70000
++ date +%Y-%m-%d_%H-%M-%S
+ LOG=experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-02-14_22-08-43
+ exec
++ tee -a experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-02-14_22-08-43
tee: experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-02-14_22-08-43: No such file or directory
+ echo Logging output to experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-02-14_22-08-43
Logging output to experiments/logs/vgg16_voc_2007_trainval__vgg16.txt.2017-02-14_22-08-43
+ set +x
+ '[' '!' -f output/vgg16/voc_2007_trainval/default/vgg16_faster_rcnn_iter_70000.ckpt.index ']'
+ [[ ! -z '' ]]
+ CUDA_VISIBLE_DEVICES=0
+ time python ./tools/trainval_vgg16_net.py --weight data/imagenet_weights/vgg16.weights --imdb voc_2007_trainval --imdbval voc_2007_test --iters 70000 --cfg experiments/cfgs/vgg16.yml --set TRAIN.STEPSIZE 50000
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
Called with args:
Namespace(cfg_file='experiments/cfgs/vgg16.yml', imdb_name='voc_2007_trainval', imdbval_name='voc_2007_test', max_iters=70000, set_cfgs=['TRAIN.STEPSIZE', '50000'], tag=None, weight='data/imagenet_weights/vgg16.weights')
Using config:
{'DATA_DIR': '/home/mona/computer_vision/tf-faster-rcnn/data',
 'DEDUP_BOXES': 0.0625,
 'EPS': 1e-14,
 'EXP_DIR': 'vgg16',
 'GPU_ID': 0,
 'MATLAB': 'matlab',
 'PIXEL_MEANS': array([[[ 102.9801,  115.9465,  122.7717]]]),
 'POOLING_MODE': 'crop',
 'RNG_SEED': 3,
 'ROOT_DIR': '/home/mona/computer_vision/tf-faster-rcnn',
 'TEST': {'BBOX_REG': True,
          'HAS_RPN': True,
          'MAX_SIZE': 1000,
          'MODE': 'nms',
          'NMS': 0.3,
          'PROPOSAL_METHOD': 'selective_search',
          'RPN_NMS_THRESH': 0.7,
          'RPN_POST_NMS_TOP_N': 300,
          'RPN_PRE_NMS_TOP_N': 6000,
          'RPN_TOP_N': 5000,
          'SCALES': [600],
          'SVM': False},
 'TRAIN': {'ASPECT_GROUPING': False,
           'BATCH_SIZE': 256,
           'BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'BBOX_NORMALIZE_MEANS': [0.0, 0.0, 0.0, 0.0],
           'BBOX_NORMALIZE_STDS': [0.1, 0.1, 0.2, 0.2],
           'BBOX_NORMALIZE_TARGETS': True,
           'BBOX_NORMALIZE_TARGETS_PRECOMPUTED': True,
           'BBOX_REG': True,
           'BBOX_THRESH': 0.5,
           'BG_THRESH_HI': 0.5,
           'BG_THRESH_LO': 0.0,
           'BIAS_DECAY': False,
           'DISPLAY': 20,
           'DOUBLE_BIAS': True,
           'FG_FRACTION': 0.25,
           'FG_THRESH': 0.5,
           'GAMMA': 0.1,
           'HAS_RPN': True,
           'IMS_PER_BATCH': 1,
           'LEARNING_RATE': 0.001,
           'MAX_SIZE': 1000,
           'MOMENTUM': 0.9,
           'PROPOSAL_METHOD': 'gt',
           'RPN_BATCHSIZE': 256,
           'RPN_BBOX_INSIDE_WEIGHTS': [1.0, 1.0, 1.0, 1.0],
           'RPN_CLOBBER_POSITIVES': False,
           'RPN_FG_FRACTION': 0.5,
           'RPN_NEGATIVE_OVERLAP': 0.3,
           'RPN_NMS_THRESH': 0.7,
           'RPN_POSITIVE_OVERLAP': 0.7,
           'RPN_POSITIVE_WEIGHT': -1.0,
           'RPN_POST_NMS_TOP_N': 2000,
           'RPN_PRE_NMS_TOP_N': 12000,
           'SCALES': [600],
           'SNAPSHOT_ITERS': 5000,
           'SNAPSHOT_KEPT': 3,
           'SNAPSHOT_PREFIX': 'vgg16_faster_rcnn',
           'STEPSIZE': 50000,
           'SUMMARY_INTERVAL': 180,
           'TRUNCATED': False,
           'USE_FLIPPED': True,
           'USE_GT': False,
           'WEIGHT_DECAY': 0.0005},
 'USE_GPU_NMS': True}
Loaded dataset `voc_2007_trainval` for training
Set proposal method: gt
Appending horizontally-flipped training examples...
voc_2007_trainval gt roidb loaded from /home/mona/computer_vision/tf-faster-rcnn/data/cache/voc_2007_trainval_gt_roidb.pkl
done
Preparing training data...
done
10022 roidb entries
Output will be saved to `/home/mona/computer_vision/tf-faster-rcnn/output/vgg16/voc_2007_trainval/default`
TensorFlow summaries will be saved to `/home/mona/computer_vision/tf-faster-rcnn/tensorboard/vgg16/voc_2007_trainval/default`
Loaded dataset `voc_2007_test` for training
Set proposal method: gt
Preparing training data...
voc_2007_test gt roidb loaded from /home/mona/computer_vision/tf-faster-rcnn/data/cache/voc_2007_test_gt_roidb.pkl
done
4952 validation roidb entries
Filtered 0 roidb entries: 10022 -> 10022
Filtered 0 roidb entries: 4952 -> 4952
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:03:00.0
Total memory: 11.92GiB
Free memory: 11.85GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:03:00.0)
Solving...
Loading caffe weights...
Done!
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py:91: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
Loading initial model weights from data/imagenet_weights/vgg16.weights
Loaded.
iter: 20 / 70000, total loss: 0.443026
 >>> rpn_loss_cls: 0.345992
 >>> rpn_loss_box: 0.097034
 >>> loss_cls: 0.000000
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.749s / iter
iter: 40 / 70000, total loss: 0.516920
 >>> rpn_loss_cls: 0.399234
 >>> rpn_loss_box: 0.117686
 >>> loss_cls: 0.000000
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.760s / iter
iter: 60 / 70000, total loss: 0.393830
 >>> rpn_loss_cls: 0.353334
 >>> rpn_loss_box: 0.040496
 >>> loss_cls: 0.000000
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.668s / iter
iter: 80 / 70000, total loss: 0.217178
 >>> rpn_loss_cls: 0.146591
 >>> rpn_loss_box: 0.070533
 >>> loss_cls: 0.000053
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.536s / iter
iter: 100 / 70000, total loss: 0.390607
 >>> rpn_loss_cls: 0.277706
 >>> rpn_loss_box: 0.030601
 >>> loss_cls: 0.075361
 >>> loss_box: 0.006940
 >>> lr: 0.001000
speed: 1.495s / iter
iter: 120 / 70000, total loss: 0.882707
 >>> rpn_loss_cls: 0.566185
 >>> rpn_loss_box: 0.227990
 >>> loss_cls: 0.083081
 >>> loss_box: 0.005452
 >>> lr: 0.001000
speed: 1.570s / iter
iter: 140 / 70000, total loss: 0.223789
 >>> rpn_loss_cls: 0.113045
 >>> rpn_loss_box: 0.049687
 >>> loss_cls: 0.052417
 >>> loss_box: 0.008640
 >>> lr: 0.001000
speed: 1.510s / iter
iter: 160 / 70000, total loss: 0.219555
 >>> rpn_loss_cls: 0.187197
 >>> rpn_loss_box: 0.032358
 >>> loss_cls: 0.000000
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.494s / iter
iter: 180 / 70000, total loss: 2.256282
 >>> rpn_loss_cls: 1.965876
 >>> rpn_loss_box: 0.290406
 >>> loss_cls: 0.000000
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.475s / iter
iter: 200 / 70000, total loss: 1.727870
 >>> rpn_loss_cls: 1.226427
 >>> rpn_loss_box: 0.501443
 >>> loss_cls: 0.000000
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.463s / iter
iter: 220 / 70000, total loss: 0.353863
 >>> rpn_loss_cls: 0.298823
 >>> rpn_loss_box: 0.055040
 >>> loss_cls: 0.000000
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.461s / iter
iter: 240 / 70000, total loss: 0.147688
 >>> rpn_loss_cls: 0.039554
 >>> rpn_loss_box: 0.108122
 >>> loss_cls: 0.000012
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.450s / iter
iter: 260 / 70000, total loss: 0.485889
 >>> rpn_loss_cls: 0.416970
 >>> rpn_loss_box: 0.068911
 >>> loss_cls: 0.000009
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.428s / iter
iter: 280 / 70000, total loss: 0.153297
 >>> rpn_loss_cls: 0.108915
 >>> rpn_loss_box: 0.044243
 >>> loss_cls: 0.000139
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.440s / iter
iter: 300 / 70000, total loss: 0.374053
 >>> rpn_loss_cls: 0.310106
 >>> rpn_loss_box: 0.063945
 >>> loss_cls: 0.000001
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.397s / iter
iter: 320 / 70000, total loss: 1.169239
 >>> rpn_loss_cls: 1.099040
 >>> rpn_loss_box: 0.070199
 >>> loss_cls: 0.000000
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.385s / iter
iter: 340 / 70000, total loss: 0.243177
 >>> rpn_loss_cls: 0.193078
 >>> rpn_loss_box: 0.049057
 >>> loss_cls: 0.001042
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.370s / iter
iter: 360 / 70000, total loss: 0.387752
 >>> rpn_loss_cls: 0.375503
 >>> rpn_loss_box: 0.012084
 >>> loss_cls: 0.000166
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.353s / iter
iter: 380 / 70000, total loss: 0.494936
 >>> rpn_loss_cls: 0.312221
 >>> rpn_loss_box: 0.045870
 >>> loss_cls: 0.136845
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.336s / iter
/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:48: RuntimeWarning: overflow encountered in exp
  pred_w = np.exp(dw) * widths[:, np.newaxis]
/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:48: RuntimeWarning: overflow encountered in multiply
  pred_w = np.exp(dw) * widths[:, np.newaxis]
/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:49: RuntimeWarning: overflow encountered in exp
  pred_h = np.exp(dh) * heights[:, np.newaxis]
/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:49: RuntimeWarning: overflow encountered in multiply
  pred_h = np.exp(dh) * heights[:, np.newaxis]
/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/bbox_transform.py:55: RuntimeWarning: invalid value encountered in subtract
  pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
iter: 400 / 70000, total loss: nan
 >>> rpn_loss_cls: nan
 >>> rpn_loss_box: nan
 >>> loss_cls: 3.037189
 >>> loss_box: 0.000000
 >>> lr: 0.001000
speed: 1.321s / iter
W tensorflow/core/framework/op_kernel.cc:975] Invalid argument: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
	 [[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]
Traceback (most recent call last):
  File "./tools/trainval_vgg16_net.py", line 117, in <module>
    max_iters=args.max_iters)
  File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/train_val.py", line 304, in train_net
    sw.train_model(sess, max_iters)
  File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/train_val.py", line 197, in train_model
    self.net.train_step_with_summary(sess, blobs, train_op)
  File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/nets/vgg16.py", line 561, in train_step_with_summary
    feed_dict=feed_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
	 [[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]

Caused by op u'TRAIN/vgg16_default/conv3_1/weight', defined at:
  File "./tools/trainval_vgg16_net.py", line 117, in <module>
    max_iters=args.max_iters)
  File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/train_val.py", line 304, in train_net
    sw.train_model(sess, max_iters)
  File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/model/train_val.py", line 91, in train_model
    tag='default', anchor_scales=anchors)
  File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/nets/vgg16.py", line 507, in create_architecture
    self._add_train_summary(var)
  File "/home/mona/computer_vision/tf-faster-rcnn/tools/../lib/nets/vgg16.py", line 48, in _add_train_summary
    tf.summary.histogram('TRAIN/' + var.op.name, var)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/summary/summary.py", line 205, in histogram
    tag=scope.rstrip('/'), values=values, name=scope)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_logging_ops.py", line 139, in _histogram_summary
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Nan in summary histogram for: TRAIN/vgg16_default/conv3_1/weight
	 [[Node: TRAIN/vgg16_default/conv3_1/weight = HistogramSummary[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](TRAIN/vgg16_default/conv3_1/weight/tag, vgg16_default/conv3_1/weight/read/_269)]]

E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:652] Deallocating stream with pending work
Command exited with non-zero status 1
435.97user 110.56system 9:22.01elapsed 97%CPU (0avgtext+0avgdata 2976644maxresident)k
60224inputs+2752outputs (4major+2126190minor)pagefaults 0swaps
mona@pascal:~/computer_vision/tf-faster-rcnn$ 

Issue Analytics

  • State: open
  • Created 7 years ago
  • Comments: 13 (3 by maintainers)

Top GitHub Comments

9 reactions
lonlonago commented, Oct 23, 2017

@monajalal, @zdm123, @amirhfarzaneh, @yidan216home, I got the same problem when training on my own data: rpn_loss_box was nan. After some research, I found the cause. In the file 'pascal_voc.py', the function '_load_pascal_annotation' makes pixel indexes 0-based. The code is:

    x1 = float(bbox.find('xmin').text) - 1
    y1 = float(bbox.find('ymin').text) - 1
    x2 = float(bbox.find('xmax').text) - 1
    y2 = float(bbox.find('ymax').text) - 1

If your data is not 1-based (mine is 0-based), a coordinate of 0 becomes -1 in the loaded data. You can try deleting the "- 1" operation. Hope this helps!
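
To check whether this applies to your own annotations before changing the loader, a minimal sketch along the following lines may help. It is not the repository's code: `load_boxes`, `has_negative_coords`, the `one_based` flag, and the sample annotation are all hypothetical names introduced here for illustration, and the XML layout is assumed to follow the usual VOC `object/bndbox` structure.

```python
import xml.etree.ElementTree as ET


def load_boxes(xml_text, one_based=True):
    """Return [x1, y1, x2, y2] for each object in a VOC-style annotation.

    `one_based` is a hypothetical flag: when True, 1 is subtracted from
    every coordinate (the behaviour quoted in the comment above); when
    False, the coordinates are used as stored.
    """
    offset = 1.0 if one_based else 0.0
    boxes = []
    for obj in ET.fromstring(xml_text).findall("object"):
        bbox = obj.find("bndbox")
        boxes.append(
            [float(bbox.find(tag).text) - offset
             for tag in ("xmin", "ymin", "xmax", "ymax")]
        )
    return boxes


def has_negative_coords(boxes):
    """True if any coordinate fell below zero after the offset."""
    return any(value < 0 for box in boxes for value in box)


# Made-up annotation whose coordinates are already 0-based (xmin is 0).
SAMPLE = """<annotation><object><bndbox>
<xmin>0</xmin><ymin>5</ymin><xmax>40</xmax><ymax>60</ymax>
</bndbox></object></annotation>"""

if __name__ == "__main__":
    print(has_negative_coords(load_boxes(SAMPLE, one_based=True)))
    print(has_negative_coords(load_boxes(SAMPLE, one_based=False)))
```

Running a check like this over every annotation file shows whether the subtraction ever produces a negative coordinate, which is a hint that the dataset is already 0-based.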

1 reaction
endernewton commented, Oct 24, 2017

You may need to adjust the hyperparameters (e.g. the learning rate) if you are running on another dataset.
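
As a rough illustration of why a diverging run ends in nan, the sketch below mimics the `np.exp(dw) * widths` step that the log warns about, using only the standard library. It is an assumption-laden toy, not the repository's `bbox_transform.py`: `decode_size` and `max_delta` are hypothetical names, and clamping the delta is a swapped-in safeguard that the repository is not known to use.

```python
import math


def decode_size(delta, size, max_delta=None):
    """Toy version of `exp(delta) * size` for one predicted box side.

    `max_delta` is a hypothetical clamp on the regression delta; it is
    not taken from the repository.
    """
    if max_delta is not None:
        delta = min(delta, max_delta)
    try:
        scale = math.exp(delta)
    except OverflowError:
        # math.exp raises on overflow; NumPy instead returns inf and
        # emits the "overflow encountered in exp" RuntimeWarning.
        scale = math.inf
    return scale * size


if __name__ == "__main__":
    # A huge delta, as a diverging network might predict.
    height = decode_size(1000.0, 16.0)
    print(height)            # infinite
    print(height - height)   # inf - inf is nan

    # With a clamp, the decoded size stays finite.
    bound = math.log(1000.0 / 16.0)  # illustrative bound only
    print(decode_size(1000.0, 16.0, max_delta=bound))
```

Once one value becomes infinite, any later expression that subtracts two infinite quantities yields nan, and the nan then spreads through the losses and weights. Lowering the learning rate attacks the cause (the deltas growing too large), whereas a clamp only limits the damage.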
