error while using "gmi" for the loss

Subject of the issue

Training fails with tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: local_net/down_sample_resnet_block/conv3d_block/conv3d/conv3d_1/kernel_0 [Op:WriteHistogramSummary] when ‘gmi’ is used as the loss. This happens in several scenarios (e.g. in the demos).

If the bug is confirmed, would you be willing to submit a PR? (Help can be provided if you need assistance submitting a PR)

No

Your environment

  • DeepReg version (commit hash or tag): 0.1.0b1 (from git rev-parse HEAD: 8b8d75fdaaf89be2dfefc1d5c3c37e3ef26fd7d1)

  • OS: Linux 4.15.0-112-generic #113-Ubuntu x86_64 x86_64 x86_64 GNU/Linux

  • Python Version: 3.7.9

  • TensorFlow: 2.2.0

Steps to reproduce

Modify the grouped_mr_heart demo YAML file to use ‘gmi’ instead of ‘lncc’, then run:

deepreg_train --gpu "3" --config_path demos/grouped_mr_heart/grouped_mr_heart.yaml --log_dir grouped_mr_heart

log

1/9 [==>...........................] - ETA: 0s - loss/weighted_regularization: 0.0000e+00 - loss: nan - loss/weighted_image_dissimilarity: nan - loss/regularization: 0.0000e+00 - loss/image_dissimilarity: nan
2020-10-15 08:42:22.326944: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1430] function cupti_interface_->EnableCallback( 0 , subscriber_, CUPTI_CB_DOMAIN_DRIVER_API, cbid)failed with error CUPTI_ERROR_INVALID_PARAMETER
2020-10-15 08:42:22.330619: I tensorflow/core/profiler/internal/gpu/device_tracer.cc:216]  GpuTracer has collected 0 callback api events and 0 activity events.
2020-10-15 08:42:22.349700: I tensorflow/core/profiler/rpc/client/save_profile.cc:168] Creating directory: logs/grouped_mr_heart/train/plugins/profile/2020_10_15_08_42_22
2020-10-15 08:42:22.352329: I tensorflow/core/profiler/rpc/client/save_profile.cc:174] Dumped gzipped tool data for trace.json.gz to logs/grouped_mr_heart/train/plugins/profile/2020_10_15_08_42_22/MMIV-DGX-Station2.trace.json.gz
2020-10-15 08:42:22.353773: I tensorflow/core/profiler/utils/event_span.cc:288] Generation of step-events took 0.001 ms

2020-10-15 08:42:22.355437: I tensorflow/python/profiler/internal/profiler_wrapper.cc:87] Creating directory: logs/grouped_mr_heart/train/plugins/profile/2020_10_15_08_42_22
Dumped tool data for overview_page.pb to logs/grouped_mr_heart/train/plugins/profile/2020_10_15_08_42_22/MMIV-DGX-Station2.overview_page.pb
Dumped tool data for input_pipeline.pb to logs/grouped_mr_heart/train/plugins/profile/2020_10_15_08_42_22/MMIV-DGX-Station2.input_pipeline.pb
Dumped tool data for tensorflow_stats.pb to logs/grouped_mr_heart/train/plugins/profile/2020_10_15_08_42_22/MMIV-DGX-Station2.tensorflow_stats.pb
Dumped tool data for kernel_stats.pb to logs/grouped_mr_heart/train/plugins/profile/2020_10_15_08_42_22/MMIV-DGX-Station2.kernel_stats.pb

2/9 [=====>........................] - ETA: 2s - loss/weighted_regularization: nan - loss: nan - loss/weighted_image_dissimilar
3/9 [=========>....................] - ETA: 3s - loss/weighted_regularization: nan - loss: nan - loss/weighted_image_dissimilar
4/9 [============>.................] - ETA: 3s - loss/weighted_regularization: nan - loss: nan - loss/weighted_image_dissimilar
5/9 [===============>..............] - ETA: 2s - loss/weighted_regularization: nan - loss: nan - loss/weighted_image_dissimilar
6/9 [===================>..........] - ETA: 2s - loss/weighted_regularization: nan - loss: nan - loss/weighted_image_dissimilar
7/9 [======================>.......] - ETA: 1s - loss/weighted_regularization: nan - loss: nan - loss/weighted_image_dissimilar
8/9 [=========================>....] - ETA: 0s - loss/weighted_regularization: nan - loss: nan - loss/weighted_image_dissimilar
9/9 [==============================] - ETA: 0s - loss/weighted_regularization: nan - loss: nan - loss/weighted_image_dissimilarity: nan - loss/regularization: nan - loss/image_dissimilarity: nan
2020-10-15 08:42:34.992438: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at summary_kernels.cc:242 : Invalid argument: Nan in summary histogram for: local_net/down_sample_resnet_block/conv3d_block/conv3d/conv3d_1/kernel_0
Traceback (most recent call last):
  File "/home/charlie/anaconda3/envs/deepreg/lib/python3.7/site-packages/tensorflow/python/ops/gen_summary_ops.py", line 464, in write_histogram_summary
    tld.op_callbacks, writer, step, tag, values)
tensorflow.python.eager.core._FallbackException: This function does not handle the case of the path where all inputs are not already EagerTensors.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/charlie/anaconda3/envs/deepreg/bin/deepreg_train", line 33, in <module>
    sys.exit(load_entry_point('deepreg', 'console_scripts', 'deepreg_train')())
  File "/home/charlie/3DREG-tests/DeepReg/deepreg/train.py", line 227, in main
    log_dir=args.log_dir,
  File "/home/charlie/3DREG-tests/DeepReg/deepreg/train.py", line 154, in train
    callbacks=callbacks,
  File "/home/charlie/anaconda3/envs/deepreg/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 66, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/home/charlie/anaconda3/envs/deepreg/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 876, in fit
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/home/charlie/anaconda3/envs/deepreg/lib/python3.7/site-packages/tensorflow/python/keras/callbacks.py", line 365, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "/home/charlie/anaconda3/envs/deepreg/lib/python3.7/site-packages/tensorflow/python/keras/callbacks.py", line 2000, in on_epoch_end
    self._log_weights(epoch)
  File "/home/charlie/anaconda3/envs/deepreg/lib/python3.7/site-packages/tensorflow/python/keras/callbacks.py", line 2119, in _log_weights
    summary_ops_v2.histogram(weight_name, weight, step=epoch)
  File "/home/charlie/anaconda3/envs/deepreg/lib/python3.7/site-packages/tensorflow/python/ops/summary_ops_v2.py", line 830, in histogram
    return summary_writer_function(name, tensor, function, family=family)
  File "/home/charlie/anaconda3/envs/deepreg/lib/python3.7/site-packages/tensorflow/python/ops/summary_ops_v2.py", line 759, in summary_writer_function
    should_record_summaries(), record, _nothing, name="")
  File "/home/charlie/anaconda3/envs/deepreg/lib/python3.7/site-packages/tensorflow/python/framework/smart_cond.py", line 54, in smart_cond
    return true_fn()
  File "/home/charlie/anaconda3/envs/deepreg/lib/python3.7/site-packages/tensorflow/python/ops/summary_ops_v2.py", line 752, in record
    with ops.control_dependencies([function(tag, scope)]):
  File "/home/charlie/anaconda3/envs/deepreg/lib/python3.7/site-packages/tensorflow/python/ops/summary_ops_v2.py", line 828, in function
    name=scope)
  File "/home/charlie/anaconda3/envs/deepreg/lib/python3.7/site-packages/tensorflow/python/ops/gen_summary_ops.py", line 469, in write_histogram_summary
    writer, step, tag, values, name=name, ctx=_ctx)
  File "/home/charlie/anaconda3/envs/deepreg/lib/python3.7/site-packages/tensorflow/python/ops/gen_summary_ops.py", line 490, in write_histogram_summary_eager_fallback
    attrs=_attrs, ctx=ctx, name=name)
  File "/home/charlie/anaconda3/envs/deepreg/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Nan in summary histogram for: local_net/down_sample_resnet_block/conv3d_block/conv3d/conv3d_1/kernel_0 [Op:WriteHistogramSummary]
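
The progress lines in the log above already show where the NaN first appears, well before the histogram summary fails at the end of the epoch. As a rough illustration only (not part of DeepReg or TensorFlow), the sketch below scans Keras-style progress output and reports the first step at which each metric is printed as nan. The helper name first_nan_steps and both regular expressions are assumptions made for this example, written against the progress-bar format seen in this log.

```python
import re

# Assumed format: a Keras progress entry such as "2/9 [=====>....] - ETA: 2s - loss: nan"
STEP_RE = re.compile(r"(\d+)/(\d+) \[[=>.]+\]")
# A "- name: value" pair where the value is either nan or a plain number
METRIC_RE = re.compile(r"- ([\w/]+): (nan|[-+0-9.e]+)")


def first_nan_steps(log_text):
    """Return {metric_name: first step at which that metric was printed as nan}."""
    first_seen = {}
    steps = list(STEP_RE.finditer(log_text))
    for i, match in enumerate(steps):
        step = int(match.group(1))
        # The metrics for this step run until the next progress entry (or end of text)
        end = steps[i + 1].start() if i + 1 < len(steps) else len(log_text)
        segment = log_text[match.end():end]
        for name, value in METRIC_RE.findall(segment):
            if value == "nan" and name not in first_seen:
                first_seen[name] = step
    return first_seen


# Two entries copied from the log in this issue (the second is truncated in the original)
SAMPLE = (
    "1/9 [==>...........................] - ETA: 0s"
    " - loss/weighted_regularization: 0.0000e+00 - loss: nan"
    " - loss/weighted_image_dissimilarity: nan"
    " - loss/regularization: 0.0000e+00 - loss/image_dissimilarity: nan\n"
    "2/9 [=====>........................] - ETA: 2s"
    " - loss/weighted_regularization: nan - loss: nan"
    " - loss/weighted_image_dissimilar"
)

if __name__ == "__main__":
    for metric, step in sorted(first_nan_steps(SAMPLE).items()):
        print(f"{metric}: first nan at step {step}")
```

On the two sample entries, the image dissimilarity metrics are nan at step 1 while the weighted regularization only becomes nan at step 2. That is consistent with the NaN first appearing in the ‘gmi’ image dissimilarity term and then reaching the weights, although the log alone does not confirm the cause.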

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 17 (11 by maintainers)

Top GitHub Comments

1 reaction
mathpluscode commented, Oct 25, 2020

Hi @ciphercharly, the fix has now been integrated into the main branch, so feel free to test again 😉 Please reopen this ticket if there’s still an error!

1 reaction
ciphercharly commented, Oct 17, 2020

Tested quickly; it seems to run without errors with a custom model/data too 👍
