RuntimeError: Trying to backward through the graph a second time
Thanks for reporting the unexpected results and we appreciate it a lot.
Describe the Issue
GradientCumulativeOptimizerHook doesn't work: training fails with RuntimeError: Trying to backward through the graph a second time.
Reproduction
- What command, code, or script did you run?
Add GradientCumulativeOptimizerHook to your *_config.py file:
custom_hooks = [
dict(type="GradientCumulativeOptimizerHook", cumulative_iters=4),
]
Output
2021-09-27 14:21:26,889 - mmdet - WARNING - GradientCumulativeOptimizerHook may slightly decrease performance if the model has BatchNorm layers.
Traceback (most recent call last):
File "/home/zuppif/integration-object-detection/playground.py", line 81, in <module>
main(Args(config_file, cfg_options=options))
File "/home/zuppif/integration-object-detection/src/train.py", line 185, in main
train_detector(
File "/home/zuppif/integration-object-detection/.venv/lib/python3.9/site-packages/mmdet/apis/train.py", line 174, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/home/zuppif/integration-object-detection/.venv/lib/python3.9/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/zuppif/integration-object-detection/.venv/lib/python3.9/site-packages/mmcv/runner/epoch_based_runner.py", line 51, in train
self.call_hook('after_train_iter')
File "/home/zuppif/integration-object-detection/.venv/lib/python3.9/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
getattr(hook, fn_name)(self)
File "/home/zuppif/integration-object-detection/.venv/lib/python3.9/site-packages/mmcv/runner/hooks/optimizer.py", line 115, in after_train_iter
loss.backward()
File "/home/zuppif/integration-object-detection/.venv/lib/python3.9/site-packages/torch/_tensor.py", line 255, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/zuppif/integration-object-detection/.venv/lib/python3.9/site-packages/torch/autograd/__init__.py", line 147, in backward
Variable._execution_engine.run_backward(
RuntimeError: Trying to backward through the graph a second time (or directly access saved variables after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved variables after calling backward.
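For context, the RuntimeError itself is a generic autograd failure rather than anything mmdet-specific: PyTorch frees the intermediate values of a graph after the first .backward(), so calling .backward() a second time on the same loss fails. A minimal sketch, independent of mmcv/mmdet, that reproduces the same message:

import torch

# Tiny graph: y depends on x through a differentiable op.
x = torch.randn(3, requires_grad=True)
y = (x ** 2).sum()

# The first backward pass succeeds and frees the saved intermediate values.
y.backward()

# A second backward over the same graph raises the error from the traceback above.
try:
    y.backward()
except RuntimeError as e:
    print(e)  # "Trying to backward through the graph a second time ..."

Seeing this message from after_train_iter suggests that loss.backward() is being invoked twice on the same loss in one iteration, i.e. two optimizer hooks are active at the same time.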
Environment
- Please run
python -c "from mmcv.utils import collect_env; print(collect_env())"
{'sys.platform': 'linux', 'Python': '3.9.5 (default, Jun 4 2021, 12:28:51) [GCC 7.5.0]', 'CUDA available': True, 'GPU 0,1,2': 'GeForce GTX 1080 Ti', 'CUDA_HOME': '/usr/local/cuda', 'NVCC': 'Build cuda_11.2.r11.2/compiler.29373293_0', 'GCC': 'gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0', 'PyTorch': '1.9.1+cu102', 'PyTorch compiling details': 'PyTorch built with:\n - GCC 7.3\n - C++ Version: 201402\n - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - CUDA Runtime 10.2\n - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70\n - CuDNN 7.6.5\n - Magma 2.5.2\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, \n', 'TorchVision': '0.10.1+cu102', 'OpenCV': '4.5.3', 'MMCV': '1.3.13', 'MMCV Compiler': 'GCC 9.3', 'MMCV CUDA Compiler': '11.2'}
Great. It seems to be working now. Thank you so much for your support!
Super! Thank you. @zhouzaida would it make sense to add an example in the doc to avoid this confusion?
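For reference, a likely source of the confusion (and a candidate for the doc example mentioned above): adding GradientCumulativeOptimizerHook under custom_hooks registers it in addition to the default OptimizerHook built from optimizer_config, so both hooks call loss.backward() in after_train_iter and the second call hits the already-freed graph. A minimal sketch of the config that avoids this, assuming a standard mmdet *_config.py (cumulative_iters=4 is the value from the report):

# Replace the default optimizer hook instead of adding a second one;
# do not also list GradientCumulativeOptimizerHook under custom_hooks.
optimizer_config = dict(
    type="GradientCumulativeOptimizerHook",
    cumulative_iters=4,
)

With this setup, gradients are accumulated over 4 iterations before each optimizer step, and backward() runs only once per loss.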