ControlFlowCallback error in DDP
🐛 Bug Report
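ControlFlowCallback can’t be pickled because of the lambdas in _filter_fn_from_loaders. It works fine when callbacks are initialized in get_callbacks, but fails if callbacks are passed directly to the SupervisedRunner.train method. With the DDP engine, worker processes are started with start_method='spawn', which pickles the process object before launch (see the traceback below). Every object reachable from it, including callbacks passed to train, must therefore be picklable.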
  File "/usr/local/lib/python3.6/dist-packages/catalyst/runners/runner.py", line 515, in train
    self.run()
  File "/usr/local/lib/python3.6/dist-packages/catalyst/core/runner.py", line 854, in run
    self._run_event("on_exception")
  File "/usr/local/lib/python3.6/dist-packages/catalyst/core/runner.py", line 788, in _run_event
    getattr(self, event)(self)
  File "/usr/local/lib/python3.6/dist-packages/catalyst/core/runner.py", line 780, in on_exception
    raise self.exception
  File "/usr/local/lib/python3.6/dist-packages/catalyst/core/runner.py", line 850, in run
    self._run_experiment()
  File "/usr/local/lib/python3.6/dist-packages/catalyst/core/runner.py", line 840, in _run_experiment
    self.engine.spawn(self._run_stage)
  File "/usr/local/lib/python3.6/dist-packages/catalyst/engines/torch.py", line 460, in spawn
    fn, args=(self._world_size,), nprocs=self._world_size, join=True
  File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/spawn.py", line 179, in start_processes
    process.start()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object '_filter_fn_from_loaders.<locals>.<lambda>'
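The last frames show where it breaks: the spawned process object is serialized with ForkingPickler before any worker starts. The following is a minimal sketch reproducing the same failure outside Catalyst. It assumes a hypothetical factory, make_loader_filter, that only mimics the pattern described in the report (a function returning a lambda closure). It is not Catalyst's actual code.

```python
import pickle
from multiprocessing.reduction import ForkingPickler


def make_loader_filter(loaders):
    """Hypothetical stand-in for a factory that returns a lambda closure."""
    loaders = sorted(set(loaders))
    # A lambda defined inside a function is a "local object": pickle stores
    # functions by qualified name, and 'make_loader_filter.<locals>.<lambda>'
    # cannot be looked up again in another process, so pickling is refused.
    return lambda stage, epoch, loader: loader in loaders


def is_spawn_picklable(obj):
    """Return True if obj survives the pickler used by spawn-based multiprocessing."""
    try:
        ForkingPickler.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        # Python 3.6 raises AttributeError("Can't pickle local object ...");
        # newer Python versions may raise PicklingError instead.
        return False


filter_fn = make_loader_filter(["valid"])
print(filter_fn("train", 1, "valid"))  # the lambda itself works in-process
print(is_spawn_picklable(filter_fn))   # but it cannot be sent to a spawned worker
```

The filter runs without errors in the parent process. The problem only surfaces when the object has to cross a process boundary, which is why the report notes the difference between callbacks created in get_callbacks and callbacks passed to train.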
Environment
Collecting environment information...
Catalyst version: 21.09
PyTorch version: 1.9.1+cu102
Is debug build: No
CUDA used to build PyTorch: 10.2
TensorFlow version: N/A
TensorBoard version: 2.6.0
OS: linux
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
CMake version: Could not collect
Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
Nvidia driver version: 455.45.01
cuDNN version: Could not collect
Versions of relevant libraries:
[pip3] catalyst==21.9
[pip3] numpy==1.19.5
[pip3] tensorboard==2.6.0
[pip3] tensorboard-data-server==0.6.1
[pip3] tensorboard-plugin-wit==1.8.0
[pip3] tensorboardX==2.2
[pip3] torch==1.9.1
[pip3] torchvision==0.10.1
[conda] Could not collect
Issue Analytics
- Created 2 years ago
- Comments: 8 (5 by maintainers)
Top GitHub Comments
@asteyo could you please help with this issue? I think some refactoring of filtering-fns from ... to ... should solve the issue 🚀
Yes, I made it callable and it works fine.
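The fix described in the comments replaces the lambdas with a callable object. The sketch below illustrates that idea under a few assumptions. LoaderFilter is a hypothetical name rather than a Catalyst class. The (stage, epoch, loader) -> bool call signature and the reverse flag (meant for an "ignore these loaders" variant) are also assumptions, so check the filter_fn contract of the Catalyst version in use.

```python
import pickle
from multiprocessing.reduction import ForkingPickler


class LoaderFilter:
    """Hypothetical picklable replacement for a lambda-based loader filter.

    Assumes a (stage, epoch, loader) -> bool call signature.
    """

    def __init__(self, loaders, reverse=False):
        # Store plain data only, so instances pickle by value.
        self.loaders = sorted(set(loaders))
        self.reverse = reverse

    def __call__(self, stage, epoch, loader):
        in_loaders = loader in self.loaders
        return not in_loaders if self.reverse else in_loaders


keep_valid = LoaderFilter(["valid"])
skip_valid = LoaderFilter(["valid"], reverse=True)

# A module-level class is pickled by reference to its qualified name and the
# instance carries its __dict__, so it round-trips through ForkingPickler.
restored = pickle.loads(bytes(ForkingPickler.dumps(keep_valid)))
print(restored("train", 1, "valid"))
print(skip_valid("train", 1, "valid"))
```

A module-level function wrapped in functools.partial would also pickle, as long as its bound arguments are picklable. A class keeps the state and the call logic in one place, which is closer to what the comment describes.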