Training Runtime Error: StopIteration
Hi,
I’m using the released training data on AWS and the latest main branch to train the model.
- The directory structure of the released data is not recognized by the code.
- After restructuring the directories and putting all the .hhr and .a3m files under the alignment directory, the code crashes at
File "/global/u2/b/bz186/openfold/openfold/data/data_modules.py", line 377, in reroll
datapoint_idx = next(samples)
with default settings.
Any idea how to solve this?
Thanks,
Bo
The full traceback is below:
Traceback (most recent call last):
File "train_openfold.py", line 548, in <module>
main(args)
File "train_openfold.py", line 341, in main
ckpt_path=ckpt_path,
File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 741, in fit
self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
self._dispatch()
File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
self.training_type_plugin.start_training(self)
File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
self._results = trainer.run_stage()
File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
return self._run_train()
File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1319, in _run_train
self.fit_loop.run()
File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 140, in run
self.on_run_start(*args, **kwargs)
File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/loops/fit_loop.py", line 197, in on_run_start
self.trainer.reset_train_val_dataloaders(self.trainer.lightning_module)
File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/data_loading.py", line 595, in reset_train_val_dataloaders
self.reset_train_dataloader(model=model)
File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/data_loading.py", line 365, in reset_train_dataloader
self.train_dataloader = self.request_dataloader(RunningStage.TRAINING, model=model)
File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/data_loading.py", line 611, in request_dataloader
dataloader = source.dataloader()
File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 300, in dataloader
return method()
File "/global/u2/b/bz186/openfold/openfold/data/data_modules.py", line 694, in train_dataloader
return self._gen_dataloader("train")
File "/global/u2/b/bz186/openfold/openfold/data/data_modules.py", line 671, in _gen_dataloader
dataset.reroll()
File "/global/u2/b/bz186/openfold/openfold/data/data_modules.py", line 377, in reroll
datapoint_idx = next(samples)
StopIteration
srun: error: nid001680: task 0: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=2466693.0
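For readers hitting the same trace, here is a minimal sketch of how a bare next() call can end in StopIteration. It is not OpenFold's code: looped_samples, reroll and reroll_checked are hypothetical stand-ins. It also rests on an assumption rather than a confirmed diagnosis, namely that the dataset ended up with zero usable entries, for example because the alignment directory layout was not what the loader expected.

```python
import random


def looped_samples(n_items, rng):
    """Hypothetical sampler: yields random indices forever, or nothing if n_items == 0."""
    while n_items > 0:
        yield rng.randrange(n_items)


def reroll(items, keep, epoch_len, rng=None):
    """Hypothetical reroll: draw indices until epoch_len of them pass the filter."""
    if rng is None:
        rng = random.Random(0)
    samples = looped_samples(len(items), rng)
    chosen = []
    while len(chosen) < epoch_len:
        # With an empty dataset the generator finishes immediately,
        # so this next() call raises StopIteration.
        idx = next(samples)
        if keep(items[idx]):
            chosen.append(idx)
    return chosen


def reroll_checked(items, keep, epoch_len, rng=None):
    """Same as reroll, but fails early with a readable message."""
    if not items:
        raise ValueError("dataset is empty; check the data and alignment directories")
    return reroll(items, keep, epoch_len, rng)
```

Calling reroll([], lambda x: True, 2) reproduces the bare StopIteration, while reroll_checked reports the empty dataset directly. If something similar is happening here, checking the dataset length before training starts would narrow the problem down.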
The “exact sequence” warning is expected and is nothing to worry about. As for the precision thing, could we move this to #180? I’m pretty sure it’s the same thing. FP16 is not supported, so the final error is also expected.
Delete their alignment_dirs and rerun. I’ll look into what’s causing this.