
Training Runtime Error: StopIteration

See original GitHub issue

Hi,

I’m using the released training data on AWS and the latest main branch to train the model.

  1. The directory structure of the released data is not recognized by the code.
  2. After restructuring the directories and putting all the .hhr and .a3m files under the alignment directory, the code crashes with default settings at File "/global/u2/b/bz186/openfold/openfold/data/data_modules.py", line 377, in reroll: datapoint_idx = next(samples).

Any idea to solve this?

Thanks,

Bo

The full traceback is below:

Traceback (most recent call last):
  File "train_openfold.py", line 548, in <module>
    main(args)
  File "train_openfold.py", line 341, in main
    ckpt_path=ckpt_path,
  File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 741, in fit
    self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
  File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
    self._dispatch()
  File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
    return self._run_train()
  File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1319, in _run_train
    self.fit_loop.run()
  File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 140, in run
    self.on_run_start(*args, **kwargs)
  File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/loops/fit_loop.py", line 197, in on_run_start
    self.trainer.reset_train_val_dataloaders(self.trainer.lightning_module)
  File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/data_loading.py", line 595, in reset_train_val_dataloaders
    self.reset_train_dataloader(model=model)
  File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/data_loading.py", line 365, in reset_train_dataloader
    self.train_dataloader = self.request_dataloader(RunningStage.TRAINING, model=model)
  File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/data_loading.py", line 611, in request_dataloader
    dataloader = source.dataloader()
  File "/global/homes/b/bz186/openfold/lib/conda/envs/openfold_venv/lib/python3.7/site-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 300, in dataloader
    return method()
  File "/global/u2/b/bz186/openfold/openfold/data/data_modules.py", line 694, in train_dataloader
    return self._gen_dataloader("train") 
  File "/global/u2/b/bz186/openfold/openfold/data/data_modules.py", line 671, in _gen_dataloader
    dataset.reroll()
  File "/global/u2/b/bz186/openfold/openfold/data/data_modules.py", line 377, in reroll
    datapoint_idx = next(samples)
StopIteration
srun: error: nid001680: task 0: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=2466693.0
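For context on the last frame of the traceback: calling next() on an iterator with no items left raises StopIteration. The sketch below is not OpenFold's reroll code. It uses a hypothetical sampler, sample_candidates, that yields shuffled dataset indices passing a filter, and shows how an empty or fully filtered dataset exhausts that generator. Passing a default to next() lets the loop fail with a more descriptive error instead.

```python
import random
from typing import Callable, Iterator, List


def sample_candidates(indices: List[int], keep: Callable[[int], bool]) -> Iterator[int]:
    """Hypothetical sampler: yield shuffled indices that pass the `keep` filter."""
    shuffled = list(indices)
    random.shuffle(shuffled)
    for idx in shuffled:
        if keep(idx):
            yield idx


def reroll_unsafe(indices: List[int], keep: Callable[[int], bool], epoch_len: int) -> List[int]:
    samples = sample_candidates(indices, keep)
    chosen = []
    for _ in range(epoch_len):
        # Raises StopIteration once every surviving index has been consumed,
        # e.g. when the dataset is empty or `keep` rejects everything.
        chosen.append(next(samples))
    return chosen


def reroll_checked(indices: List[int], keep: Callable[[int], bool], epoch_len: int) -> List[int]:
    samples = sample_candidates(indices, keep)
    chosen = []
    for _ in range(epoch_len):
        idx = next(samples, None)  # the default value suppresses StopIteration
        if idx is None:
            raise RuntimeError(
                f"only {len(chosen)} of {epoch_len} datapoints survived filtering; "
                "check that the training data and alignments are where the data module expects them"
            )
        chosen.append(idx)
    return chosen
```

For example, reroll_checked(list(range(10)), lambda i: i % 2 == 0, 5) returns the five even indices in shuffled order, while asking for six raises the RuntimeError. Note that under Python 3.7 (PEP 479), a StopIteration escaping from inside a generator's body is converted to "RuntimeError: generator raised StopIteration". Because the traceback above shows a bare StopIteration at next(samples), the samples iterator itself appears to have simply run out of items. That is consistent with the data module finding few or no usable datapoints.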

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 36 (19 by maintainers)

Top GitHub Comments

1 reaction
gahdritz commented, Jul 29, 2022

The “exact sequence” warning is expected and is nothing to worry about. As for the precision thing, could we move this to #180? I’m pretty sure it’s the same thing. FP16 is not supported, so the final error is also expected.

1 reaction
gahdritz commented, Jul 28, 2022

Delete their alignment_dirs and rerun. I’ll look into what’s causing this.
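Before deleting and regenerating alignments, it can help to list which chains actually lack usable alignment files. The sketch below assumes one subdirectory per chain under the alignment directory, each holding at least one .a3m file. That layout, the function name chains_missing_alignments, and the chain IDs in the example are illustrative assumptions, not OpenFold's documented format.

```python
from pathlib import Path
from typing import Iterable, List


def chains_missing_alignments(alignment_dir: str, chain_ids: Iterable[str]) -> List[str]:
    """Return chain IDs with no usable alignment subdirectory.

    Hypothetical layout assumed here: <alignment_dir>/<chain_id>/*.a3m
    """
    root = Path(alignment_dir)
    missing = []
    for chain_id in chain_ids:
        chain_dir = root / chain_id
        # A chain counts as missing if its directory is absent or holds no .a3m file.
        if not chain_dir.is_dir() or not any(chain_dir.glob("*.a3m")):
            missing.append(chain_id)
    return missing
```

For example, chains_missing_alignments("alignments/", ["1abc_A", "2xyz_B"]) returns the IDs whose directories are absent or empty of .a3m files. Under the assumed layout, those chains are the candidates for regeneration.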


