Cannot change attributes of finished trial
Hi,
I really like Optuna (2.10), thanks for this great tool 😃
However, I get many failed trials with the following error message:
Traceback (most recent call last):
  File "/u/twagner/conda-envs/tomotwin_opt/lib/python3.9/site-packages/optuna/study/_optimize.py", line 213, in _run_trial
    value_or_values = func(trial)
  File "/u/twagner/conda-envs/tomotwin_opt/lib/python3.9/site-packages/tomotwin/train_optuna.py", line 157, in objective
    trial.report(val_loss, epoch)
  File "/u/twagner/conda-envs/tomotwin_opt/lib/python3.9/site-packages/optuna/trial/_trial.py", line 597, in report
    self.storage.set_trial_intermediate_value(self._trial_id, step, value)
  File "/u/twagner/conda-envs/tomotwin_opt/lib/python3.9/site-packages/optuna/storages/_cached_storage.py", line 318, in set_trial_intermediate_value
    self._flush_trial(trial_id)
  File "/u/twagner/conda-envs/tomotwin_opt/lib/python3.9/site-packages/optuna/storages/_cached_storage.py", line 428, in _flush_trial
    return self._backend._update_trial(
  File "/u/twagner/conda-envs/tomotwin_opt/lib/python3.9/site-packages/optuna/storages/_rdb/storage.py", line 671, in _update_trial
    raise RuntimeError("Cannot change attributes of finished trial.")
RuntimeError: Cannot change attributes of finished trial.
Here are some statistics about the study:
Number of finished trials: 67
Pruned: 22
Completed: 12
Failed: 22
Waiting: 0
Running: 11
For another study that has been running much longer, it is even worse:
Number of finished trials: 309
Pruned: 45
Completed: 7
Failed: 251
Waiting: 0
Running: 6
BTW: Why are running trials listed as finished trials?
Here is how I set up the study: https://gist.github.com/thorstenwagner/bde99f26295809882ab3315ad8be0b5b
And this is my objective: https://gist.github.com/thorstenwagner/5e39db92b0198021ee194fcf42730ae3
Can someone tell me what is wrong with my setup?
The whole experiment runs on an HPC system with 11 processes in parallel. The file system is GPFS (not NFS), so locking should be supported without flaws.
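On the side question about the counts: both reported totals equal the sum over every state, running included. That suggests the "finished" figure is the size of the whole trial list (for example something like `len(study.trials)`) rather than a count of trials that have ended. This is an inference from the numbers, not something confirmed in the thread, and the helper name `summarize` below is made up for illustration.

```python
from collections import Counter

# Per-state counts copied from the two study summaries above.
study_a = Counter(pruned=22, completed=12, failed=22, waiting=0, running=11)
study_b = Counter(pruned=45, completed=7, failed=251, waiting=0, running=6)


def summarize(counts):
    """Return (all trials, trials that have actually ended)."""
    total = sum(counts.values())
    ended = counts["pruned"] + counts["completed"] + counts["failed"]
    return total, ended


total_a, ended_a = summarize(study_a)
total_b, ended_b = summarize(study_b)
print(total_a, ended_a)  # 67 56
print(total_b, ended_b)  # 309 303
```

The first number in each pair matches the reported "finished" total, while the second is what the count would be if running and waiting trials were left out.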
Issue Analytics
- State:
- Created 2 years ago
- Comments:14 (6 by maintainers)

Just got the error again locally. With heartbeat 60 😦
@thorstenwagner
Thank you for explaining, I think I understand your situation.
Hmm, I see. I couldn’t reproduce the problem with a simple example… If you find a way to reproduce it, please tell me. It would help us a lot.
OK, I’ll try to explain it. In some situations, processes running trials are killed suddenly. Typical cases are spot instances on AWS and preemptible instances on GCP. If processes are killed in such ways, trials whose state is running are left in the study. Heartbeat checks whether each process is alive by sending a ping and, if the process doesn’t respond to the signal, finishes the trial by changing its state from running to fail.
🔗 Heartbeat in https://github.com/optuna/optuna/releases/tag/v2.5.0
🔗 optuna.storages.RetryFailedTrialCallback, added in https://github.com/optuna/optuna/releases/tag/v2.8.0
If you don’t have to pay attention to process interruption, you can simply use cached storage by the following code
instead of
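
A toy model of the heartbeat idea described above may help when reading the traceback. It is only a sketch under assumptions: it is not Optuna's implementation and not the snippet this comment refers to. `ToyStorage`, its methods, and the timestamp-plus-grace-period check are all made up for illustration. It shows a running trial whose heartbeat has gone stale being flipped to fail, after which a later report on that trial is rejected with the same message as in the traceback. Whether this is what happens in the reported setup is not confirmed in the thread.

```python
class ToyStorage:
    """In-memory stand-in for a shared trial store (hypothetical, not RDBStorage)."""

    def __init__(self, grace_period):
        self.grace_period = grace_period
        self.state = {}      # trial_id -> "running" or "fail"
        self.heartbeat = {}  # trial_id -> time of the last heartbeat

    def start_trial(self, trial_id, now):
        self.state[trial_id] = "running"
        self.heartbeat[trial_id] = now

    def record_heartbeat(self, trial_id, now):
        self.heartbeat[trial_id] = now

    def fail_stale_trials(self, now):
        """Mark running trials with an outdated heartbeat as failed."""
        stale = [
            trial_id
            for trial_id, state in self.state.items()
            if state == "running"
            and now - self.heartbeat[trial_id] > self.grace_period
        ]
        for trial_id in stale:
            self.state[trial_id] = "fail"
        return stale

    def report(self, trial_id, step, value):
        """Reject writes to a trial that is no longer running."""
        if self.state[trial_id] != "running":
            raise RuntimeError("Cannot change attributes of finished trial.")
        # A real store would persist (step, value) here.


storage = ToyStorage(grace_period=120)
storage.start_trial(0, now=0)
storage.start_trial(1, now=0)
storage.record_heartbeat(0, now=100)  # worker 0 is still alive
# Worker 1 never sends another heartbeat.

failed = storage.fail_stale_trials(now=150)
print(failed)  # [1]

storage.report(0, step=1, value=0.5)  # accepted, trial 0 is still running
try:
    storage.report(1, step=1, value=0.7)
    message = None
except RuntimeError as err:
    message = str(err)
print(message)  # Cannot change attributes of finished trial.
```

In this model the error only appears when something else has already finished the trial before the worker writes to it, which is why the per-state counts and the heartbeat settings are worth checking together.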