More efficient optuna.study.Study.trials_dataframe
optuna.study.Study.trials_dataframe (and others like it) currently takes a very long time to run as the number of trials increases into the thousands. This request is for this method to run significantly faster.
Motivation
I often want to analyze and visualize the results of a study. This method (or others like it) is the first step in getting the data necessary. It would be helpful if this method could run significantly faster so that a person analyzing results doesn’t have to wait many minutes to see results. I just benchmarked one of my studies and it took 13 minutes to load the dataframe. It only has thousands of trials in it.
Description
I think, though I’m not 100% sure, that this feature requires changes to the RDB backend so that loading all trials doesn’t result in O(trials) queries. It should be possible to fetch all of the data with O(1) queries.
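To make the O(trials) versus O(1) distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two-table schema, the table and column names, and the functions load_per_trial and load_joined are all hypothetical and chosen for illustration; they are not Optuna's actual RDB schema or storage code.

```python
import sqlite3

# Hypothetical schema for illustration only; NOT Optuna's real RDB schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE trials (trial_id INTEGER PRIMARY KEY, study_id INTEGER, value REAL);
    CREATE TABLE trial_params (trial_id INTEGER, name TEXT, value REAL);
    """
)
for t in range(5):
    conn.execute("INSERT INTO trials VALUES (?, ?, ?)", (t, 1, t * 0.1))
    conn.execute("INSERT INTO trial_params VALUES (?, ?, ?)", (t, "x", float(t)))
    conn.execute("INSERT INTO trial_params VALUES (?, ?, ?)", (t, "y", float(-t)))


def load_per_trial(conn, study_id):
    """One query for the trial list plus one query per trial: O(trials) queries."""
    queries = 1
    rows = conn.execute(
        "SELECT trial_id, value FROM trials WHERE study_id = ? ORDER BY trial_id",
        (study_id,),
    ).fetchall()
    result = {}
    for trial_id, value in rows:
        params = conn.execute(
            "SELECT name, value FROM trial_params WHERE trial_id = ?",
            (trial_id,),
        ).fetchall()
        queries += 1
        result[trial_id] = {"value": value, "params": dict(params)}
    return result, queries


def load_joined(conn, study_id):
    """A single JOIN fetches trials and their parameters together: one query."""
    rows = conn.execute(
        """
        SELECT t.trial_id, t.value, p.name, p.value
        FROM trials AS t
        LEFT JOIN trial_params AS p ON p.trial_id = t.trial_id
        WHERE t.study_id = ?
        ORDER BY t.trial_id
        """,
        (study_id,),
    ).fetchall()
    result = {}
    for trial_id, value, name, param_value in rows:
        entry = result.setdefault(trial_id, {"value": value, "params": {}})
        if name is not None:  # trials without parameters yield NULL from the LEFT JOIN
            entry["params"][name] = param_value
    return result, 1


slow, slow_queries = load_per_trial(conn, study_id=1)
fast, fast_queries = load_joined(conn, study_id=1)
```

Both functions return the same data, but the first issues one query per trial on top of the initial listing, while the second issues a single query regardless of how many trials the study has. Against a remote database, each extra query adds a network round trip, which is where the per-trial pattern tends to hurt most.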
Issue Analytics
- State:
- Created 4 years ago
- Reactions:1
- Comments:6 (5 by maintainers)
For whoever may pick this up in the future, or for people trying to determine if optuna is a good fit for their task, it turns out this method actually takes O(trials_in_database), not O(trials_in_study). There is probably an index that can be added to at least help with that.
Let me close this issue as fixed. Feel free to reopen as needed.
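As a rough illustration of the index suggestion in the comment above, the sketch below uses sqlite3 with a hypothetical trials table (again, not Optuna's actual schema) and a hypothetical index name, ix_trials_study_id. It compares the query plan for filtering by study before and after adding an index on study_id. Whether a similar index is missing or would help in a given Optuna storage backend is an assumption that would need to be checked against the real schema.

```python
import sqlite3

# Hypothetical table for illustration only; NOT Optuna's real RDB schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trials (trial_id INTEGER PRIMARY KEY, study_id INTEGER, value REAL)"
)
# Many studies share one database; we only want the trials of study 7.
conn.executemany(
    "INSERT INTO trials (study_id, value) VALUES (?, ?)",
    [(i % 50, i * 0.01) for i in range(5000)],
)

QUERY = "SELECT trial_id, value FROM trials WHERE study_id = ?"


def query_plan(conn):
    """Return SQLite's plan description for QUERY as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (7,)).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


# Without an index, every row in the table is examined: O(trials_in_database).
plan_before = query_plan(conn)

# Hypothetical index on the filter column.
conn.execute("CREATE INDEX ix_trials_study_id ON trials (study_id)")

# With the index, only rows matching the study are visited.
plan_after = query_plan(conn)

matching = conn.execute(QUERY, (7,)).fetchall()
print(plan_before)
print(plan_after)
```

The exact plan wording varies between SQLite versions, but the first plan should describe a full table scan and the second a search using ix_trials_study_id. The returned rows are the same in both cases; only the amount of work to find them changes.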