More efficient optuna.study.Study.trials_dataframe

optuna.study.Study.trials_dataframe (and other methods like it) currently takes a very long time to run once the number of trials grows into the thousands. This request is for the method to run significantly faster.

Motivation

I often want to analyze and visualize the results of a study. This method (or others like it) is the first step in getting the data necessary. It would be helpful if this method could run significantly faster so that a person analyzing results doesn’t have to wait many minutes to see results. I just benchmarked one of my studies and it took 13 minutes to load the dataframe. It only has thousands of trials in it.

Description

I think, though I’m not 100% sure, that this feature requires changes to the RDB backend so that loading all trials doesn’t result in O(trials) queries. It should be possible to fetch all of the data with O(1) queries.
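To make the difference between O(trials) and O(1) queries concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two-table schema (`trials`, `trial_params`) and the helper names `load_per_trial` and `load_joined` are assumptions made up for illustration; they are not Optuna's actual schema or storage code.

```python
import sqlite3

# Hypothetical schema for illustration only; not Optuna's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trials (trial_id INTEGER PRIMARY KEY, study_id INTEGER, value REAL);
    CREATE TABLE trial_params (trial_id INTEGER, name TEXT, value REAL);
""")
for trial_id in range(1000):
    conn.execute("INSERT INTO trials VALUES (?, 1, ?)", (trial_id, trial_id * 0.5))
    conn.execute("INSERT INTO trial_params VALUES (?, 'x', ?)", (trial_id, float(trial_id)))


def load_per_trial(conn, study_id):
    """O(trials) queries: one for the trial list, then one more per trial."""
    n_queries = 1
    trial_rows = conn.execute(
        "SELECT trial_id, value FROM trials WHERE study_id = ? ORDER BY trial_id",
        (study_id,),
    ).fetchall()
    rows = []
    for trial_id, value in trial_rows:
        params = conn.execute(
            "SELECT name, value FROM trial_params WHERE trial_id = ?", (trial_id,)
        ).fetchall()
        n_queries += 1
        rows.append((trial_id, value, dict(params)))
    return rows, n_queries


def load_joined(conn, study_id):
    """O(1) queries: a single JOIN returns every trial with its parameters."""
    cursor = conn.execute(
        """
        SELECT t.trial_id, t.value, p.name, p.value
        FROM trials AS t
        LEFT JOIN trial_params AS p ON p.trial_id = t.trial_id
        WHERE t.study_id = ?
        ORDER BY t.trial_id
        """,
        (study_id,),
    )
    grouped = {}
    for trial_id, value, name, param_value in cursor:
        # Regroup the flat JOIN rows into one entry per trial.
        entry = grouped.setdefault(trial_id, (trial_id, value, {}))
        if name is not None:
            entry[2][name] = param_value
    return list(grouped.values()), 1


slow_rows, slow_queries = load_per_trial(conn, 1)
fast_rows, fast_queries = load_joined(conn, 1)
print(slow_queries, fast_queries)
```

Both functions return the same rows; only the number of round trips to the database differs. Against a remote database server, each round trip adds network latency, which is one plausible reason the per-trial pattern could become slow with thousands of trials.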

Top GitHub Comments

dwiel commented, Dec 19, 2019

For whoever may pick this up in the future, or for people trying to determine whether optuna is a good fit for their task: it turns out this method actually takes O(trials_in_database) time, not O(trials_in_study). There is probably an index that can be added to at least help with that.
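As a rough illustration of how an index could help here, the sketch below uses sqlite3 and `EXPLAIN QUERY PLAN` to compare a per-study lookup before and after adding an index on the study column. The `trials` table and the index name `ix_trials_study_id` are hypothetical; whether Optuna's real schema lacked such an index is not something this example establishes.

```python
import sqlite3

# Hypothetical table for illustration only; not Optuna's real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trials (trial_id INTEGER PRIMARY KEY, study_id INTEGER, value REAL)"
)
# 5000 trials spread evenly over 50 studies.
conn.executemany(
    "INSERT INTO trials (study_id, value) VALUES (?, ?)",
    [(i % 50, float(i)) for i in range(5000)],
)


def query_plan(conn):
    """Return SQLite's plan description for fetching one study's trials."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT trial_id, value FROM trials WHERE study_id = ?",
        (7,),
    ).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # full table scan: cost grows with all trials
conn.execute("CREATE INDEX ix_trials_study_id ON trials (study_id)")
plan_after = query_plan(conn)   # index search: only the study's rows are visited

print(plan_before)
print(plan_after)
```

Without the index, the database has to scan every trial in the table to find one study's rows, which matches the O(trials_in_database) behavior described above. With the index, the lookup touches only the rows of the requested study.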

hvy commented, Jun 12, 2020

Let me close this issue as fixed. Feel free to reopen as needed.