[RFC] Change the default argument of `gc_after_trial` in `Study.optimize` to `False`.
Description
Study.optimize calls gc.collect() after every objective evaluation. This behavior was added in #377 because the ChainerMN example loads the full MNIST dataset in every objective call, which wasted too much memory. I understand that the gc invocation is useful for some users. However, the gc call can be a bottleneck in the whole Optuna optimization loop.
I’d like to request comments on changing the default behavior of Study.optimize so that it does not invoke gc (that is, making gc_after_trial default to False).
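The behavior under discussion can be sketched as follows. This is a simplified, hypothetical loop (`optimize_sketch` and the toy `objective` are illustrations, not Optuna's implementation); it only shows where the `gc.collect()` call sits relative to each objective evaluation and how a `gc_after_trial` flag with the proposed default of `False` would gate it.

```python
import gc
from typing import Callable, List


def optimize_sketch(
    objective: Callable[[int], float],
    n_trials: int,
    gc_after_trial: bool = False,  # the default proposed in this RFC
) -> List[float]:
    """Hypothetical stand-in for the Study.optimize loop (not Optuna's code)."""
    values = []
    for trial_number in range(n_trials):
        # Evaluate the user's objective for this trial.
        values.append(objective(trial_number))
        if gc_after_trial:
            # Force a full collection of all generations after every trial.
            gc.collect()
    return values


def objective(trial_number: int) -> float:
    # Allocates a large temporary buffer on every call, similar in spirit
    # to loading a dataset inside the objective.
    data = [float(i) for i in range(10_000)]
    return sum(data) / len(data) + trial_number


print(optimize_sketch(objective, 3))
print(optimize_sketch(objective, 3, gc_after_trial=True))
```

Both calls return the same values; only the extra collection differs. In CPython, the temporary list above is already freed by reference counting when `objective` returns, so the explicit `gc.collect()` only reclaims objects caught in reference cycles.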
Pros
- We need the gc only in some special environments such as CircleCI (otherwise, Python automatically invokes gc when necessary).
- Removing the gc invocation improves optimization speed.
Cons
- Some user programs might suffer from OOM in certain environments.
- The current Optuna optimization loop is already much faster than in v1.3, and a further speed-up might benefit only a small fraction of users (please refer to the benchmarks in the next section). Adding a performance tips section to the official documentation might be sufficient.
Microbenchmark
The following are some microbenchmark results (I used a setup similar to #1135).
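The setup from #1135 is not reproduced here. As a hypothetical, self-contained illustration of what is being measured, the sketch below times a trivial trial loop with and without a forced `gc.collect()` per trial. `LIVE_OBJECTS` and `time_trial_loop` are assumptions made for illustration: the list stands in for whatever a real program keeps alive, because a full collection has to traverse every container object the collector tracks. Absolute numbers will not match the results below.

```python
import gc
import time
from typing import List

# Hypothetical stand-in for objects a real program keeps alive (datasets,
# trials held in memory, ...). Lists are always tracked by the collector,
# so each full collection has to traverse all of them.
LIVE_OBJECTS: List[List[int]] = [[i] for i in range(50_000)]


def time_trial_loop(n_trials: int, gc_after_trial: bool) -> float:
    """Return wall-clock seconds spent in a trivial hypothetical trial loop."""
    start = time.perf_counter()
    for trial_number in range(n_trials):
        # A deliberately cheap "objective" so any gc cost is easy to see.
        _ = sum(range(100)) + trial_number
        if gc_after_trial:
            gc.collect()  # full collection of all generations
    return time.perf_counter() - start


if __name__ == "__main__":
    for flag in (False, True):
        elapsed = time_trial_loop(20, gc_after_trial=flag)
        print(f"gc_after_trial={flag}: {elapsed:.4f}s for 20 trials")
```

Keeping the objective nearly free isolates the cost of the collection itself, which is the situation where the gc call is most visible.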
Without GC (master)
optimization with 2000 trials / 30 params: 259.5s
optimization with 2000 trials / 2 params: 16.2s
optimization with 1000 trials / 30 params: 72.6s
optimization with 1000 trials / 2 params: 4.9s
(Remark: Performance depends on which samplers to use.)
With GC (master)
optimization with 2000 trials / 30 params: 363.7s
optimization with 2000 trials / 2 params: 105.6s
optimization with 1000 trials / 30 params: 122.8s
optimization with 1000 trials / 2 params: 48.5s
(Remark: GC performance also depends on the user program.)
Without GC (v1.3)
For reference, I also add benchmark results on v1.3.
optimization with 1000 trials / 30 params: 325.7s
optimization with 1000 trials / 2 params: 83.6s
(Remark: Compared to v1.3, the current master has additional functionality in InMemoryStorage (#1228) and performs appropriate CoW handling (#1139), both of which add overhead to master.)
With GC (v1.3)
optimization with 1000 trials / 2 params: 235.6s
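Dividing the difference between the "with GC" and "without GC" times by the number of trials gives the extra cost per trial. The worked calculation below uses only the numbers reported above and assumes the whole difference is due to the `gc.collect()` call; the helper `gc_overhead_ms_per_trial` is illustrative. On master the extra cost comes out at about 43.6 to 52.1 ms per trial for both parameter counts, so the relative slowdown is largest (roughly 6.5x to 9.9x) in the 2-parameter runs, where the rest of the loop is cheap.

```python
# Wall-clock times (seconds) copied from the microbenchmarks above.
# Keys: (version, n_trials, n_params) -> (without_gc, with_gc)
results = {
    ("master", 2000, 30): (259.5, 363.7),
    ("master", 2000, 2): (16.2, 105.6),
    ("master", 1000, 30): (72.6, 122.8),
    ("master", 1000, 2): (4.9, 48.5),
    ("v1.3", 1000, 2): (83.6, 235.6),
}


def gc_overhead_ms_per_trial(n_trials: int, without_gc: float, with_gc: float) -> float:
    """Extra wall-clock time per trial attributed to the gc call, in ms."""
    return (with_gc - without_gc) / n_trials * 1000.0


for (version, n_trials, n_params), (without_gc, with_gc) in results.items():
    overhead = gc_overhead_ms_per_trial(n_trials, without_gc, with_gc)
    slowdown = with_gc / without_gc
    print(f"{version:6} {n_trials} trials / {n_params:2} params: "
          f"+{overhead:.1f} ms/trial, {slowdown:.1f}x slower")
```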
Top GitHub Comments
I personally agree with removing the parameter, as I commented in https://github.com/optuna/optuna/pull/533#issuecomment-532077535. A user-defined callback sounds sufficient for this problem.
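The callback approach suggested above could look roughly like this. It is a minimal sketch assuming Optuna's `callbacks` argument of `Study.optimize`, which invokes each callback as `callback(study, trial)` after a trial finishes. The class `GarbageCollectionCallback` and its `every_n_trials` option are hypothetical, not part of Optuna; the study and trial are typed loosely so the sketch does not need to import optuna.

```python
import gc
from typing import Any


class GarbageCollectionCallback:
    """Hypothetical callback that forces a full gc every N finished trials."""

    def __init__(self, every_n_trials: int = 1) -> None:
        if every_n_trials < 1:
            raise ValueError("every_n_trials must be >= 1")
        self.every_n_trials = every_n_trials
        self.n_calls = 0
        self.n_collections = 0

    def __call__(self, study: Any, trial: Any) -> None:
        # The study and the finished trial are passed in but not needed here.
        self.n_calls += 1
        if self.n_calls % self.every_n_trials == 0:
            gc.collect()
            self.n_collections += 1
```

A user who needs the old behavior could then pass something like `callbacks=[GarbageCollectionCallback(every_n_trials=10)]` to `study.optimize`, trading some memory headroom for fewer collections.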
Thanks @c-bata for your input. I noticed it might be more of a “follow-up” to what’s addressed in this issue, but in any case I would like to continue that discussion as well.
Back to this issue, I agree with https://github.com/optuna/optuna/pull/533#issuecomment-534898777 that we’re stretching beyond the responsibility of the framework. While I know that it’s still a controversial topic (within the team, at least), it also seems that many are in favor of disabling this collection by default, so I created #1380.