sqlite3.OperationalError: database is locked

I’m deploying optuna on a single machine through a script that looks like this:

optuna create-study […]

for i in $(seq "$1")
do
    optuna study optimize […] &
done

wait

# (copy trials.db from scratch to shared storage)
[…]

I got some sqlite3.OperationalError: database is locked errors on start (on 7 processes out of 32).

I think that launching all processes at the same time causes them to send their first requests to the database simultaneously, so they collide. I’m lucky that my hyperparameters also control the computation time, so the processes are very unlikely to collide after starting, but I guess this could also cause problems in other, massively parallel workflows.

I believe this could be fixed by increasing the timeout of the sqlite3 backend. There should be an option to do that from the command line and the API of optuna.
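To illustrate why a longer timeout helps, here is a minimal sketch using only Python's standard sqlite3 module, not Optuna itself. The table name `trials` and the helpers `hold_write_lock`, `try_write`, and `demo` are hypothetical and exist only for this illustration. One thread holds the write lock for half a second while a second connection tries to write, first without waiting and then with a 5-second timeout.

```python
import os
import sqlite3
import tempfile
import threading
import time


def hold_write_lock(path, locked, hold_seconds):
    """Take SQLite's write lock, signal that it is held, then release it after a delay."""
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit mode, explicit BEGIN/COMMIT
    conn.execute("BEGIN IMMEDIATE")  # acquires the write lock right away
    locked.set()
    time.sleep(hold_seconds)
    conn.execute("COMMIT")
    conn.close()


def try_write(path, timeout):
    """Attempt one insert; return True on success, False if the database is locked."""
    conn = sqlite3.connect(path, timeout=timeout)  # timeout is in seconds
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO trials (value) VALUES (1.0)")
        return True
    except sqlite3.OperationalError as exc:
        if "locked" in str(exc):
            return False
        raise
    finally:
        conn.close()


def demo():
    """Compare a writer that does not wait with one that waits up to 5 seconds."""
    path = os.path.join(tempfile.mkdtemp(), "trials.db")
    setup = sqlite3.connect(path)
    setup.execute("CREATE TABLE trials (value REAL)")
    setup.commit()
    setup.close()

    results = {}
    for label, timeout in (("no_wait", 0.0), ("with_wait", 5.0)):
        locked = threading.Event()
        holder = threading.Thread(target=hold_write_lock, args=(path, locked, 0.5))
        holder.start()
        locked.wait()  # make sure the other connection really holds the lock
        results[label] = try_write(path, timeout)
        holder.join()
    return results


if __name__ == "__main__":
    print(demo())
```

The writer that does not wait fails with "database is locked", while the one with the longer timeout keeps retrying until the lock is released. With many processes starting at once, the required wait grows accordingly, which is why a single fixed value may not suit every workload.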

Alternatives

I added a sleep in my loop.

for i in $(seq "$1")
do
    optuna study optimize […] &
    sleep 5
done
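The same staggered start could also be written in Python, for example when the launcher is itself a Python script. This is only a sketch: `launch_staggered` is a hypothetical helper, and the placeholder command in the usage section stands in for the `optuna study optimize` invocation, whose arguments are elided above.

```python
import subprocess
import sys
import time


def launch_staggered(commands, delay_seconds):
    """Start each command in the background, pausing between launches, then wait for all."""
    processes = []
    for command in commands:
        processes.append(subprocess.Popen(command))
        time.sleep(delay_seconds)  # spread out the first database accesses
    # Equivalent of the shell `wait`: collect every exit code.
    return [process.wait() for process in processes]


if __name__ == "__main__":
    # Placeholder command; replace it with the real worker command line.
    placeholder = [sys.executable, "-c", "pass"]
    print(launch_staggered([placeholder] * 4, delay_seconds=0.1))
```

Like the shell version, this only reduces the chance of a collision at startup; it does not prevent collisions later in the run.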

Top GitHub Comments

hvy commented, Nov 28, 2021

Let me close this issue, since it has not been observed as frequently as before, that is, within use cases suitable for SQLite, such as a not-too-high level of concurrency, or with NFS, where file locking is non-trivial (this may be a result of https://github.com/optuna/optuna/pull/1628). If that is not the case and this is still an issue, please feel free to reopen or comment. For instance, if the timeout is a common pitfall, we could consider setting a large timeout by default, similar to how we set pool_pre_ping for MySQL RDBs in https://github.com/optuna/optuna/blob/release-v2.10.0/optuna/storages/_rdb/storage.py#L1134-L1149. Please note that you can also configure the RDB connection yourself, in user-land, using RDBStorage(..., engine_kwargs=...).

import optuna

# Relax the timeout to circumvent the error. A suitable value depends on the environment and on
# e.g. trial/process parallelism. (With my local MacBook Pro and a trial parallelism of 64,
# a timeout of 100 seemed stable.)
# Note that the keys/values of `engine_kwargs` depend on the actual RDB backend.
storage = optuna.storages.RDBStorage(url="sqlite:///mystorage.db", engine_kwargs={"connect_args": {"timeout": 100}})
study = optuna.create_study(storage=storage)

And just for the record, to rephrase the documentation: we suggest using MySQL or another backend for distributed optimization if possible.

louisabraham commented, Dec 28, 2019

Note: I was using sqlite3 as backend.

I think this feature can be implemented through the connect_args of sqlalchemy.create_engine.

The parameter to set for sqlite3.connect is timeout (in seconds).

~Another way to implement it would be to have the optuna study optimize command take a n_jobs parameter. Thus, it would handle a multiprocessing.Pool (which would be slightly more efficient than giving the same fixed number of jobs to every process when the computation time varies a lot).~ The advantage is that it would be possible to adapt the timeout linearly: timeout = n_jobs * 5.0. ~However, this is a major design change.~

EDIT: n_jobs is already implemented, I just think it could be improved if it adapted the timeout.

Thus, I suggest simply adding a --sqlite-timeout parameter.
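The linear rule mentioned above, timeout = n_jobs * 5.0, could be sketched as a small helper. The function name `sqlite_engine_kwargs` and the `seconds_per_job` parameter are hypothetical and are not part of Optuna; the 5-second factor is taken from the comment above and would likely need tuning for a given environment.

```python
def sqlite_engine_kwargs(n_jobs, seconds_per_job=5.0):
    """Build engine keyword arguments with a SQLite timeout that scales linearly with n_jobs."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    # "timeout" is forwarded to sqlite3.connect and is expressed in seconds.
    return {"connect_args": {"timeout": n_jobs * seconds_per_job}}


if __name__ == "__main__":
    # 32 parallel processes, as in the original report.
    print(sqlite_engine_kwargs(32))
```

The resulting dictionary has the same shape as the `engine_kwargs` argument passed to `optuna.storages.RDBStorage` in hvy's comment, so it could be supplied there in place of the fixed value.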