"Internal Proj Error: [...] database disk image is malformed" when multiprocessing since pyproj 2.3
Code Sample, a copy-pastable example if possible
It’s unfortunately not possible to produce a minimal example; this only happens in the full setup of our project, but it is 100% reproducible there. See for example: https://travis-ci.org/OGGM/OGGM-Anaconda/jobs/580670196#L1406
I tried triggering this by just calling the pyproj.Proj() invocation in a lot of parallel processes, but it was not impressed by that and worked fine.
Problem description
Ever since pyproj 2.3
pyproj.exceptions.CRSError: Invalid projection: +init=epsg:4326 +type=crs: (Internal Proj Error: proj_create: SQLite error on SELECT auth_name FROM authority_list: database disk image is malformed)
occurs when trying to do pyproj.Proj("+init=EPSG:4326", preserve_units=True) in our concurrent multiprocessing setup.
Turning off multiprocessing and running things sequentially works around the issue.
Downgrading pyproj to <2.3 also fixes it. Note that I did not downgrade the underlying proj4 binary library, so downgrading pyproj alone is enough to stop this from happening.
Environment Information
System:
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21) [GCC 7.3.0]
executable: /home/users/timo/miniconda3/envs/projtest_env/bin/python
machine: Linux-4.19.64-gentoo-x86_64-Intel-R-_Xeon-R-_CPU_E5-2623_v4_@_2.60GHz-with-gentoo-2.6
PROJ:
PROJ: 6.1.1
data dir: /home/users/timo/miniconda3/envs/projtest_env/share/proj
Python deps:
pyproj: 2.3.1
pip: 19.2.3
setuptools: 41.2.0
Cython: 0.29.13
Installation method
- conda
Conda environment information (if you installed with conda):
Environment (conda list):
$ conda list | grep -E "proj|aenum"
proj4 6.1.1 hc80f0dc_1 conda-forge
pyproj 2.3.1 py37h2fd02e8_0 conda-forge
Details about conda and system (conda info):
$ conda info
active environment : projtest_env
active env location : /home/users/timo/miniconda3/envs/projtest_env
shell level : 1
user config file : /home/users/timo/.condarc
populated config files : /home/users/timo/.condarc
conda version : 4.7.11
conda-build version : 3.18.9
python version : 3.7.4.final.0
virtual packages :
base environment : /home/users/timo/miniconda3 (writable)
channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch
package cache : /home/users/timo/miniconda3/pkgs
/home/users/timo/.conda/pkgs
envs directories : /home/users/timo/miniconda3/envs
/home/users/timo/.conda/envs
platform : linux-64
user-agent : conda/4.7.11 requests/2.22.0 CPython/3.7.4 Linux/4.19.64-gentoo gentoo/2.6 glibc/2.29
UID:GID : 10000:10000
netrc file : None
offline mode : False
Issue Analytics
- Created 4 years ago
- Comments:9 (4 by maintainers)
Thanks for the fast answer! I managed to find a solution in the meantime.
For future reference for other people finding this issue based on the “database disk image is malformed” error in conjunction with multiprocessing (and maybe also for @TimoRoth).
In my case, making sure that the module imports of gdal happened after forking to multiple processes fixed the issue. The import of gdal does create a gdal context, which – I suspect – also contains an sqlite database handle to the proj.db database, which gets corrupted by multiple processes writing to it. Another working alternative is to use
multiprocessing.set_start_method('spawn')
to not use forking at all.
After some further analysis, this is caused by gdal using the proj C API itself to create non-autoclosing proj contexts. Changing this in gdal seems very daunting, and it would also take forever until a fix reaches any production system via distros. So I’m not sure what the correct course of action here is.
In the long run, a way to globally control the proj behaviour would be ideal. An env var that forces it to always autoclose the db, or at least change the default from false to true.
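For anyone who wants to try the two workarounds mentioned above together, here is a minimal sketch. It assumes pyproj is installed in the worker environment; the names project_point, spawn_context and run_parallel are made up for illustration, and this has not been verified against the OGGM setup from this issue. The same deferred-import pattern should apply to the gdal import.

```python
import multiprocessing as mp


def project_point(lon_lat):
    """Worker function: imports pyproj inside the worker, not at module level."""
    # Deferred import, so the parent process never creates a PROJ context
    # (and with it an open proj.db handle) that a child could inherit.
    # Assumption: pyproj is installed; a gdal import would be deferred the same way.
    import pyproj

    proj = pyproj.Proj("+init=EPSG:4326", preserve_units=True)
    lon, lat = lon_lat
    return proj(lon, lat)


def spawn_context():
    """Return a context that starts workers as fresh interpreters instead of forking."""
    return mp.get_context("spawn")


def run_parallel(points, processes=2):
    """Hypothetical helper: map project_point over points in spawned workers."""
    with spawn_context().Pool(processes=processes) as pool:
        return pool.map(project_point, points)
```

With the spawn start method, run_parallel has to be called from inside an `if __name__ == "__main__":` guard, because each child process re-imports the main module. Using a context object instead of multiprocessing.set_start_method('spawn') keeps the change local to this pool rather than global to the program.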