"Internal Proj Error: [...] database disk image is malformed" when multiprocessing since pyproj 2.3

Code Sample, a copy-pastable example if possible

Unfortunately, it’s not possible to produce a minimal example: this only happens in the full setup of our project, where it is 100% reproducible. See for example: https://travis-ci.org/OGGM/OGGM-Anaconda/jobs/580670196#L1406

I tried to trigger this by just calling pyproj.Proj() in a lot of parallel processes, but it was not impressed by that and worked fine.
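For readers who want to try a similar stress test, here is a minimal sketch of what such an attempt could look like. It is not the reporter's actual script: the names PROJ_STRING, create_proj and hammer, as well as the task and process counts, are assumptions made for illustration. As noted above, a test like this did not reproduce the error for the reporter.

```python
import multiprocessing as mp

# Definition taken from the error message in this report.
PROJ_STRING = "+init=EPSG:4326"


def create_proj(_task_id):
    """Create one Proj object; return the error text if creation fails."""
    # Imported lazily so the module itself loads without pyproj installed.
    import pyproj
    from pyproj.exceptions import CRSError

    try:
        pyproj.Proj(PROJ_STRING, preserve_units=True)
    except CRSError as exc:
        return str(exc)
    return None


def hammer(n_tasks=1000, processes=8):
    """Create many Proj objects in parallel and collect any error messages."""
    # Uses the platform's default start method (fork on Linux at the time).
    with mp.Pool(processes=processes) as pool:
        results = pool.map(create_proj, range(n_tasks))
    return [msg for msg in results if msg is not None]
```

Calling hammer() from under an `if __name__ == "__main__":` guard returns the list of collected error messages, which is empty if every Proj object was created successfully.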

Problem description

Ever since pyproj 2.3, pyproj.exceptions.CRSError: Invalid projection: +init=epsg:4326 +type=crs: (Internal Proj Error: proj_create: SQLite error on SELECT auth_name FROM authority_list: database disk image is malformed) occurs when calling pyproj.Proj("+init=EPSG:4326", preserve_units=True) in our concurrent multiprocessing setup.

Turning off multiprocessing and running things sequentially works around the issue.

Downgrading pyproj to <2.3 also fixes it. Note that I did not downgrade the underlying proj4 binary library, so downgrading pyproj alone is enough to stop this from happening.

Environment Information

System:
    python: 3.7.3 | packaged by conda-forge | (default, Jul  1 2019, 21:52:21)  [GCC 7.3.0]
executable: /home/users/timo/miniconda3/envs/projtest_env/bin/python
   machine: Linux-4.19.64-gentoo-x86_64-Intel-R-_Xeon-R-_CPU_E5-2623_v4_@_2.60GHz-with-gentoo-2.6

PROJ:
      PROJ: 6.1.1
  data dir: /home/users/timo/miniconda3/envs/projtest_env/share/proj

Python deps:
    pyproj: 2.3.1
       pip: 19.2.3
setuptools: 41.2.0
    Cython: 0.29.13

Installation method

  • conda

Conda environment information (if you installed with conda):


Environment (conda list):
$ conda list | grep -E "proj|aenum"
proj4                     6.1.1                hc80f0dc_1    conda-forge
pyproj                    2.3.1            py37h2fd02e8_0    conda-forge

Details about conda and system ( conda info ):
$ conda info
     active environment : projtest_env
    active env location : /home/users/timo/miniconda3/envs/projtest_env
            shell level : 1
       user config file : /home/users/timo/.condarc
 populated config files : /home/users/timo/.condarc
          conda version : 4.7.11
    conda-build version : 3.18.9
         python version : 3.7.4.final.0
       virtual packages :
       base environment : /home/users/timo/miniconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /home/users/timo/miniconda3/pkgs
                          /home/users/timo/.conda/pkgs
       envs directories : /home/users/timo/miniconda3/envs
                          /home/users/timo/.conda/envs
               platform : linux-64
             user-agent : conda/4.7.11 requests/2.22.0 CPython/3.7.4 Linux/4.19.64-gentoo gentoo/2.6 glibc/2.29
                UID:GID : 10000:10000
             netrc file : None
           offline mode : False

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

5 reactions
coroa commented, Jan 21, 2020

Thanks for the fast answer! I managed to find a solution in the meantime.

For future reference, for other people who find this issue via the “database disk image is malformed” error in conjunction with multiprocessing (and maybe also for @TimoRoth):

In my case, making sure that gdal was imported only after forking into multiple processes fixed the issue. Importing gdal creates a gdal context, which (I suspect) also contains an SQLite database handle to the proj.db database, and that handle gets corrupted when multiple processes write to it. Another working alternative is to call multiprocessing.set_start_method('spawn') so that forking is not used at all.
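The two workarounds described above can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the issue: transform_point and spawn_context are hypothetical names, and pyproj is assumed to be installed in the worker environment. The worker imports pyproj inside the function, so the library is only loaded in the child process, and the pool is built from a "spawn" context, so workers start as fresh interpreters rather than forked copies of the parent.

```python
import multiprocessing as mp


def transform_point(lon_lat):
    """Worker: build a Proj object inside the child and apply it to one point."""
    # Deferred import: the library (and any database handle it opens)
    # is created in the worker process, not inherited from the parent.
    import pyproj  # assumption: installed in the worker environment

    proj = pyproj.Proj("+init=EPSG:4326", preserve_units=True)
    return proj(*lon_lat)


def spawn_context():
    """Return a multiprocessing context that starts workers without forking."""
    # Unlike multiprocessing.set_start_method('spawn'), a context object
    # does not change the start method for the rest of the program.
    return mp.get_context("spawn")
```

A caller would then run something like `with spawn_context().Pool(processes=4) as pool: pool.map(transform_point, points)` from under an `if __name__ == "__main__":` guard, which the spawn start method requires. Using get_context instead of set_start_method is a design choice for this sketch; the comment above uses set_start_method('spawn'), which has the same effect globally.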

0 reactions
TimoRoth commented, Feb 3, 2020

After some further analysis: this is caused by gdal itself using the proj C API to create non-autoclosing proj contexts. Changing this in gdal seems very daunting, and it would also take forever for a fix to reach any production system via distros. So I’m not sure what the correct course of action is here.

In the long run, a way to globally control the proj behaviour would be ideal: for example, an env var that forces it to always autoclose the db, or at least a change of the default from false to true.
