UnicodeDecodeError when run multithreaded
Code Sample, a copy-pastable example if possible
# TODO: working on it
Problem description
This is something that’s been noticed in Satpy specifically and is being tracked here: https://github.com/pytroll/satpy/issues/1114
The bottom line is that a couple of our users have been getting UnicodeDecodeErrors or errors about bad proj definitions. The really annoying bit is that it seems to be some sort of race condition or other multi-threading related issue. We are using xarray+dask and have a pyproj CRS object in the .coords of our DataArrays. We get errors like:
    return [_execute_task(a, cache) for a in arg]
  File "/work/geo2grid/lib/python3.7/site-packages/dask/core.py", line 122, in _execute_task
    elif arg in cache:
  File "/work/geo2grid/lib/python3.7/site-packages/pyproj/crs/crs.py", line 869, in __hash__
    return hash(self.to_wkt())
  File "pyproj/_crs.pyx", line 451, in pyproj._crs.Base.to_wkt
  File "pyproj/_crs.pyx", line 120, in pyproj._crs._to_wkt
  File "pyproj/_crs.pyx", line 24, in pyproj._crs.cstrdecode
  File "/work/geo2grid/lib/python3.7/site-packages/pyproj/compat.py", line 21, in pystrdecode
    return cstr.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 0: invalid continuation byte
Command exited with non-zero status 1
Or:
  File "C:\ProgramData\Miniconda3\lib\site-packages\pyresample\geometry.py", line 1012, in invproj
    target_proj = Proj(proj_dict)
  File "C:\ProgramData\Miniconda3\lib\site-packages\pyresample\_spatial_mp.py", line 121, in __init__
    **kwargs)
  File "C:\ProgramData\Miniconda3\lib\site-packages\pyproj\proj.py", line 171, in __init__
    super().__init__(cstrencode(projstring.strip()))
  File "pyproj/_proj.pyx", line 30, in pyproj._proj.Proj.__init__
pyproj.exceptions.ProjError: Invalid projection b'C'.: (Internal Proj Error: proj_create: unrecognized format / unknown name)
Other times it prints the invalid projection string with characters mixed in where they shouldn’t be: very clearly wrong changes, such as +proj=merc with some odd Unicode character in place of the p in proj.
I’m trying my best to reproduce this, but so far have been unsuccessful which is why I don’t have a reproducible example yet. I’ve only ever noticed this in logs.
Expected Output
No error.
Environment Information
- Output from:
python -m pyproj -v
pyproj info:
pyproj: 2.5.0
PROJ: 6.3.0
data dir: /data1/users/davidh/miniconda3/envs/geo2grid_dist/share/proj
System:
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) [GCC 7.3.0]
executable: /data1/users/davidh/miniconda3/envs/geo2grid_dist/bin/python
machine: Linux-2.6.32-573.12.1.el6.x86_64-x86_64-with-centos-6.10-Final
Python deps:
pip: 20.0.2
setuptools: 45.2.0.post20200209
Cython: None
Specific conda-forge builds:
proj 6.3.0 hc80f0dc_0 conda-forge
pyproj 2.5.0 py37h8ff28aa_0 conda-forge
Installation method
- conda, pip wheel, from source, etc…
Conda environment information (if you installed with conda):
I mentioned specific conda packages above, but we’ve seen this now on Ubuntu, Windows, and a CentOS 7 docker container running a conda-pack’d version of a conda-forge environment.
Issue Analytics
- Created 3 years ago
- Comments: 15 (15 by maintainers)
@djhoese this may be useful for reference: https://github.com/geopandas/geopandas/issues/1842
Sorry, I thought I closed this already. This was our fault for using a CRS object with a dask map_blocks function (passing CRS objects between threads).
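For anyone landing here with the same symptoms: one way to avoid sharing a CRS object across threads is to hand only a plain string definition (for example the output of to_wkt()) to the function passed to map_blocks, and rebuild the object inside each worker thread. The following is a minimal sketch of that pattern, not the change that was actually made in Satpy. The names per_thread, StandInCRS, process_block and run_demo are hypothetical, and StandInCRS stands in for pyproj.CRS so the sketch runs with the standard library alone. With pyproj installed, the factory would presumably be something like pyproj.CRS.from_wkt.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Each thread sees its own attributes on this object.
_thread_cache = threading.local()


def per_thread(definition, factory):
    """Return an object built from `definition`, cached separately per thread.

    `definition` must be a plain, hashable value (for a CRS, a string).
    `factory` turns it into the real object (assumption: with pyproj this
    could be pyproj.CRS.from_wkt).
    """
    cache = getattr(_thread_cache, "objects", None)
    if cache is None:
        cache = _thread_cache.objects = {}
    if definition not in cache:
        cache[definition] = factory(definition)
    return cache[definition]


class StandInCRS:
    """Hypothetical stand-in for a CRS class; records which thread built it."""

    def __init__(self, definition):
        self.definition = definition
        self.owner = threading.get_ident()


def process_block(block, definition):
    # Only the string crossed the thread boundary; the object is rebuilt here.
    crs = per_thread(definition, StandInCRS)
    assert crs.owner == threading.get_ident()
    return len(block), crs.owner


def run_demo(n_blocks=8, n_threads=4):
    definition = "+proj=merc"
    blocks = [[i] * 3 for i in range(n_blocks)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(lambda b: process_block(b, definition), blocks))
```

The per-thread cache means the object is constructed at most once per thread per definition, so the cost of rebuilding stays small even with many blocks. Whether this fully removes the errors above depends on what else touches the CRS, so treat it as a starting point.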