RecursionError reading CSV file with geopandas >= 0.6.2 but not geopandas 0.5.1
Hello! I found a bug in the latest version of geopandas where I'm getting a RecursionError when reading a CSV file. The file can be downloaded and tested here: https://github.com/CosmiQ/solaris/blob/master/solaris/data/w_multipolygon.csv
Fail case:
→ ipython
Python 3.6.7 | packaged by conda-forge | (default, Nov 6 2019, 16:19:42)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.10.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import geopandas as gpd
In [2]: import os
In [3]: gpd.__version__
Out[3]: '0.6.2'
In [4]: from solaris.data import data_dir
In [5]: gpd.read_file(os.path.join(data_dir, 'w_multipolygon.csv'))
---------------------------------------------------------------------------
RecursionError Traceback (most recent call last)
<ipython-input-5-43f7a47ec290> in <module>
----> 1 gpd.read_file(os.path.join(data_dir, 'w_multipolygon.csv'))
~/miniconda3/envs/solaris/lib/python3.6/site-packages/geopandas/io/file.py in read_file(filename, bbox, **kwargs)
93
94 columns = list(features.meta["schema"]["properties"]) + ["geometry"]
---> 95 gdf = GeoDataFrame.from_features(f_filt, crs=crs, columns=columns)
96
97 return gdf
~/miniconda3/envs/solaris/lib/python3.6/site-packages/geopandas/geodataframe.py in from_features(cls, features, crs, columns)
301 d.update(f["properties"])
302 rows.append(d)
--> 303 df = GeoDataFrame(rows, columns=columns)
304 df.crs = crs
305 return df
~/miniconda3/envs/solaris/lib/python3.6/site-packages/geopandas/geodataframe.py in __init__(self, *args, **kwargs)
75 index = self.index
76 try:
---> 77 self["geometry"] = _ensure_geometry(self["geometry"].values)
78 except TypeError:
79 pass
~/miniconda3/envs/solaris/lib/python3.6/site-packages/geopandas/geodataframe.py in __getitem__(self, key)
555 GeoDataFrame.
556 """
--> 557 result = super(GeoDataFrame, self).__getitem__(key)
558 geo_col = self._geometry_column_name
559 if isinstance(key, string_types) and key == geo_col:
~/miniconda3/envs/solaris/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
3005 indexer = np.where(indexer)[0]
3006
-> 3007 data = self.take(indexer, axis=1)
3008
3009 if is_single_key:
~/miniconda3/envs/solaris/lib/python3.6/site-packages/pandas/core/generic.py in take(self, indices, axis, is_copy, **kwargs)
3604 indices, axis=self._get_block_manager_axis(axis), verify=True
3605 )
-> 3606 result = self._constructor(new_data).__finalize__(self)
3607
3608 # Maybe set copy if we didn't actually change the index.
... last 4 frames repeated, from the frame below ...
~/miniconda3/envs/solaris/lib/python3.6/site-packages/geopandas/geodataframe.py in __init__(self, *args, **kwargs)
75 index = self.index
76 try:
---> 77 self["geometry"] = _ensure_geometry(self["geometry"].values)
78 except TypeError:
79 pass
RecursionError: maximum recursion depth exceeded in comparison
The error does not happen with geopandas version 0.5.1:
→ ipython
Python 3.6.7 | packaged by conda-forge | (default, Nov 6 2019, 16:19:42)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.10.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import geopandas as gpd
In [2]: gpd.__version__
Out[2]: '0.5.1'
In [3]: from solaris.data import data_dir
In [4]: import os
In [5]: gpd.read_file(os.path.join(data_dir, 'w_multipolygon.csv'))
Out[5]:
field_1 access addr_house addr_hou_1 addr_inter ... origlen partialDec truncated geometry geometry
0 660 ... 0 1.0 0 POLYGON ((742959.5157261142 3739469.858595584,... POLYGON ((742959.5157261142 3739469.858595584,...
1 661 ... 0 1.0 0 POLYGON ((743020.5285401859 3739472.034202539,... POLYGON ((743020.5285401859 3739472.034202539,...
2 662 ... 0 1.0 0 POLYGON ((743003.1399996518 3739467.617796136,... POLYGON ((743003.1399996518 3739467.617796136,...
3 663 ... 0 1.0 0 POLYGON ((742737.216587933 3739529.428720969, ... POLYGON ((742737.216587933 3739529.428720969, ...
4 664 ... 0 1.0 0 POLYGON ((742779.0589662758 3739517.785098479,... POLYGON ((742779.0589662758 3739517.785098479,...
5 665 ... 0 0.4599659863913662 1 POLYGON ((742693.3122179032 3739539, 742694.59... POLYGON ((742693.3122179032 3739539, 742694.59...
6 666 ... 0 0.13212004440894673 1 MULTIPOLYGON (((742820.9164519846 3739539, 742... MULTIPOLYGON (((742820.9164519846 3739539, 742...
7 667 ... 0 0.6854550148892067 1 POLYGON ((743051 3739240.429075356, 743019.733... POLYGON ((743051 3739240.429075356, 743019.733...
8 668 ... 0 0.3268898818394317 1 POLYGON ((743046.4084919207 3739315.608520228,... POLYGON ((743046.4084919207 3739315.608520228,...
9 669 ... 0 1.0 0 POLYGON ((741360.3983888365 3743875.358136298,... POLYGON ((741360.3983888365 3743875.358136298,...
In both cases I'm on Ubuntu 18.04, Python 3.6.7.
Top GitHub Comments
@rbavery Thanks for the clear issue report!
The direct cause seems to be that there is a duplicate “geometry” column, which now generates an error in geopandas 0.6 (which is indeed a regression).
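To see why a duplicated column label ends in a RecursionError rather than an ordinary exception, here is a deliberately simplified, stdlib-only model of the call chain in the traceback above. ToyFrame is a hypothetical stand-in, not pandas or geopandas code: its __init__ looks up self["geometry"] (as GeoDataFrame.__init__ does in the traceback), and selecting a duplicated label builds a new instance through the same constructor (as DataFrame.take does via self._constructor), so the two calls chase each other until Python's recursion limit is reached.

```python
class ToyFrame:
    """Hypothetical, heavily simplified stand-in for a DataFrame subclass.

    Columns are kept as (label, values) pairs so that duplicate labels are
    allowed, as they are in pandas.
    """

    def __init__(self, columns):
        self.columns = list(columns)
        # Like GeoDataFrame.__init__ (simplified): if any column is labelled
        # "geometry", look it up through __getitem__.
        if any(label == "geometry" for label, _ in self.columns):
            self.geometry = self["geometry"]

    def __getitem__(self, key):
        matches = [(label, vals) for label, vals in self.columns if label == key]
        if not matches:
            raise KeyError(key)
        if len(matches) == 1:
            # A unique label selects a single column (a Series, in pandas).
            return matches[0][1]
        # A duplicated label selects a sub-frame built with the same
        # constructor. Its __init__ calls self["geometry"] again, and the
        # sub-frame still has two "geometry" columns, so this never ends.
        return type(self)(matches)


# A single "geometry" column works fine.
ok = ToyFrame([("id", [660]), ("geometry", ["POLYGON ((0 0, 1 0, 1 1, 0 0))"])])
print(ok["geometry"])

# Two "geometry" columns recurse until Python gives up.
try:
    ToyFrame([("geometry", ["POLYGON ((0 0, 1 0, 1 1, 0 0))"])] * 2)
except RecursionError as exc:
    print("RecursionError:", exc)
```

Note that GeoDataFrame.__init__ in the traceback only catches TypeError around this lookup, so a RecursionError propagates straight out to the caller, which matches what the report shows.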
However, the object that was returned in geopandas 0.5 was also not a “proper” GeoDataFrame. There are 2 geometry columns, but both contain strings, not shapely geometries. Once you try one of the geospatial methods that geopandas provides, things break down in 0.5 as well.
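How the two string-valued "geometry" columns come about can be modelled without geopandas. The sketch mirrors two steps visible in the traceback: read_file builds its column list as the schema properties plus "geometry", and from_features overlays each feature's properties onto a row dict that already has a "geometry" key. The function names are hypothetical, the feature is made up, and the row-building step is a simplified reading of from_features (the traceback only shows its d.update(f["properties"]) and rows.append(d) lines). It assumes GDAL found no geometry, so the feature's geometry is None and the CSV's WKT column arrives as an ordinary string property.

```python
def read_file_columns(schema_properties):
    # Like read_file: every schema property, plus a "geometry" column.
    return list(schema_properties) + ["geometry"]


def feature_to_row(feature):
    # Like from_features (simplified): start from the feature's geometry,
    # then overlay the feature's properties onto the same dict.
    row = {"geometry": feature["geometry"]}
    row.update(feature["properties"])
    return row


# Made-up feature: GDAL found no geometry, and the CSV's WKT column
# came through as a plain string property named "geometry".
feature = {
    "geometry": None,
    "properties": {"field_1": 660, "geometry": "POLYGON ((0 0, 1 0, 1 1, 0 0))"},
}

columns = read_file_columns(feature["properties"])
row = feature_to_row(feature)

duplicated = sorted({c for c in columns if columns.count(c) > 1})
print(columns)     # ['field_1', 'geometry', 'geometry']
print(duplicated)  # ['geometry']
# The property overwrote the None geometry, so both "geometry" columns
# end up filled from this one key, which holds a string.
print([row[c] for c in columns if c == "geometry"])
```

This is consistent with the 0.5.1 output above, where both geometry columns show the same POLYGON text.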
The underlying cause for that is, I think, that the GDAL CSV driver does not work like that out of the box: https://gdal.org/drivers/vector/csv.html. From that page, it seems that it also reads non-spatial data, and that there are options to specify which columns hold x/y values, or which column is in WKB or WKT format. So for this specific CSV file, it is not reading in any geometry data, just plain CSV data as strings.
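To a reader that knows nothing about geometry, a WKT column is just text. The stdlib-only sketch below reads a small CSV with Python's csv module and reports which columns hold WKT-looking strings; such a column is what one would point the CSV driver at, for example with the GEOM_POSSIBLE_NAMES option named later in this thread. The helper name is hypothetical, the sample rows are made up (loosely modelled on the columns shown above), and the keyword check is a rough heuristic, not GDAL's detection logic.

```python
import csv
import io

# Upper-case WKT geometry type keywords (a rough heuristic, not a WKT parser).
WKT_PREFIXES = (
    "POINT", "LINESTRING", "POLYGON",
    "MULTIPOINT", "MULTILINESTRING", "MULTIPOLYGON",
    "GEOMETRYCOLLECTION",
)


def wkt_like_columns(csv_text):
    """Return the names of columns whose every value starts with a WKT keyword."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return []
    return [
        name
        for name in rows[0]
        if all(str(row[name]).strip().upper().startswith(WKT_PREFIXES) for row in rows)
    ]


# Made-up sample rows.
sample = (
    "field_1,truncated,geometry\n"
    '660,0,"POLYGON ((0 0, 1 0, 1 1, 0 0))"\n'
    '666,1,"MULTIPOLYGON (((0 0, 1 0, 1 1, 0 0)), ((2 2, 3 2, 3 3, 2 2)))"\n'
)

rows = list(csv.DictReader(io.StringIO(sample)))
print(type(rows[0]["geometry"]).__name__)  # str
print(wkt_like_columns(sample))            # ['geometry']
```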
When specifying the following keywords:
I get a proper GeoDataFrame (with a single “geometry” column that actually holds shapely geometries), both in geopandas 0.5 and 0.6.
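A sketch of such a call, assuming the keywords in question are GDAL CSV driver open options: GEOM_POSSIBLE_NAMES (named later in this thread) and KEEP_GEOM_COLUMNS (documented on the GDAL CSV page linked above; it controls whether the column the geometry was parsed from is also kept as a regular field, default YES). geopandas.read_file passes extra keyword arguments on to fiona.open, which forwards them to GDAL as open options. The helper name is hypothetical and the sketch has not been checked against geopandas 0.5 or 0.6.

```python
# GDAL CSV driver open options (assumed to be the relevant ones here):
# parse WKT from the "geometry" column, and do not also keep that column
# as a plain string field (which would duplicate the "geometry" label).
CSV_WKT_OPEN_OPTIONS = {
    "GEOM_POSSIBLE_NAMES": "geometry",
    "KEEP_GEOM_COLUMNS": "NO",
}


def read_wkt_csv(path):
    """Hypothetical helper: read a CSV whose "geometry" column holds WKT."""
    import geopandas as gpd  # imported here so the sketch loads without geopandas

    # read_file hands extra keyword arguments to fiona.open, which passes
    # them to the GDAL CSV driver as open options.
    return gpd.read_file(path, **CSV_WKT_OPEN_OPTIONS)
```

Usage would be along the lines of read_wkt_csv(os.path.join(data_dir, 'w_multipolygon.csv')) in the session from the report.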
Now, that said: it would be nice if geopandas could give a more helpful error message here (the 0.5 behaviour does not seem useful either), indicating that no geometry data were found. And if we can tell that the CSV driver is being used, we could give an even more informative error message pointing to those keywords.
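One possible shape for that more helpful error is a check on the column list before the GeoDataFrame is constructed. The function below is hypothetical (not part of geopandas): it refuses a duplicated "geometry" label with a readable message, and only mentions the CSV open options when the caller says the CSV driver was used.

```python
def check_geometry_columns(columns, driver=None):
    """Hypothetical pre-construction check for read_file-style column lists.

    Raises ValueError if "geometry" appears more than once, which happens
    when the source already exposes a text column called "geometry".
    """
    count = list(columns).count("geometry")
    if count <= 1:
        return
    message = (
        f'found {count} columns named "geometry"; the data source probably '
        "exposed its geometry as a plain text field, so no geometry was read"
    )
    if driver == "CSV":
        # These open options belong to GDAL's CSV driver, so only suggest
        # them when that driver is in use.
        message += (
            ". For WKT in a CSV file, try passing "
            'GEOM_POSSIBLE_NAMES="geometry" and KEEP_GEOM_COLUMNS="NO"'
        )
    raise ValueError(message)


check_geometry_columns(["field_1", "geometry"])  # passes silently
try:
    check_geometry_columns(["field_1", "geometry", "geometry"], driver="CSV")
except ValueError as exc:
    print(exc)
```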
Yes, that’s indeed the reason. I agree it’s a somewhat strange default, but that’s what GDAL does; see https://gdal.org/drivers/vector/csv.html (in GeoPandas, we don’t override any GDAL options).
But note that this duplicate column is not the only issue. You also still need to pass GEOM_POSSIBLE_NAMES="geometry" to actually parse the WKT. So with the default options, read_file is not very useful for CSV files …