RecursionError reading CSV file with geopandas 0.6.2 but not geopandas 0.5.1

Hello! I found a bug in the latest version of geopandas where I’m getting a RecursionError when reading a CSV file. The file can be downloaded and tested here: https://github.com/CosmiQ/solaris/blob/master/solaris/data/w_multipolygon.csv

Fail case:

→ ipython
Python 3.6.7 | packaged by conda-forge | (default, Nov  6 2019, 16:19:42) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.10.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import geopandas as gpd                                                 

In [2]: import os                                                               

In [3]: gpd.__version__                                                         
Out[3]: '0.6.2'

In [4]: from solaris.data import data_dir                                       

In [5]: gpd.read_file(os.path.join(data_dir, 'w_multipolygon.csv'))             
---------------------------------------------------------------------------
RecursionError                            Traceback (most recent call last)
<ipython-input-5-43f7a47ec290> in <module>
----> 1 gpd.read_file(os.path.join(data_dir, 'w_multipolygon.csv'))

~/miniconda3/envs/solaris/lib/python3.6/site-packages/geopandas/io/file.py in read_file(filename, bbox, **kwargs)
     93 
     94             columns = list(features.meta["schema"]["properties"]) + ["geometry"]
---> 95             gdf = GeoDataFrame.from_features(f_filt, crs=crs, columns=columns)
     96 
     97     return gdf

~/miniconda3/envs/solaris/lib/python3.6/site-packages/geopandas/geodataframe.py in from_features(cls, features, crs, columns)
    301             d.update(f["properties"])
    302             rows.append(d)
--> 303         df = GeoDataFrame(rows, columns=columns)
    304         df.crs = crs
    305         return df

~/miniconda3/envs/solaris/lib/python3.6/site-packages/geopandas/geodataframe.py in __init__(self, *args, **kwargs)
     75             index = self.index
     76             try:
---> 77                 self["geometry"] = _ensure_geometry(self["geometry"].values)
     78             except TypeError:
     79                 pass

~/miniconda3/envs/solaris/lib/python3.6/site-packages/geopandas/geodataframe.py in __getitem__(self, key)
    555         GeoDataFrame.
    556         """
--> 557         result = super(GeoDataFrame, self).__getitem__(key)
    558         geo_col = self._geometry_column_name
    559         if isinstance(key, string_types) and key == geo_col:

~/miniconda3/envs/solaris/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3005             indexer = np.where(indexer)[0]
   3006 
-> 3007         data = self.take(indexer, axis=1)
   3008 
   3009         if is_single_key:

~/miniconda3/envs/solaris/lib/python3.6/site-packages/pandas/core/generic.py in take(self, indices, axis, is_copy, **kwargs)
   3604             indices, axis=self._get_block_manager_axis(axis), verify=True
   3605         )
-> 3606         result = self._constructor(new_data).__finalize__(self)
   3607 
   3608         # Maybe set copy if we didn't actually change the index.

... last 4 frames repeated, from the frame below ...

~/miniconda3/envs/solaris/lib/python3.6/site-packages/geopandas/geodataframe.py in __init__(self, *args, **kwargs)
     75             index = self.index
     76             try:
---> 77                 self["geometry"] = _ensure_geometry(self["geometry"].values)
     78             except TypeError:
     79                 pass

RecursionError: maximum recursion depth exceeded in comparison

The error does not happen with geopandas version 0.5.1:

→ ipython
Python 3.6.7 | packaged by conda-forge | (default, Nov  6 2019, 16:19:42) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.10.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import geopandas as gpd                                                                                                                                                                             

In [2]: gpd.__version__                                                                                                                                                                                     
Out[2]: '0.5.1'

In [3]: from solaris.data import data_dir                                                                                                                                                                   

In [4]: import os                                                                                                                                                                                           

In [5]: gpd.read_file(os.path.join(data_dir, 'w_multipolygon.csv'))                                                                                                                                         
Out[5]: 
  field_1 access addr_house addr_hou_1 addr_inter  ... origlen           partialDec truncated                                           geometry                                           geometry
0     660                                          ...       0                  1.0         0  POLYGON ((742959.5157261142 3739469.858595584,...  POLYGON ((742959.5157261142 3739469.858595584,...
1     661                                          ...       0                  1.0         0  POLYGON ((743020.5285401859 3739472.034202539,...  POLYGON ((743020.5285401859 3739472.034202539,...
2     662                                          ...       0                  1.0         0  POLYGON ((743003.1399996518 3739467.617796136,...  POLYGON ((743003.1399996518 3739467.617796136,...
3     663                                          ...       0                  1.0         0  POLYGON ((742737.216587933 3739529.428720969, ...  POLYGON ((742737.216587933 3739529.428720969, ...
4     664                                          ...       0                  1.0         0  POLYGON ((742779.0589662758 3739517.785098479,...  POLYGON ((742779.0589662758 3739517.785098479,...
5     665                                          ...       0   0.4599659863913662         1  POLYGON ((742693.3122179032 3739539, 742694.59...  POLYGON ((742693.3122179032 3739539, 742694.59...
6     666                                          ...       0  0.13212004440894673         1  MULTIPOLYGON (((742820.9164519846 3739539, 742...  MULTIPOLYGON (((742820.9164519846 3739539, 742...
7     667                                          ...       0   0.6854550148892067         1  POLYGON ((743051 3739240.429075356, 743019.733...  POLYGON ((743051 3739240.429075356, 743019.733...
8     668                                          ...       0   0.3268898818394317         1  POLYGON ((743046.4084919207 3739315.608520228,...  POLYGON ((743046.4084919207 3739315.608520228,...
9     669                                          ...       0                  1.0         0  POLYGON ((741360.3983888365 3743875.358136298,...  POLYGON ((741360.3983888365 3743875.358136298,...

In both cases I’m on Ubuntu 18.04 with Python 3.6.7.

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

10 reactions
jorisvandenbossche commented, Dec 12, 2019

@rbavery Thanks for the clear issue report!

The direct cause seems to be that there is a duplicate “geometry” column, which now generates an error in geopandas 0.6 (which is indeed a regression).

However, the object that was returned in geopandas 0.5 was also not a “proper” GeoDataFrame. There are 2 geometry columns, but both contain strings, not shapely geometries. Once you try one of the geospatial methods that geopandas provides, things break down in 0.5 as well.
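
To illustrate where the duplicate name comes from: the traceback above shows read_file building its column list as the schema properties plus "geometry". If the CSV is read as plain strings, its own "geometry" header is just another property, so the name ends up in the list twice. The sketch below mirrors that with the standard library only; `has_geometry_column` and the sample data are hypothetical and not part of geopandas.

```python
import csv
import io


def has_geometry_column(csv_file, name="geometry"):
    """Return True if the CSV header already contains a column called `name`."""
    header = next(csv.reader(csv_file), [])
    return name in header


# Hypothetical stand-in for w_multipolygon.csv: a header plus one WKT row.
sample_text = 'field_1,access,geometry\n660,,"POLYGON ((0 0, 1 0, 1 1, 0 0))"\n'

header = next(csv.reader(io.StringIO(sample_text)))
# Same shape as the expression at file.py line 94 in the traceback:
# every property column, followed by an extra "geometry" entry.
columns = list(header) + ["geometry"]

print(has_geometry_column(io.StringIO(sample_text)))
print(columns.count("geometry"))
```

A check like this could be run before calling read_file to decide whether the GDAL open options discussed in this thread are needed.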

The underlying cause for that is, I think, that the GDAL CSV driver does not work like that out of the box: https://gdal.org/drivers/vector/csv.html. From that page, it seems that it also reads non-spatial data, and that there are options to specify which columns hold x/y values, or which column is in WKB or WKT format. So for this specific CSV file, it is not reading in geometry data, but just reading the file as a plain CSV with strings.

When specifying the following keywords:

df = geopandas.read_file("w_multipolygon.csv", GEOM_POSSIBLE_NAMES="geometry", KEEP_GEOM_COLUMNS="NO") 

I get a proper GeoDataFrame (with a single “geometry” column that actually holds shapely geometries), both in geopandas 0.5 and 0.6.

Now, that said: it would be nice if geopandas could give a more helpful error message here (also the 0.5 behaviour does not seem to be useful), indicating that no geometry data are found. And if we can know the CSV driver is used, we could also give an even more informative error message pointing to those keywords.

2 reactions
jorisvandenbossche commented, Dec 12, 2019

> Is it because the default KEEP_GEOM_COLUMNS value is “YES” for geopandas when it uses the GDAL CSV driver? (Why would one want to keep that column, which is by definition going to be a duplicate geometry column? I’m guessing I’m missing something here…)

Yes, that’s indeed the reason. I agree it’s a somewhat strange default, but that’s what GDAL does, see https://gdal.org/drivers/vector/csv.html (in GeoPandas, we don’t override any GDAL options)

But note that this duplicate column is not the only issue. You also still need to pass GEOM_POSSIBLE_NAMES="geometry" to actually parse the WKT.

So with the default options, read_file is not very useful for CSV files …
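
A different route, not the one suggested in the comments above, is to bypass the GDAL CSV driver: read the file with pandas and parse the WKT strings with shapely. The sketch below is a minimal illustration that assumes the geometry column holds valid WKT. The two-row CSV is hypothetical and stands in for w_multipolygon.csv.

```python
import io

import geopandas as gpd
import pandas as pd
from shapely import wkt

# Hypothetical two-row CSV with a WKT "geometry" column.
csv_text = (
    "field_1,geometry\n"
    '660,"POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"\n'
    '661,"MULTIPOLYGON (((2 2, 3 2, 3 3, 2 2)))"\n'
)

# Read as a plain table first; the geometry column is still strings here.
df = pd.read_csv(io.StringIO(csv_text))

# Parse each WKT string into a shapely geometry object.
df["geometry"] = df["geometry"].apply(wkt.loads)

# Wrap in a GeoDataFrame. A CSV carries no CRS, so none is set here;
# assign one separately if it is known.
gdf = gpd.GeoDataFrame(df, geometry="geometry")

print(gdf.geom_type.tolist())
```

This gives a single geometry column of shapely objects, at the cost of loading the whole file through pandas rather than through read_file.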
