Index type is lost when filtering to empty geo data frame

Description

When a GeoDataFrame is filtered with a condition that is False for every row, the result is empty and the type of its index is lost.

Test

import pandas as pd
import geopandas as gpd
from datetime import timedelta, datetime
from shapely.geometry import Point


df = pd.DataFrame(
    dict(
        geometry=[Point(i, i) for i in range(3)],
        timestamp=[datetime.now() + timedelta(seconds=i) for i in range(3)],
    )
).set_index("timestamp")
gdf = gpd.GeoDataFrame(df)
max_date = df.index.max()

df_filtered = df.loc[df.index > max_date, :]
gdf_filtered = gdf.loc[gdf.index > max_date, :]
print(df_filtered.index)
print(gdf_filtered.index)

GeoPandas 0.5.1

Output

DatetimeIndex([], dtype='datetime64[ns]', name='timestamp', freq=None) 
DatetimeIndex([], dtype='datetime64[ns]', name='timestamp', freq=None)

Environment

attrs==19.3.0
Click==7.0
click-plugins==1.1.1
cligj==0.5.0
Fiona==1.8.9.post2
geopandas==0.5.1
munch==2.5.0
numpy==1.17.3
pandas==0.25.3
pyproj==2.4.0
python-dateutil==2.8.1
pytz==2019.3
Shapely==1.6.4.post2
six==1.12.0

GeoPandas 0.6.1

Output

DatetimeIndex([], dtype='datetime64[ns]', name='timestamp', freq=None) 
RangeIndex(start=0, stop=0, step=1)

Environment

attrs==19.3.0
Click==7.0
click-plugins==1.1.1
cligj==0.5.0
Fiona==1.8.9.post2
geopandas==0.6.1
munch==2.5.0
numpy==1.17.3
pandas==0.25.3
pyproj==2.4.0
python-dateutil==2.8.1
pytz==2019.3
Shapely==1.6.4.post2
six==1.12.0

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
jorisvandenbossche commented, Nov 5, 2019

@simon-keith thanks a lot for the clear, reproducible bug report!

I can reproduce this on 0.6.1, but not with master, so something might have fixed this already. In that case, we should add a test and could backport it to 0.6.x.
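
A regression check along the lines suggested above might look like the following. This is only a minimal sketch, not the test that was added to GeoPandas: the helper name `empty_filter_keeps_index_type` is hypothetical, and it is run here on a plain pandas DataFrame so that it needs nothing beyond pandas. The assumption is that the same function could be called with a GeoDataFrame built as in the report.

```python
import pandas as pd


def empty_filter_keeps_index_type(frame):
    """Hypothetical check: does an all-False row filter keep the index type?"""
    # Comparing against the maximum yields False for every row.
    mask = frame.index > frame.index.max()
    filtered = frame.loc[mask, :]
    # The result must be empty and its index must be of the same class.
    return len(filtered) == 0 and type(filtered.index) is type(frame.index)


df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.date_range("2019-11-05", periods=3, name="timestamp"),
)
print(empty_filter_keeps_index_type(df))
```

For the plain DataFrame this prints True, matching the first line of both outputs in the report. On the affected combination (GeoPandas 0.6.1 with pandas 0.25.3), passing the GeoDataFrame from the report would be expected to give False.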

0 reactions
jorisvandenbossche commented, Nov 11, 2019

OK, I found the change that explains why it fails with released pandas but works with pandas master.

Pandas 0.25:

In [18]: df = pd.DataFrame({'a':[1, 2, 3], 'b': [.2, .3, .4]}, index=pd.date_range("2012-01-01", periods=3))

In [20]: df_empty = df[:0] 

In [21]: df_empty.index 
Out[21]: DatetimeIndex([], dtype='datetime64[ns]', freq='D')

In [22]: df_empty['a'] = np.array([], dtype='int64')  

In [23]: df_empty.index   
Out[23]: RangeIndex(start=0, stop=0, step=1)

In [24]: pd.__version__ 
Out[24]: '0.25.3'

With pandas master:

In [1]: df = pd.DataFrame({'a':[1, 2, 3], 'b': [.2, .3, .4]}, index=pd.date_range("2012-01-01", periods=3))

In [2]: df_empty = df[:0] 

In [3]: df_empty.index  
Out[3]: DatetimeIndex([], dtype='datetime64[ns]', freq='D')

In [4]: df_empty['a'] = np.array([], dtype='int64')   

In [5]: df_empty.index
Out[5]: DatetimeIndex([], dtype='datetime64[ns]', freq='D')

In [6]: pd.__version__ 
Out[6]: '0.26.0.dev0+822.g6b06f4342.dirty'

So assigning to a column of an empty frame resets the index on released pandas, but this has been fixed to preserve the index on pandas master.

And with GeoPandas 0.6, we started to rely on this pattern to write the geometries into the array (and thus also do this for empty geodataframes), hence introducing this regression for released pandas versions.
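
For code that has to run on an affected pandas release, one defensive pattern is to save the index before the column assignment and put it back afterwards. The sketch below illustrates that idea only; `assign_preserving_index` is a hypothetical helper, not part of pandas or GeoPandas, and it is not the fix that GeoPandas itself adopted.

```python
import numpy as np
import pandas as pd


def assign_preserving_index(frame, column, values):
    """Hypothetical helper: assign a column, then restore the original index."""
    original_index = frame.index
    result = frame.copy()  # work on a copy so the input frame is untouched
    result[column] = values
    # On affected pandas releases, assigning to an empty frame swaps the
    # index for a RangeIndex; restoring the saved index undoes that.
    if type(result.index) is not type(original_index):
        result.index = original_index
    return result


df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [0.2, 0.3, 0.4]},
    index=pd.date_range("2012-01-01", periods=3),
)
df_empty = df[:0]
fixed = assign_preserving_index(df_empty, "a", np.array([], dtype="int64"))
print(type(fixed.index).__name__)
```

On a pandas version that already preserves the index, the type check passes and the helper changes nothing, so the printed class name is DatetimeIndex either way.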
