Groupby over geometry
I’m really enjoying using geopandas, it’s great!
Today I want to do a groupby over the geometry column. Essentially, I want to count how many times each linestring occurs. The challenge seems to be that geometries are not hashable in Python 3 because they are mutable: https://github.com/Toblerity/Shapely/issues/209
In practice this mutability wouldn’t be an issue for many users. I used the workaround suggested on the shapely issue page: I wrote a function that hashes the coordinates, used it to create a new column called ‘geom_hash’ holding a hash value for each geometry, and then did a groupby over the geom_hash column.
A non-vectorised implementation of this hash function is here:
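def __hash__(df):
    df['geom_hash'] = 0
    # Column position, so .iloc writes into the frame itself rather than a copy
    col = df.columns.get_loc('geom_hash')
    for idx, geom in enumerate(df.geometry):
        # Hash the coordinate tuple, since the geometry itself is not hashable
        df.iloc[idx, col] = hash(tuple(geom.coords))
    return df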
How do people feel about this being built in to geopandas? If the mutability issue is a concern, could this be added to the documentation as a workaround?
The function above is very slow - any suggestions on how the hash step could be vectorised?
Thanks for any suggestions! Liam
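One possible way to avoid the row-by-row writes is to build every key in a single pass and count the keys directly. The sketch below is only an illustration: it uses plain coordinate lists instead of real geometries, and `geometry_key` and `count_geometries` are hypothetical helper names, not geopandas or shapely functions. With shapely objects the key would presumably come from `tuple(geom.coords)`, as in the function above. It is not truly vectorised, but it does away with the per-row `.iloc` assignment.

```python
from collections import Counter


def geometry_key(coords):
    """Return a hashable key for one geometry's coordinate sequence."""
    return tuple((float(x), float(y)) for x, y in coords)


def count_geometries(coord_seqs):
    """Count how often each distinct coordinate sequence occurs."""
    return Counter(geometry_key(coords) for coords in coord_seqs)


# Hypothetical sample data: three linestrings, two of them identical.
lines = [
    [(0, 0), (1, 1)],
    [(0, 0), (2, 2)],
    [(0, 0), (1, 1)],
]

counts = count_geometries(lines)
for key, n in counts.items():
    print(n, key)
```

The same keys could also be stored in a column and passed to groupby, which is what the geom_hash approach does.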
Issue Analytics
- Created 5 years ago
- Comments: 8 (4 by maintainers)

So there seems to be GEOSNormalize (https://geos.osgeo.org/doxygen/geos__c_8h_source.html), which is also exposed in PostGIS as ST_Normalize (https://postgis.net/docs/ST_Normalize.html) but not yet in shapely, I think. It may at least partly solve such comparison problems.
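To illustrate the kind of comparison problem normalisation is meant to address: two linestrings that trace the same path in opposite directions have different coordinate tuples, so a coordinate hash treats them as different. The sketch below is a simplified stand-in and not the GEOS algorithm; `normalized_key` is a hypothetical helper that just picks one canonical direction, and it works on plain coordinate lists rather than real geometries.

```python
from collections import Counter


def normalized_key(coords):
    """Return a direction-independent key for a linestring's coordinates."""
    forward = tuple((float(x), float(y)) for x, y in coords)
    backward = forward[::-1]
    # Use the lexicographically smaller ordering as the canonical form.
    return min(forward, backward)


# Hypothetical sample data: the first two lines are the same path reversed.
lines = [
    [(0, 0), (1, 1), (2, 0)],
    [(2, 0), (1, 1), (0, 0)],
    [(0, 0), (5, 5)],
]

counts = Counter(normalized_key(coords) for coords in lines)
for key, n in counts.items():
    print(n, key)
```

Whether reversed linestrings should count as the same geometry depends on the use case, so this kind of normalisation would probably need to be optional.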