
Groupby over geometry


I’m really enjoying using geopandas, it’s great!

Today I want to do a groupby over the geometry column. Essentially, I want to count how many times each linestring occurs. The challenge seems to be that geometries are not hashable in Python 3 because they are mutable: https://github.com/Toblerity/Shapely/issues/209

In practice this mutability wouldn’t be an issue for many users. I used the workaround suggested on the shapely issue page: I created a function that hashes the coordinates, used it to add a new column called ‘geom_hash’ holding a hash value for each geometry, and then did a groupby over the geom_hash column.

A non-vectorised implementation of this hash function is here:

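def __hash__(df):
    # Geometries are not hashable, so hash each coordinate tuple instead.
    df['geom_hash'] = 0
    col = df.columns.get_loc('geom_hash')
    for idx, geom in enumerate(df.geometry):
        # Write through .iloc on the frame itself to avoid chained assignment.
        df.iloc[idx, col] = hash(tuple(geom.coords))
    return df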

How do people feel about this being built in to geopandas? If the mutability issue is a concern, could this be added to the documentation as a workaround?

The function above is very slow - any suggestions on how the hash step could be vectorised?

Thanks for any suggestions! Liam
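
As a rough illustration of the counting step described above, here is a minimal sketch in plain Python. It assumes each linestring can be reduced to its vertex sequence; the `lines` data and the `coord_key` helper are hypothetical stand-ins, not part of geopandas or shapely. Using the coordinate tuple itself as the key, rather than its hash, also avoids any chance of hash collisions merging different geometries.

```python
from collections import Counter

# Hypothetical stand-in data: each linestring is a list of (x, y) vertices.
lines = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 0.0), (1.0, 1.0)],  # exact duplicate of the first line
    [(2.0, 2.0), (3.0, 3.0)],
]


def coord_key(coords):
    """Return a hashable key (tuple of tuples) for a sequence of vertices."""
    return tuple(tuple(pt) for pt in coords)


# Count how many times each distinct vertex sequence occurs.
counts = Counter(coord_key(line) for line in lines)

for key, n in counts.items():
    print(n, key)
```

With shapely linestrings, `tuple(geom.coords)` should give the same kind of key, and a whole column of keys can be built with one list comprehension instead of per-row `.iloc` writes, which is likely where most of the time in the loop version goes.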

Issue Analytics

Created 5 years ago
Comments: 8 (4 by maintainers)

Top GitHub Comments

3 reactions

jorisvandenbossche commented, Oct 9, 2018

So there seems to be GEOSNormalize (https://geos.osgeo.org/doxygen/geos__c_8h_source.html), which is also exposed in PostGIS as ST_Normalize: https://postgis.net/docs/ST_Normalize.html (but not yet in shapely, I think). It may at least partly solve such comparison problems.
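
To make the comparison problem concrete: two linestrings with the same vertices in opposite order produce different coordinate tuples, so they would land in different groups. The sketch below is a simplified, hypothetical stand-in for normalization (it is not GEOSNormalize and handles only vertex direction), written in plain Python to show why a canonical form helps before hashing or grouping.

```python
def canonical_key(coords):
    """Direction-independent key: a line and its reverse map to the same tuple.

    Simplified illustration only; real normalization covers more cases.
    """
    forward = tuple(tuple(pt) for pt in coords)
    backward = forward[::-1]
    # Pick the lexicographically smaller ordering as the canonical one.
    return min(forward, backward)


a = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
b = list(reversed(a))  # same line, digitized in the opposite direction

print(tuple(a) == tuple(b))                  # raw coordinate tuples differ
print(canonical_key(a) == canonical_key(b))  # canonical keys match
```

Whether reversed lines should count as the same geometry depends on the use case, so this step is a choice rather than a requirement.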

1 reaction

openSourcerer9000 commented, Jan 8, 2021

Based on ~200K polygon geometry. Thanks for the tip, I was trying with WKT. Until this gets fixed I will try this workaround,