Groupby over geometry

I’m really enjoying using geopandas, it’s great!

Today I want to do a groupby over the geometry column. Essentially, I want to count how many times each linestring occurs. The challenge seems to be that geometries are not hashable in Python 3 because they are mutable: https://github.com/Toblerity/Shapely/issues/209

In practice this mutability wouldn’t be an issue for many users. I used the workaround suggested on the shapely issue page: I wrote a function that hashes the coordinates, used it to create a new column called ‘geom_hash’ holding a hash value for each geometry, and then did the groupby over the geom_hash column.

A non-vectorised implementation of this hash function is here:

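def add_geom_hash(df):
    # Hash each geometry's coordinate tuple (assumes every geometry exposes .coords, e.g. LineStrings).
    hashes = []
    for geom in df.geometry:
        hashes.append(hash(tuple(geom.coords)))
    # Assign the whole column at once; chained df['geom_hash'].iloc[idx] = ... may write to a copy.
    df['geom_hash'] = hashes
    return df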

How do people feel about this being built into geopandas? If the mutability issue is a concern, could this be added to the documentation as a workaround?

The function above is very slow - any suggestions on how the hash step could be vectorised?

Thanks for any suggestions! Liam
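
As a rough illustration of the counting step described above, the sketch below keys a `collections.Counter` on each geometry's coordinate tuple instead of writing hash values into a DataFrame row by row. It is not a geopandas feature: `count_geometries` is a hypothetical helper, and `Line` is a stand-in for a shapely LineString that only provides a `.coords` attribute, so the example runs without geopandas installed. It assumes every geometry exposes `.coords`, which holds for linestrings but not for polygons.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for a shapely LineString: only .coords is used here.
Line = namedtuple("Line", ["coords"])


def count_geometries(geoms):
    """Count how often each coordinate sequence occurs in an iterable of geometries."""
    # A tuple of coordinate tuples is hashable, so it can serve as the key directly.
    # Keying on the tuple itself (rather than on hash(...)) avoids merging two
    # different geometries whose hash values happen to collide.
    return Counter(tuple(map(tuple, g.coords)) for g in geoms)


lines = [
    Line([(0.0, 0.0), (1.0, 1.0)]),
    Line([(0.0, 0.0), (1.0, 1.0)]),
    Line([(2.0, 2.0), (3.0, 3.0)]),
]
counts = count_geometries(lines)
```

With real data, the same helper could presumably be called on the geometry column (for example `count_geometries(df.geometry)`), since it only iterates over the geometries and reads `.coords`.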

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

3 reactions
jorisvandenbossche commented, Oct 9, 2018

So there seems to be GEOSNormalize (https://geos.osgeo.org/doxygen/geos__c_8h_source.html), which is also exposed in PostGIS: https://postgis.net/docs/ST_Normalize.html (but not yet in shapely, I think), which may at least partly solve such comparison problems.
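
To make the comparison problem concrete: two linestrings that trace the same path in opposite directions have different coordinate tuples, so a coordinate-based hash puts them in different groups. The sketch below is a hand-rolled illustration for linestrings only, not what GEOSNormalize or ST_Normalize actually does; `canonical_coords` is a hypothetical helper that picks whichever of the forward or reversed coordinate sequence sorts first, so both directions yield the same key.

```python
def canonical_coords(coords):
    """Return a direction-independent key for a linestring's coordinates."""
    forward = tuple(map(tuple, coords))
    backward = forward[::-1]
    # Tuples compare lexicographically, so min() picks the same one of the two
    # orderings regardless of which direction the line was digitised in.
    return min(forward, backward)


a = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
b = list(reversed(a))  # same path, opposite direction

same_key = canonical_coords(a) == canonical_coords(b)
```

Using such a key in place of the raw coordinate tuple would group `a` and `b` together. It does not handle other sources of mismatch, such as floating-point noise or different starting vertices on closed rings.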

1 reaction
openSourcerer9000 commented, Jan 8, 2021

[image] Based on ~200K polygon geometries. Thanks for the tip, I was trying with WKT. Until this gets fixed I will try this workaround,

Top Results From Across the Web

pandas - How to groupby by geometry column with Python?
Use to_wkt from geometry column to convert shape as plain text: out = pts.groupby(['id', pts['geometry'].to_wkt()], as_index=False) ...
Is it ok to group by geometry in PostGIS? - GIS Stack Exchange
It's important to know that a GROUP BY geom clause in PostGIS 2.3 and earlier actually groups rows based on bounding box equality,...
Aggregation with dissolve - GeoPandas
In a non-spatial setting, when all we need are summary statistics of the data, we aggregate our data using the groupby() function. But...
GROUP BY on a table with geometry - Oracle Communities
The solution for me was to create a user-defined aggregate function: agg_first_geom. This will help aggregating your data by keeping/taking the ...
HowTo: Group By Geography column - Michael Entin - Medium
Lacking transitivity, ST_Equals cannot be used for GROUP BY , and we need a different rule. Such a rule cannot ignore even these...