[clip vs. intersect] polygons lost due to dropping duplicates generated by .overlay()
I was testing the .overlay() function and ran into some interesting issues.
I imported two GeoJSON files as follows:
The first one has a shape of (2167, 17), with an OBJECTID column:
The second one has a shape of (5, 5):
I ran .overlay(how='intersection') and got the following map, with a shape of (3195, 21).
The map above looks right, but there are many duplicated rows in the GeoDataFrame. I used the method posted by folks here and converted the geometry column to WKB. I then ran .nunique() and saw that there are 2165 unique values in OBJECTID but 3195 unique values in the geometry column. I'm confused about why the number is not 2165, and why it exceeds the row counts of both GeoDataFrames I imported.
Then I dropped the duplicated rows based on the OBJECTID column with .drop_duplicates(subset='OBJECTID', inplace=True). I plotted the GeoDataFrame again, got the following map, and realized that some of the polygons were lost. (By the way, I had used both GeoJSON files in QGIS before and got a map identical to the one above from .overlay(how='intersection'); the clipped layer had 2165 rows in its attribute table, so 2165 seems to be the right number to expect.)
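The row inflation described above is consistent with how an intersection overlay works: it returns one row per overlapping (left, right) pair, so a polygon that crosses several features of the second layer is split into several pieces that share one OBJECTID. The following is a minimal sketch on hypothetical toy data (the `parcels` and `zones` layers and their coordinates are made up for illustration; it assumes GeoPandas and Shapely are installed). It reproduces the WKB/.nunique() check and shows why .drop_duplicates(subset='OBJECTID') discards area while .dissolve(by='OBJECTID') keeps it.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical toy layers standing in for the two GeoJSON files.
# Parcel 1 straddles the boundary between zones A and B.
parcels = gpd.GeoDataFrame(
    {"OBJECTID": [1, 2]},
    geometry=[box(0, 0, 2, 1), box(3, 0, 4, 1)],
)
zones = gpd.GeoDataFrame(
    {"zone": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 4, 1)],
)

# Intersection returns one row per overlapping (parcel, zone) pair,
# so parcel 1 is split into two pieces that share OBJECTID 1.
result = gpd.overlay(parcels, zones, how="intersection")
print(len(result))  # 3

# The check from the post: unique OBJECTIDs vs. unique geometries (as WKB).
n_ids = result["OBJECTID"].nunique()
n_geoms = result.geometry.apply(lambda g: g.wkb).nunique()
print(n_ids, n_geoms)  # 2 3

# Dropping "duplicates" keeps only the first piece of parcel 1.
deduped = result.drop_duplicates(subset="OBJECTID")
print(deduped.area.sum())  # 2.0

# Dissolving merges the pieces back into one row per OBJECTID.
dissolved = result.dissolve(by="OBJECTID")
print(dissolved.area.sum())  # 3.0
```

In this toy case every geometry is unique while OBJECTIDs repeat, which mirrors the 3195 vs. 2165 counts in the post: the extra rows are distinct pieces, not true duplicates.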
I checked GeoPandas's API and didn't see any way to prevent those duplicated rows from being generated. I don't understand why there are 3195 rows in the GeoDataFrame, or why dropping duplicates loses those polygons.
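If the goal is the QGIS-style clip result (one row per input polygon, 2165 rows here), two plausible routes are sketched below on hypothetical toy data. The first is geopandas.clip (to my knowledge available from GeoPandas 0.7 onward), which cuts each polygon to the union of the mask layer without joining the mask's attributes. The second keeps the intersection overlay and dissolves the pieces back together by OBJECTID. The layer names and coordinates are made up, and this is an assumption about what the poster wants, not the resolution of the issue.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-ins for the two layers; parcel 3 lies outside every zone.
parcels = gpd.GeoDataFrame(
    {"OBJECTID": [1, 2, 3]},
    geometry=[box(0, 0, 2, 1), box(3, 0, 4, 1), box(5, 0, 6, 1)],
)
zones = gpd.GeoDataFrame(
    {"zone": ["A", "B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 3.5, 1)],
)

# Route 1: clip. Each parcel is cut to the union of the zones; parcels
# that miss the mask are dropped, and zone attributes are not joined.
clipped = gpd.clip(parcels, zones)

# Route 2: intersection overlay (one row per parcel/zone pair), then
# merge the pieces back into one row per OBJECTID.
pieces = gpd.overlay(parcels, zones, how="intersection")
merged = pieces.dissolve(by="OBJECTID").reset_index()

print(len(pieces))                # 3
print(len(clipped), len(merged))  # 2 2
```

Which route fits depends on whether the second layer's attributes are needed: clip drops them, while dissolve keeps the first value per OBJECTID (its default is aggfunc="first").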
- Created 5 years ago
- Comments: 23 (6 by maintainers)

@austinorr I believe you brought this issue to me before. What do you think of https://github.com/geopandas/geopandas/issues/1027 ?
@austinorr thank you for mentioning that, man. I had assumed that both were the same.