API spatial join: need for index retention?


Some time ago, while helping to debug some things in the sjoin implementation (https://github.com/geopandas/geopandas/pull/422), I wondered whether some of its complexity is actually needed.

Treatment of the index of the passed frames

Currently, for left and inner joins, the index of the left frame is preserved as the index of the result, and the index of the right frame is added as a column ‘index_right’. For right joins, the index of the right frame is used as the index and the left index is added as a column.
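To make the behaviour described above concrete, here is a minimal sketch. The frames `points` and `squares` are made up for illustration, and the ‘index_left’ column name for right joins is an assumption taken from the error message quoted in the comments further down. It relies on the default spatial predicate of `sjoin`, so no version-specific keyword is passed; results may differ in other geopandas versions.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Left frame: two points with a custom (non-default) index.
points = gpd.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[Point(0.5, 0.5), Point(2.5, 2.5)],
    index=[10, 11],
)

# Right frame: two squares, each containing one of the points.
squares = gpd.GeoDataFrame(
    {"zone": ["low", "high"]},
    geometry=[box(0, 0, 1, 1), box(2, 2, 3, 3)],
    index=[100, 101],
)

# Inner join: the result is indexed by the left frame's index,
# and the right frame's index is carried along as a column.
inner = gpd.sjoin(points, squares, how="inner")
print(inner[["name", "zone", "index_right"]])

# Right join: the roles swap. The result is indexed by the right frame's
# index and the left index becomes a column.
right = gpd.sjoin(points, squares, how="right")
print(right[["name", "zone", "index_left"]])
```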

Some ‘problems’ with this:

Those can be solved in the code, but in general keeping track of those indices just makes the code more complex.

Therefore, is preserving the index actually needed?

  • pandas join methods do not preserve the index (e.g. pd.merge when joining on a column)
  • you can easily preserve the index in the result by making it a column with reset_index
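Both points can be checked with plain pandas. The sketch below uses made-up frames and the hypothetical column names `left_id` and `right_id`; it only illustrates the pandas behaviour referred to above, not anything in geopandas itself.

```python
import pandas as pd

left = pd.DataFrame({"key": ["x", "y"], "a": [1, 2]}, index=[10, 11])
right = pd.DataFrame({"key": ["y", "x"], "b": [3, 4]}, index=[100, 101])

# Merging on a column discards both original indexes:
# the result gets a fresh default integer index.
merged = pd.merge(left, right, on="key")
print(list(merged.index))

# Opting in: turn each index into an ordinary column before merging.
# reset_index() names the new column "index" for an unnamed index,
# so it is renamed here to avoid a name clash between the two frames.
left_kept = left.reset_index().rename(columns={"index": "left_id"})
right_kept = right.reset_index().rename(columns={"index": "right_id"})
merged_kept = pd.merge(left_kept, right_kept, on="key")
print(merged_kept[["key", "left_id", "right_id"]])
```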

Example with actual code to illustrate: http://nbviewer.jupyter.org/gist/jorisvandenbossche/9b6b7adce5cd60973866d77b364cbc27

Possible ways forward:

  1. keep as is (and try to solve bugs / inconsistencies)
  2. drop them (user can easily keep them by doing reset_index)
  3. regard them as columns (do the reset_index ourselves). This mainly preserves the current behaviour, except that the names might differ and both indices end up as columns in the result (so it is also an API change). The drawback is that the user cannot opt out of keeping the indices, whereas with option 2 he/she can easily opt in.

(my preference would be option 2, but this is of course a backwards incompatible change)

cc @perrygeo @mdbartos @ResidentMario @jdmcbr

Issue Analytics

  • State: open
  • Created 6 years ago
  • Comments: 13 (12 by maintainers)

Top GitHub Comments

geissbuehlerpeter commented, Jun 15, 2021

I’m experiencing the same issues as described above. Even after removing or renaming the index, I still get this error: ‘index_left’ and ‘index_right’ cannot be names in the frames being joined.

Is there a workaround that works now? I cannot find any.
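One possible workaround, sketched under the assumption that the error is triggered by columns (not the index) named ‘index_left’ or ‘index_right’, for example left over from an earlier sjoin: drop or rename those columns before joining again. The helper `drop_reserved_columns` below is hypothetical and not part of geopandas; it is shown on a plain DataFrame, but the same `drop` call works on a GeoDataFrame.

```python
import pandas as pd

# Column names that the error message says may not appear in the inputs.
RESERVED = ("index_left", "index_right")


def drop_reserved_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` without columns named index_left / index_right."""
    present = [name for name in RESERVED if name in frame.columns]
    return frame.drop(columns=present)


# Stand-in for the output of an earlier spatial join.
previous_result = pd.DataFrame(
    {"name": ["a", "b"], "index_right": [100, 101]},
    index=[10, 11],
)

cleaned = drop_reserved_columns(previous_result)
print(list(cleaned.columns))
```

If the values in those columns are still needed, renaming them with `frame.rename(columns={"index_right": "some_other_name"})` instead of dropping them keeps the information while avoiding the name clash.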

jorisvandenbossche commented, Aug 20, 2017

> how would this breaking change mix in with this library’s release schedule? geopandas hasn’t had a release in a very long time.

Planning a release in the next few days (0.3.0 towards the end of the coming week, see https://github.com/geopandas/geopandas/issues/470).

> The question I have is how that next release (or set of releases) should be organized. Is now a good time to increment a major version and make some breaking API changes?

I would certainly leave it for the next release (to have more than a couple of days to discuss it and let it live in master). I think it is OK to make such a change in e.g. 0.3 -> 0.4 (and not 1.0), also since we are still in 0.x.
