API spatial join: need for index retention?
Some time ago, when helping to debug some things in the sjoin implementation (https://github.com/geopandas/geopandas/pull/422), I wondered whether some of the complexity is actually needed.
Treatment of the index of the passed frames
Currently, for left and inner joins, the index of the left frame is preserved as index, the index of the right frame is added as a column ‘index_right’. For right joins, the index of the right frame is used as the index and the left index is added as a column.
Some ‘problems’ with this:
- (bug/enhancement) if the index has a name, this name is lost (https://github.com/geopandas/geopandas/issues/424)
- (inconsistency) for left/inner joins, the index gets no name, but for a right join it gets the name ‘index_right’
Those can be solved in the code, but in general keeping track of those indices just makes the code more complex.
Therefore, is preserving the index actually needed?
- pandas join methods do not preserve the index (e.g. pd.merge when joining on a column)
- you can easily preserve the index in the result by making it a column with reset_index
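To make the two points above concrete, here is a minimal pandas-only sketch. The frames, the column names and the index names (left_id, right_id) are made up for illustration and have nothing to do with the sjoin code itself.

```python
import pandas as pd

# Toy frames with named, non-default indices (all names here are illustrative).
left = pd.DataFrame(
    {"key": [1, 2, 3], "a": ["x", "y", "z"]},
    index=pd.Index(["l1", "l2", "l3"], name="left_id"),
)
right = pd.DataFrame(
    {"key": [3, 1], "b": [0.3, 0.1]},
    index=pd.Index(["r1", "r2"], name="right_id"),
)

# Merging on a column discards both indices: the result gets a fresh 0..n-1 index.
merged = pd.merge(left, right, on="key")

# Opting in: turn the indices into ordinary columns before merging.
kept = pd.merge(left.reset_index(), right.reset_index(), on="key")

print(merged)
print(kept)
```

In the second result the original labels survive as the columns left_id and right_id, which is what a user would have to do by hand if sjoin stopped tracking the indices.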
Example with actual code to illustrate: http://nbviewer.jupyter.org/gist/jorisvandenbossche/9b6b7adce5cd60973866d77b364cbc27
Possible ways forward:
- keep as is (and try to solve bugs / inconsistencies)
- drop them (the user can easily keep them by doing reset_index)
- regard them as columns (do the reset_index ourselves). This mainly preserves the current behaviour, except that the names might be different and that both end up as columns in the result (so it is also an API change). The drawback is that the user cannot opt out of keeping the indices, while with the previous option they can easily opt in.
(my preference would be option 2, but this is of course a backwards incompatible change)
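A rough sketch of how options 2 and 3 relate, using plain pandas. This is not the sjoin implementation: the `matches` list is a hypothetical stand-in for the (left position, right position) pairs that the spatial predicate would produce, and both function names are invented for this example.

```python
import pandas as pd

left = pd.DataFrame(
    {"name": ["a", "b"]}, index=pd.Index([10, 20], name="left_id")
)
right = pd.DataFrame(
    {"value": [1.0, 2.0]}, index=pd.Index(["x", "y"], name="right_id")
)

# Hypothetical positional matches, standing in for the spatial query result.
matches = [(0, 1), (1, 0), (1, 1)]


def join_drop_index(left, right, matches):
    """Option 2: combine matched rows and drop both original indices."""
    left_pos = [l for l, _ in matches]
    right_pos = [r for _, r in matches]
    return pd.concat(
        [
            left.iloc[left_pos].reset_index(drop=True),
            right.iloc[right_pos].reset_index(drop=True),
        ],
        axis=1,
    )


def join_index_as_columns(left, right, matches):
    """Option 3: same join, but both indices are first turned into columns."""
    return join_drop_index(left.reset_index(), right.reset_index(), matches)


option2 = join_drop_index(left, right, matches)
option3 = join_index_as_columns(left, right, matches)
```

Under these assumptions option 3 is just option 2 applied after reset_index, which is why a user can opt in under option 2 but cannot opt out under option 3. Named indices also keep their names as column names without any extra bookkeeping.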
I’m experiencing the same issues as described above. Even after removing or renaming the index, I still get this error: ‘index_left’ and ‘index_right’ cannot be names in the frames being joined.
Is there a workaround that works now? I cannot find any.
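One possible workaround, sketched under the assumption that the error is triggered by columns (not the index) named index_left or index_right, for example left over from an earlier sjoin. The helper name and the suffix are hypothetical, and a plain DataFrame stands in for a GeoDataFrame here.

```python
import pandas as pd

# Names the error message says are not allowed in the frames being joined.
RESERVED = ("index_left", "index_right")


def rename_reserved_columns(df, suffix="_prev"):
    """Return a copy of df with any reserved column names renamed.

    Hypothetical helper: assumes the conflict comes from column names.
    """
    mapping = {c: c + suffix for c in RESERVED if c in df.columns}
    return df.rename(columns=mapping)


# Example: a frame that still carries 'index_right' from a previous join.
previous_result = pd.DataFrame({"index_right": [0, 1], "v": [1.5, 2.5]})
cleaned = rename_reserved_columns(previous_result)
print(list(cleaned.columns))
```

The cleaned frame could then be passed to the join instead of the original; dropping the leftover columns would work as well if they are no longer needed.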
I am planning a release in the next few days (0.3.0 towards the end of the coming week, see https://github.com/geopandas/geopandas/issues/470).
I would certainly leave it for the next release (to have more than a couple of days to discuss it and let it live in master). I think it is OK to make such a change in e.g. 0.3 -> 0.4 (and not 1.0), also since we are still in 0.x.