ENH: give user control over selection of spatial index frame in sjoin
As discussed in #1421, it could be advantageous to give users some control over which frame (between left_df and right_df) is used as the spatial index backend, as this can have a huge impact on sjoin performance.
One proposal was an index keyword that could take the values left, right or None (https://github.com/geopandas/geopandas/pull/1421#issuecomment-631342047, https://github.com/geopandas/geopandas/pull/1421#issuecomment-631578974).
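To make the proposal concrete, here is a minimal sketch of how such a keyword might be resolved to a frame. The helper name resolve_index_side is hypothetical and is not part of geopandas. The fallback used for None (the right frame's index unless the predicate is within) is an assumption taken from a reading of sjoin reported in the comments on this issue, not a confirmed specification.

```python
from typing import Optional


def resolve_index_side(index: Optional[str], predicate: str = "intersects") -> str:
    """Hypothetical helper: pick which frame's spatial index to query.

    index=None keeps the assumed default: the right frame's index,
    except for the "within" predicate, where the left frame's is used.
    """
    if index is None:
        # Assumed default behaviour, not a confirmed geopandas rule.
        return "left" if predicate == "within" else "right"
    if index not in ("left", "right"):
        raise ValueError(f"index must be 'left', 'right' or None, got {index!r}")
    # An explicit value is treated as the user's performance hint.
    return index
```

With this shape, a user who knows which frame is cheaper to index could pass index="left" or index="right" and leave every other sjoin argument unchanged.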
ccing @ljwolf, @martinfleis. Feel free to assign this to me.
Issue Analytics
- State:
- Created 3 years ago
- Reactions:2
- Comments:7 (6 by maintainers)
Top GitHub Comments
In the documentation vein: I agree that a recipe for more advanced patterns is a good idea, but maybe it also makes sense to clarify in the docstring that (if I’m reading correctly) right_df.sindex is used by default unless the predicate is within? I think a simple hint could be enough to help most people avoid accidentally sjoining in a much less performant direction.
I haven’t really got my head around the modular sjoin, but options 2 and 3 (in your list, @adriangb) seem quite complicated for an average user, who just wants to provide a performance hint, without learning about sjoin internals and writing code to fit it.
Documentation is good, of course, but I don’t think it’s true that this issue is resolved by documentation alone. Swapping the order of the GDFs in the sjoin call has implications for the interpretation of all the other arguments, and for the structure of the result, and could require the user to do a lot of recoding and testing to successfully map ‘predicate’ and ‘how’, swap ‘lsuffix’ and ‘rsuffix’ etc. and then re-order columns. It seems a pity to impose this on the userbase, rather than write it once within sjoin!
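As a rough illustration of the bookkeeping described above, the sketch below maps one set of sjoin arguments to the set needed when the two frames are passed in the opposite order. The helper swapped_sjoin_kwargs and both lookup tables are hypothetical and only cover a few predicates whose inverse is unambiguous. The defaults mirror the geopandas.sjoin defaults as I understand them.

```python
# Predicates whose meaning under swapped operands is unambiguous.
INVERSE_PREDICATE = {
    "intersects": "intersects",  # symmetric
    "touches": "touches",        # symmetric
    "overlaps": "overlaps",      # symmetric
    "within": "contains",        # A within B  <=>  B contains A
    "contains": "within",
}

# Join types seen from the other frame's point of view.
SWAPPED_HOW = {"left": "right", "right": "left", "inner": "inner"}


def swapped_sjoin_kwargs(how="inner", predicate="intersects",
                         lsuffix="left", rsuffix="right"):
    """Hypothetical helper: arguments for sjoin(right_df, left_df, ...)
    that aim to match sjoin(left_df, right_df, ...) with the given ones."""
    if predicate not in INVERSE_PREDICATE:
        raise ValueError(f"no inverse known for predicate {predicate!r}")
    if how not in SWAPPED_HOW:
        raise ValueError(f"unknown join type {how!r}")
    return {
        "how": SWAPPED_HOW[how],
        "predicate": INVERSE_PREDICATE[predicate],
        # The frames trade places, so their suffixes do too.
        "lsuffix": rsuffix,
        "rsuffix": lsuffix,
    }


print(swapped_sjoin_kwargs(how="left", predicate="within"))
```

Even with this mapping, the swapped result is not a drop-in replacement: the column order, the index, and which frame's geometry column is retained can all differ from the original call, so the user would still have to reorder and re-test the output by hand.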