ENH: give user control over selection of spatial index frame in sjoin


As discussed in #1421, it could be advantageous to give users some control over which frame (between left_df and right_df) is used as the spatial index backend, as this can have a huge impact on sjoin performance.

One proposal was an index keyword that could take the values left, right or None (https://github.com/geopandas/geopandas/pull/1421#issuecomment-631342047, https://github.com/geopandas/geopandas/pull/1421#issuecomment-631578974).
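To make the proposal concrete, here is a minimal sketch of how such a keyword might be resolved. It is not geopandas code: the helper name `resolve_index_side` is hypothetical, and it assumes that None keeps the existing default, which a comment further down this thread reads as right_df.sindex unless the predicate is within.

```python
from typing import Optional

# Values the proposed keyword could take, per the proposal in this issue.
VALID_INDEX_CHOICES = ("left", "right", None)


def resolve_index_side(index: Optional[str], predicate: str = "intersects") -> str:
    """Return which frame ('left' or 'right') would supply the spatial index.

    Hypothetical helper for illustration only; not part of geopandas.
    """
    if index not in VALID_INDEX_CHOICES:
        raise ValueError(f"index must be 'left', 'right' or None, got {index!r}")
    if index is not None:
        # An explicit user choice always wins.
        return index
    # Assumed default (taken from a reading of the current behaviour in this
    # thread, not from the geopandas source): the right frame's index is used
    # unless the predicate is 'within'.
    return "left" if predicate == "within" else "right"
```

With this shape, a user who knows which frame is cheaper to index could pass index="left" without touching any other sjoin argument.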

ccing @ljwolf, @martinfleis. Feel free to assign this to me.

Issue Analytics

  • State: open
  • Created 3 years ago
  • Reactions: 2
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

2 reactions
bnaul commented, Aug 26, 2020

In the documentation vein: I agree that a recipe for more advanced patterns is a good idea, but maybe it also makes sense to clarify in the docstring that (if I’m reading correctly) right_df.sindex is used by default unless the predicate is within? I think a simple hint could be enough to help most people avoid accidentally sjoining in a much less performant direction.

0 reactions
FenderJazz commented, Oct 12, 2022

I haven’t really got my head around the modular sjoin, but options 2 and 3 (in your list, @adriangb) seem quite complicated for an average user, who just wants to provide a performance hint, without learning about sjoin internals and writing code to fit it.

Documentation is good, of course, but I don’t think it’s true that this issue is resolved by documentation alone. Swapping the order of the GDFs in the sjoin call has implications for the interpretation of all the other arguments, and for the structure of the result, and could require the user to do a lot of recoding and testing to successfully map ‘predicate’ and ‘how’, swap ‘lsuffix’ and ‘rsuffix’ etc. and then re-order columns. It seems a pity to impose this on the userbase, rather than write it once within sjoin!
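To illustrate the remapping burden described above, here is a rough sketch of the argument translation a user has to do by hand when swapping the two frames. The helper `swapped_sjoin_kwargs` is hypothetical, and the predicate names are assumed to match those accepted by the installed geopandas version; only predicates with an obvious mirror image are covered.

```python
from typing import Dict

# Predicate to use once the two frames have traded places.
# Symmetric relations map to themselves; directional ones map to their mirror.
INVERSE_PREDICATE = {
    "intersects": "intersects",
    "touches": "touches",
    "overlaps": "overlaps",
    "within": "contains",
    "contains": "within",
    "covers": "covered_by",
    "covered_by": "covers",
}

# 'left' and 'right' joins trade places; 'inner' is unaffected.
INVERSE_HOW = {"inner": "inner", "left": "right", "right": "left"}


def swapped_sjoin_kwargs(
    predicate: str = "intersects",
    how: str = "inner",
    lsuffix: str = "left",
    rsuffix: str = "right",
) -> Dict[str, str]:
    """Translate sjoin keyword arguments for a call with the frames swapped.

    Hypothetical helper for illustration only; not part of geopandas.
    """
    if predicate not in INVERSE_PREDICATE:
        raise ValueError(f"no mirrored predicate known for {predicate!r}")
    if how not in INVERSE_HOW:
        raise ValueError(f"unknown join type {how!r}")
    return {
        "predicate": INVERSE_PREDICATE[predicate],
        "how": INVERSE_HOW[how],
        # The suffixes follow their frames, so they trade places too.
        "lsuffix": rsuffix,
        "rsuffix": lsuffix,
    }
```

A call such as sjoin(right_df, left_df, **swapped_sjoin_kwargs(predicate="within", how="left")) would then stand in for sjoin(left_df, right_df, predicate="within", how="left"). Even so, the result's index and column order still differ from the original call and need fixing up afterwards, which is the recoding effort this comment is pointing at.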
