performance regression and future of shapely.vectorized?
Expected behavior and actual behavior.
I noticed that shapely.vectorized.contains is considerably slower (~5x) in shapely 2.0 than in 1.8. This function may be superfluous in the new version, but in that case maybe it should be deprecated?
(I do plan to use an STRtree but still wanted to bring it up.)
Steps to reproduce the problem.
import shapely
import numpy as np
import shapely.geometry
import shapely.vectorized
p = shapely.geometry.box(30, 20, 60, 60)
lon = np.arange(0, 360, 1)
lat = np.arange(90, -91, -1)
LON, LAT = np.meshgrid(lon, lat)
print(shapely.__version__)
%timeit shapely.vectorized.contains(p, LON.flatten(), LAT.flatten())
1.8.2
4.76 ms ± 49.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.0a1
25.5 ms ± 891 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Operating system
Linux Mint 20.3
Shapely version and provenance
conda install
Top GitHub Comments
Looking a bit in more detail at the profile I mentioned above (if you open the figure below in a browser, it is a bit interactive):
We can see that a large part of the slowdown is due to the destruction (dealloc) of the temporary point objects. This, in turn, is largely because the deallocation happens object by object in Python (so it is not a vectorized operation), and a GEOS context is therefore initialized and destroyed for each Point object that is destructed. This seems to give a lot of overhead in this case, and it is explicitly this case of object deallocation that was mentioned in an issue about reintroducing a global GEOS context.
As a quick illustration, I adapted the GeometryObject_dealloc function in C to use a global GEOS context instead of initializing/destructing one (based on what we had before https://github.com/pygeos/pygeos/pull/113), and then tested the following snippet:
So we are creating a million points, and directly deallocating them again. Timing this function on main vs the small patch gives around 300ms vs 150ms. So half of the time here is due to the GEOS context initialization/destruction inside GeometryObject_dealloc. That’s probably a good reason to consider a global GEOS context again (or specifically for this method).
cc @caspervdw
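To make the mechanism concrete, here is a pure-Python toy model (not the snippet that was timed, and not shapely's C code). It only counts how many context objects get set up when objects are destroyed one at a time, with a per-object context versus a shared one. All names (FakeContext, PointPerObjectContext, PointGlobalContext, contexts_created) are hypothetical, and it relies on CPython's immediate reference-counted deallocation.

```python
# Toy model only: FakeContext stands in for a GEOS context handle.
# None of these names exist in shapely or GEOS.
class FakeContext:
    created = 0  # how many contexts have been initialised so far

    def __init__(self):
        FakeContext.created += 1

    def destroy_geometry(self, geom_id):
        pass  # placeholder for the actual geometry cleanup

    def finish(self):
        pass  # placeholder for tearing the context down


class PointPerObjectContext:
    """Sets up and tears down a context each time one object is destroyed."""

    def __del__(self):
        ctx = FakeContext()
        ctx.destroy_geometry(id(self))
        ctx.finish()


_GLOBAL_CTX = None


def global_context():
    """Return one shared context, creating it on first use."""
    global _GLOBAL_CTX
    if _GLOBAL_CTX is None:
        _GLOBAL_CTX = FakeContext()
    return _GLOBAL_CTX


class PointGlobalContext:
    """Reuses the shared context when an object is destroyed."""

    def __del__(self):
        global_context().destroy_geometry(id(self))


def contexts_created(point_cls, n):
    """Create and immediately drop n objects; return the number of contexts set up."""
    before = FakeContext.created
    for _ in range(n):
        point_cls()  # no reference is kept, so CPython deallocates it right away
    return FakeContext.created - before


per_object = contexts_created(PointPerObjectContext, 1000)
shared = contexts_created(PointGlobalContext, 1000)
print(per_object, shared)
```

The per-object variant pays for one context setup per destroyed object, while the shared variant pays once. That is the kind of overhead the patch described above removes, although the real cost of a GEOS context is of course not captured by this model.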
For reference, GEOS recently added functions for contains and intersects that are specialized for x/y coordinates instead of taking a geometry, so these can exactly replace the functionality of the current shapely.vectorized module (keeping the performance similar). I opened a PR for this -> https://github.com/shapely/shapely/pull/1548
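As a sketch of what the x/y-based replacement can look like from the user side: released Shapely 2.0 exposes shapely.contains_xy (and shapely.intersects_xy), which take coordinate arrays directly. The example below reuses the grid from the report and assumes Shapely 2.0 or later is installed. Whether these functions correspond exactly to the PR linked above is not stated in this thread.

```python
import numpy as np
import shapely
from shapely.geometry import box

# Same polygon and lon/lat grid as in the report above.
p = box(30, 20, 60, 60)
lon = np.arange(0, 360, 1)
lat = np.arange(90, -91, -1)
LON, LAT = np.meshgrid(lon, lat)

# contains_xy takes the coordinate arrays directly, so no temporary
# Point objects have to be created and destroyed on the Python side.
inside = shapely.contains_xy(p, LON.ravel(), LAT.ravel())

# Reshape the flat boolean result back onto the grid.
mask = inside.reshape(LON.shape)
print(mask.shape, mask.dtype)
```

Timing this call against shapely.vectorized.contains on the same inputs, with %timeit as in the report, would show whether the regression still applies to a given installation.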