performance regression and future of shapely.vectorized?

See original GitHub issue

Expected behavior and actual behavior.

I noticed that shapely.vectorized.contains is considerably slower (~5x) in shapely 2.0 than in 1.8. This function may be superfluous in the new version, but if so, maybe it should be deprecated?

(I do plan to use an STRtree but still wanted to bring it up.)

Steps to reproduce the problem.

import shapely
import numpy as np
import shapely.geometry
import shapely.vectorized

p = shapely.geometry.box(30, 20, 60, 60)

lon = np.arange(0, 360, 1)
lat = np.arange(90, -91, -1)
LON, LAT = np.meshgrid(lon, lat)

print(shapely.__version__)
%timeit shapely.vectorized.contains(p, LON.flatten(), LAT.flatten())
Output with shapely 1.8.2:

1.8.2
4.76 ms ± 49.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Output with shapely 2.0a1:

2.0a1
25.5 ms ± 891 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
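
Because the test polygon is an axis-aligned box, the mask returned by the benchmark can be cross-checked with plain NumPy comparisons, which is useful when comparing results across versions. This is a minimal sketch, not part of the original report; the name `expected` is illustrative, and the strict inequalities assume that points on the boundary do not count as contained.

```python
import numpy as np

# Same grid as in the reproduction steps.
lon = np.arange(0, 360, 1)
lat = np.arange(90, -91, -1)
LON, LAT = np.meshgrid(lon, lat)
x, y = LON.flatten(), LAT.flatten()

# box(30, 20, 60, 60) means minx=30, miny=20, maxx=60, maxy=60.
# Assumption: boundary points are not "contained", hence strict inequalities.
expected = (x > 30) & (x < 60) & (y > 20) & (y < 60)

print(expected.shape, int(expected.sum()))
```

If the semantics are unchanged between releases, comparing `expected` against the output of shapely.vectorized.contains with `np.array_equal` should give the same answer on both versions.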

Operating system

Linux Mint 20.3

Shapely version and provenance

conda install

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

jorisvandenbossche commented, Sep 5, 2022

Looking in a bit more detail at the profile I mentioned above (if you open the figure below in a browser, it is somewhat interactive):

[Figure: test_shapely2 (profiler output)]

We can see that a large part of the slowdown is actually due to the destruction (dealloc) of the temporary point objects. This in turn is largely because the deallocation happens object by object in Python (so it is not a vectorized operation), and thus also initializes/destroys a GEOS context for destructing each Point object. This seems to add a lot of overhead in this case, and it is explicitly this case of object deallocation that was mentioned in an issue about reintroducing a global GEOS context.

As a quick illustration, I adapted the GeometryObject_dealloc function in C to use a global GEOS context instead of initializing/destructing one (based on what we had before https://github.com/pygeos/pygeos/pull/113), and then tested the following snippet:

import shapely
import numpy as np

arr = np.random.randn(1_000_000, 2)


def create_destruct_points(arr):
    shapely.points(arr)

%timeit create_destruct_points(arr)

So we are creating a million points and directly deallocating them again. Timing this function on main vs. with the small patch gives around 300 ms vs. 150 ms. So half of the time here is due to the GEOS context initialization/destruction inside GeometryObject_dealloc. That’s probably a good reason to consider a global GEOS context again (or specifically for this method).
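
The difference between the two deallocation strategies can be illustrated with a toy model. This is only a sketch of the idea in pure Python: `FakeContext`, `dealloc_per_object` and `dealloc_shared` are hypothetical names, not shapely or GEOS APIs, and the counter simply records how many contexts get set up in each variant.

```python
class FakeContext:
    """Hypothetical stand-in for a GEOS context handle (not a real API)."""

    init_count = 0  # how many contexts have been initialized

    def __init__(self):
        FakeContext.init_count += 1

    def destroy(self, geom):
        # Pretend to free the geometry owned by this object.
        geom["alive"] = False


def dealloc_per_object(geoms):
    """Model of per-object dealloc: a fresh context for every geometry."""
    for geom in geoms:
        ctx = FakeContext()
        ctx.destroy(geom)


def dealloc_shared(geoms, ctx):
    """Model of the patched variant: one shared (global) context."""
    for geom in geoms:
        ctx.destroy(geom)


n = 1000

FakeContext.init_count = 0
dealloc_per_object([{"alive": True} for _ in range(n)])
per_object_inits = FakeContext.init_count

FakeContext.init_count = 0
shared = FakeContext()
dealloc_shared([{"alive": True} for _ in range(n)], shared)
shared_inits = FakeContext.init_count

print(per_object_inits, shared_inits)
```

In the per-object variant the setup cost is paid once per geometry, while the shared variant pays it once in total, which is the overhead the patch described above removes.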

cc @caspervdw

jorisvandenbossche commented, Oct 3, 2022

For reference, GEOS recently added functions for contains and intersects that are specialized for x/y coordinates instead of taking a geometry, and so this can exactly replace the functionality of the current shapely.vectorized module (keeping the performance similar). I opened a PR for this -> https://github.com/shapely/shapely/pull/1548
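
As a rough sketch of what the coordinate-based replacement looks like from the user side: the comment above does not name the new functions, so the use of `shapely.contains_xy` here is an assumption based on the Shapely 2.0 API, and the example requires Shapely 2.0 or later. The grid is the one from the original report, converted to floats.

```python
import numpy as np
import shapely
import shapely.geometry

p = shapely.geometry.box(30, 20, 60, 60)

lon = np.arange(0, 360, 1)
lat = np.arange(90, -91, -1)
LON, LAT = np.meshgrid(lon, lat)
x = LON.flatten().astype(float)
y = LAT.flatten().astype(float)

# Assumption: shapely.contains_xy (Shapely >= 2.0) takes a geometry plus
# x/y coordinate arrays and returns a boolean array, without creating
# temporary Point objects on the Python side.
mask = shapely.contains_xy(p, x, y)

print(mask.shape, int(mask.sum()))
```

Because no Point objects are created, this path avoids the per-object deallocation cost discussed earlier in the thread.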
