Integrating pygeos in GeoPandas for vectorized array operations

For context, see https://github.com/geopandas/geopandas/issues/430. pygeos (https://github.com/caspervdw/pygeos/) is a new package providing all GEOS functionality as vectorized functions operating on numpy arrays.

We can use this in GeoPandas to replace the python loops over shapely objects, to provide a considerable performance boost (similar to the timings shown with the cython branch). See https://github.com/geopandas/geopandas/pull/1154 for a proof of concept.

Notable things:

  • pygeos has its own lightweight Geometry object. Under the hood, the GeometryArray stores an array of those instead of an array of shapely objects.
  • For me, the idea is that this is (for now at least) mostly hidden from the user: the public interface dealing with scalar geometry objects (e.g. when accessing a single element from a GeoSeries) still uses the familiar, feature-rich shapely object. This means that upon access, the pygeos Geometry is converted to a shapely geometry.
  • My proof of concept PR (https://github.com/geopandas/geopandas/pull/1154) passes all our existing tests. The only change I needed to make was turning an identity check into an equality check, since accessing a single element returns a new shapely object each time (see above). So in theory, this should be almost fully backwards compatible.
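
To illustrate why an identity check stops working once elements are converted on access, here is a minimal sketch. The classes are hypothetical stand-ins written for this example only; they are not the pygeos, shapely, or GeoPandas implementations, and the WKT strings are just placeholder payloads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LightGeometry:
    """Hypothetical stand-in for the lightweight object stored in the array."""
    wkt: str


@dataclass(frozen=True)
class ScalarGeometry:
    """Hypothetical stand-in for the feature-rich scalar object users see."""
    wkt: str


class SketchGeometryArray:
    """Toy container: stores lightweight objects, converts on element access."""

    def __init__(self, wkts):
        self._data = [LightGeometry(w) for w in wkts]

    def __getitem__(self, idx):
        # A new scalar object is built on every access, so two accesses
        # to the same position never return the same Python object.
        return ScalarGeometry(self._data[idx].wkt)


arr = SketchGeometryArray(["POINT (0 0)", "POINT (1 1)"])
first, again = arr[0], arr[0]

same_object = first is again   # identity check
equal_value = first == again   # equality check (dataclass compares fields)
print(same_object, equal_value)
```

In this sketch the identity check fails while the equality check succeeds, which is the kind of test change described above.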

But some questions that we need to discuss:

  • Are we OK with a hard requirement on pygeos, or do we keep the current implementation as a fallback (e.g. only use pygeos if it is installed)?
    • Given the relatively small diff in https://github.com/geopandas/geopandas/pull/1154, and the fact that the behaviour is almost the same, it seems possible to make this opt-in (at least initially). But that of course adds complexity, as there are then multiple implementations to maintain, so it is not my preferred long-term solution.
  • Do we want to use it now already, or do we want to wait until the situation between shapely and pygeos is cleared up? (We are still discussing to what extent it could be integrated into shapely.) This uncertainty might be a reason to go for the opt-in solution for now.
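
As a rough sketch of what "only use pygeos if it is installed" could look like, the snippet below picks a backend name based on whether the package can be found. The function names and the string labels are assumptions made for this illustration; this is not how GeoPandas implements the switch.

```python
import importlib.util


def pygeos_available():
    """Return True if a top-level 'pygeos' package can be located."""
    return importlib.util.find_spec("pygeos") is not None


def choose_backend(prefer_pygeos=True):
    """Hypothetical helper: pick the vectorized backend when possible.

    Falls back to the existing shapely-based code path when pygeos is
    missing or when the caller opts out.
    """
    if prefer_pygeos and pygeos_available():
        return "pygeos"
    return "shapely"


backend = choose_backend()
print(backend)
```

The trade-off raised above still applies: every operation then needs both code paths, which is the maintenance cost of the opt-in approach.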

Thoughts / concerns / questions about this topic?

cc @geopandas/collaborators @caspervdw @snowman2

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 14 (14 by maintainers)

Top GitHub Comments

1 reaction
jorisvandenbossche commented, Mar 24, 2020

I created a dev package on conda-forge, making it a little bit easier to install and test the 0.8.0.dev version with those changes (although since geopandas is pure-python, installing from git master is also not hard):

conda install -c conda-forge/label/geopandas_dev -c conda-forge geopandas pygeos

gives you pygeos and the dev version of geopandas, so that pygeos should be used by geopandas.

1 reaction
nickeubank commented, Oct 19, 2019

Are we OK with a hard requirement on pygeos, or do we keep the current implementation as a fallback (e.g. only use pygeos if it is installed)?

Full switch! The maintainer community for geopandas is small enough that I don’t think duplicate implementations make sense or are feasible. No one likes dependencies, but geopandas will never be lightweight anyway, so I vote for accepting the dependency in exchange for making geopandas easier to maintain and improve.
