Follow-up - Refactor cythonize geometry series operations

UPDATE: the cython effort has been shifted to the PyGEOS package (to be integrated into Shapely), and a PR has recently landed in master with optional support for that (will be released as GeoPandas 0.8). See https://github.com/geopandas/geopandas/pull/1154 and https://geopandas.readthedocs.io/en/latest/install.html#using-the-optional-pygeos-dependency for some docs.


I merged https://github.com/geopandas/geopandas/pull/467/ in https://github.com/geopandas/geopandas/pull/472

Status: an initial implementation of the refactor (https://github.com/geopandas/geopandas/pull/467) has been merged into the geopandas-cython branch, leaving master as the ‘stable’ branch for now. Further improvements can be made via PRs, but they should target the geopandas-cython branch (when you open a PR, you can choose the base branch).


A bit more background on the new implementation we are trying out: we made a vectorized geometry object, GeometryArray (array-like, with vectorized operations), in cython in geopandas. This object holds only the integer pointers as its data, and boxes them into shapely objects only when the user accesses, e.g., a single element or iterates over it. This makes it fast and cheap to construct.
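To make the "store pointers, box on access" idea concrete, here is a minimal pure-Python sketch. It is not the actual GeometryArray: `ToyGeometryArray`, the `box` callable, and the dictionary standing in for GEOS memory are all hypothetical names chosen for illustration, and plain integer handles play the role of the real pointers.

```python
from array import array


class ToyGeometryArray:
    """Toy stand-in for the idea only: keeps integer handles, boxes lazily."""

    def __init__(self, handles, box):
        # Only the integers are stored; no Python geometry objects are created here.
        self._handles = array("q", handles)  # signed 64-bit integers
        self._box = box  # hypothetical callable: handle -> Python object

    def __len__(self):
        return len(self._handles)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing stays cheap: it copies integers, not objects.
            return ToyGeometryArray(self._handles[index], self._box)
        # Boxing happens only when a single element is requested.
        return self._box(self._handles[index])

    def __iter__(self):
        # Iteration boxes one element at a time.
        return (self._box(handle) for handle in self._handles)


# A dict plays the role of the memory the handles point into (assumption).
fake_store = {101: "POINT (0 0)", 102: "POINT (1 1)"}
toy = ToyGeometryArray([101, 102], fake_store.__getitem__)
```

Here `toy[1]` looks up handle 102 only at access time, while `toy[0:1]` returns another `ToyGeometryArray` without boxing anything.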

To integrate this into the GeoDataFrame and GeoSeries, we implemented a new GeometryBlock (‘blocks’ are the internal building blocks pandas uses for the different columns). We need a custom GeometryBlock because pandas has to know that the data we store in the dataframe are not ordinary integers (they are pointers to geometry objects) and cannot be manipulated as if they were integers.


Some known to do items:

  • fix remaining failing tests
  • make installation / building easier (e.g. automatically finding the GEOS location -> https://github.com/geopandas/geopandas/pull/489)
  • some changes will be needed to pandas (e.g. to support concat)
  • implement cythonized/vectorized I/O functionality (shapefiles, geojson, x/y from csv/df)
  • create an asv benchmark suite to track progress / improvement over master (this should first be merged in master) -> https://github.com/geopandas/geopandas/pull/497
  • update conda recipe (maybe we can use conda-forge to provide ‘beta’ builds)
  • get AppVeyor working to test on Windows
  • add a GeometryArray.unique method (then GeoSeries.unique will work automatically)

cc @mrocklin @sgillies @kjordahl @jdmcbr @kuanb @eriknw

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions: 4
  • Comments: 68 (53 by maintainers)

Top GitHub Comments

2 reactions
jorisvandenbossche commented, Feb 19, 2019

The latest pandas release (0.24.0) moved some internal classes, so we will need to update the imports here.

If you manually change the import (in geopandas/_block.py) of

from pandas.core.internals import Block, NonConsolidatableMixIn, BlockManager

to

from pandas.core.internals import Block, BlockManager
from pandas.core.internals.blocks import NonConsolidatableMixIn

then I think it should still work (although there might be other things that changed as well).
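If you want the same file to work with both the old and the new import location, one option is a small fallback helper along these lines. This is only a sketch: `import_first` is a hypothetical helper, not something in geopandas or pandas, and it only covers the import path, not any other changes in pandas internals.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first module in `module_paths` that has it."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:
            continue  # module not available in this installation; try the next one
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"{name!r} not found in any of {list(module_paths)}")


# Intended use in geopandas/_block.py (left as a comment because the result
# depends on the installed pandas version, and it raises ImportError if
# neither location provides the class):
#
#   NonConsolidatableMixIn = import_first(
#       "NonConsolidatableMixIn",
#       ["pandas.core.internals.blocks", "pandas.core.internals"],
#   )
```

The newer location is listed first so that pandas 0.24.0 resolves without falling back.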

@webturtles happy to further assist if you want to try out this branch! (I need to update the branch with the latest changes in both pandas and geopandas)

1 reaction
jorisvandenbossche commented, Apr 6, 2020

@waylonflinn thanks for trying it out! But, in the meantime, the cython effort has been shifted to the pygeos package, and a PR has recently landed in master with optional support for that. See https://github.com/geopandas/geopandas/pull/1154 and https://geopandas.readthedocs.io/en/latest/install.html#using-the-optional-pygeos-dependency for some docs. So it’s best to try that out to test the latest work on better performance.
