Follow-up - Refactor cythonize geometry series operations
UPDATE: the cython effort has been shifted to the PyGEOS package (to be integrated into Shapely), and a PR has recently landed in master with optional support for that (will be released as GeoPandas 0.8). See https://github.com/geopandas/geopandas/pull/1154 and https://geopandas.readthedocs.io/en/latest/install.html#using-the-optional-pygeos-dependency for some docs.
I merged https://github.com/geopandas/geopandas/pull/467/ in https://github.com/geopandas/geopandas/pull/472
Status: an initial implementation of the refactor (https://github.com/geopandas/geopandas/pull/467) has been merged in the geopandas-cython branch, leaving master currently as the ‘stable’ branch.
Further improvements can be made via PRs, but they should target this geopandas-cython branch (when you open a PR, you can choose the base branch).
A bit more background on the new implementation we are trying out: we made a vectorized geometry object, GeometryArray (array-like with vectorized operations), in cython in geopandas. This vectorized geometry object only holds the integer pointers as its data, and only boxes them to shapely objects when the user accesses e.g. a single element or iterates over it. This makes it faster and cheaper to construct.
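To make the "store pointers, box lazily" idea concrete, here is a minimal pure-Python sketch. It is not the actual cython GeometryArray: `GeometryStore`, `GeometryArraySketch`, and `Point` are hypothetical stand-ins (a list plays the role of GEOS-owned memory, and list indices play the role of the integer pointers).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Hypothetical stand-in for a boxed shapely geometry."""
    x: float
    y: float


class GeometryStore:
    """Hypothetical stand-in for GEOS-owned memory, addressed by integer handles."""

    def __init__(self):
        self._coords = []

    def add_point(self, x, y):
        self._coords.append((float(x), float(y)))
        return len(self._coords) - 1  # the integer "pointer"

    def coords(self, handle):
        return self._coords[handle]


class GeometryArraySketch:
    """Array-like that stores only integer handles and boxes on element access."""

    def __init__(self, store, handles):
        self._store = store
        self._handles = list(handles)
        self.boxed_count = 0  # how many Python objects this array has boxed

    def __len__(self):
        return len(self._handles)

    def __getitem__(self, i):
        if isinstance(i, slice):
            # Slicing only copies integers; no geometry object is created.
            return GeometryArraySketch(self._store, self._handles[i])
        # Scalar access is the point where a Python-level object is boxed.
        self.boxed_count += 1
        return Point(*self._store.coords(self._handles[i]))

    def __iter__(self):
        for i in range(len(self)):
            yield self[i]

    def x(self):
        """Vectorized-style operation: works on the handles without boxing."""
        return [self._store.coords(h)[0] for h in self._handles]


store = GeometryStore()
arr = GeometryArraySketch(store, [store.add_point(i, 2 * i) for i in range(3)])
xs = arr.x()    # no Point objects are created here
first = arr[0]  # boxing happens only on element access
```

In this sketch, constructing or slicing the array never creates geometry objects, which is the property the paragraph above relies on for cheap construction.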
To integrate this in the GeoDataFrame and GeoSeries, we implemented a new GeometryBlock ('blocks' are the internal building blocks pandas uses for the different columns). The reason we need a custom GeometryBlock is that we need a way to let pandas know the data we store in the dataframe are not just normal integers (they are pointers to geometry objects) and cannot be manipulated as if they were integers.
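As a rough illustration of why the pointers must not be treated as ordinary integers, the sketch below wraps the handles in a column type that allows moving them around but refuses arithmetic. `HandleColumn` and the handle values are hypothetical; this is not how GeometryBlock or pandas blocks are implemented, only the underlying idea.

```python
class HandleColumn:
    """Hypothetical column wrapper marking integers as opaque geometry handles."""

    def __init__(self, handles):
        self._handles = list(handles)

    def __len__(self):
        return len(self._handles)

    def take(self, positions):
        # Selecting or reordering rows is safe: handles are only moved around.
        return HandleColumn(self._handles[p] for p in positions)

    def __add__(self, other):
        # Arithmetic would produce numbers that no longer point at a geometry.
        raise TypeError("geometry handles do not support arithmetic")

    def tolist(self):
        return list(self._handles)


col = HandleColumn([140001, 140057, 140113])  # made-up handle values
reordered = col.take([2, 0])                  # fine: still valid handles
```

A plain integer column would happily compute `col + 1`, silently yielding values that no longer refer to valid geometries; a dedicated column type is one way to rule that out.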
Some known to-do items:
- fix remaining failing tests
- make installation / building easier (e.g. automatically finding the GEOS location -> https://github.com/geopandas/geopandas/pull/489)
- some changes will be needed to pandas (eg to support concat)
- implement cythonized/vectorized io functionality (shapefiles, geojson, x/y from csv/df)
- create an asv benchmark suite to track progress / improvement over master (this should first be merged in master) -> https://github.com/geopandas/geopandas/pull/497
- update conda recipe (maybe we can use conda-forge to provide ‘beta’ builds)
- get appveyor working to test on windows
- add a GeometryArray.unique method (then GeoSeries.unique will work automatically)

The latest pandas release (0.24.0) moved some internal classes, so we will need to update the imports here.
If you manually update the affected import in geopandas/_block.py to the classes' new location, then I think it should still work (although there might be other things that changed as well).
@webturtles happy to further assist if you want to try out this branch! (I need to update the branch with the latest changes in both pandas and geopandas)
@waylonflinn thanks for trying it out! But, in the meantime, the cython effort has been shifted to the pygeos package, and a PR has recently landed in master with optional support for that. See https://github.com/geopandas/geopandas/pull/1154 and https://geopandas.readthedocs.io/en/latest/install.html#using-the-optional-pygeos-dependency for some docs. So it’s best to try that out to test the latest work on better performance.