RFC: Bulk geometry creation
There have been quite a few issues revolving around the geometry-creation functions:
#138 - Create geometries with different number of points in a vectorized way
#149 - Allow creating empty geometries in constructive functions
And the inverse of this, disassembling geometries into ndarrays:
#75 - ENH: get coordinates + offset arrays for “ragged array” representation
#93 - Add cython algo to get offset arrays for flat coordinates representation
#127 - Function to “flatten” or “explode” multi-geometries
#128 - Function to return all parts of a multi-geometry as ndarray
#197 - Cython + implement get_parts
The current geometry creation API was modeled after shapely’s constructors. Now that we are integrating shapely and pygeos, I think this similarity will become less important. Shapely will keep wrapping the low level functions of pygeos (or later: shapely itself). So at this point I think it is a good idea to make an RFC for the geometry creating functions in pygeos.
General
It mostly boils down to: how do you represent a ragged array in numpy? I think we have two options:
- An ndarray of Python lists. While not optimal in terms of performance, it is still sufficiently performant for pygeos to support this. Lists are actually not that bad.
- We have discussed the (coordinates, indices) approach extensively in the context of get_parts (see #128). This seems to be the optimal approach.
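To make the two options concrete, here is a minimal NumPy-only sketch that converts the list-based form into the (coordinates, indices) form. The helper name lists_to_flat is hypothetical and not part of pygeos; it assumes 2D coordinates and only illustrates the data layout.

```python
import numpy as np


def lists_to_flat(rings):
    """Convert per-geometry coordinate lists to (coordinates, indices).

    Hypothetical helper for illustration; assumes 2D coordinates.
    """
    lengths = np.array([len(r) for r in rings], dtype=np.intp)
    # Concatenate all non-empty coordinate lists into one (N, 2) array.
    non_empty = [np.asarray(r, dtype=float) for r in rings if len(r) > 0]
    coordinates = np.concatenate(non_empty) if non_empty else np.empty((0, 2))
    # indices[i] is the position of the geometry that coordinate i belongs to.
    indices = np.repeat(np.arange(len(rings)), lengths)
    return coordinates, indices


ragged = [
    [[0, 0], [0, 1]],
    [[0, 0], [0, 2], [0, 3]],
]
coordinates, indices = lists_to_flat(ragged)
print(coordinates.shape)  # (5, 2)
print(indices)            # [0 0 1 1 1]
```

The flat form keeps all coordinates in one contiguous float array, which is what makes it attractive for vectorized processing.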
Other considerations:
- We can easily support multiple creation algorithms.
- Each function needs an inverse (like get_parts). Naming proposal: prefix the function with unpack_, and change from to to.
- We have to consider 2D and 3D
- We have to consider empty geometry creation
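The "each function needs an inverse" point can be sketched as a round trip on the (coordinates, indices) layout. Under the naming rule above, the inverse of linestrings_from_points would presumably be called unpack_linestrings_to_points; the pack and unpack helpers below are hypothetical NumPy-only stand-ins, and unpack assumes sorted indices without gaps.

```python
import numpy as np


def pack(parts):
    """Hypothetical 'from' direction: list of (n_i, 2) arrays -> flat form."""
    coordinates = np.concatenate(parts)
    indices = np.repeat(np.arange(len(parts)), [len(p) for p in parts])
    return coordinates, indices


def unpack(coordinates, indices):
    """Hypothetical 'to' direction; assumes sorted indices with no gaps."""
    # Split wherever the index value changes.
    boundaries = np.flatnonzero(np.diff(indices)) + 1
    return np.split(coordinates, boundaries)


parts = [
    np.array([[0.0, 0.0], [0.0, 1.0]]),
    np.array([[0.0, 0.0], [0.0, 2.0], [0.0, 3.0]]),
]
coordinates, indices = pack(parts)
restored = unpack(coordinates, indices)
round_trip_ok = len(restored) == len(parts) and all(
    np.array_equal(a, b) for a, b in zip(parts, restored)
)
print(round_trip_ok)  # True
```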
Points
Create an n-dimensional array of 2D/3D points:
Call signature: points(coordinates : ndarray[shape=(..., 2/3), dtype=float])
>>> points(np.array([[0, 0], [0, 1]]))
ndarray([<Geometry POINT (0 0)>, <Geometry POINT (0 1)>])
>>> points(np.array([[0, 0, 0], [0, 1, 1]]))
ndarray([<Geometry POINT Z (0 0 0)>, <Geometry POINT Z (0 1 1)>])
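The shape rule in the call signature (output shape is the input shape minus the last axis, and the last axis length selects 2D or 3D) can be sketched as follows. points_wkt is a hypothetical stand-in that returns WKT strings rather than geometry objects, so it runs without pygeos.

```python
import numpy as np


def points_wkt(coordinates):
    """Stand-in for the proposed points(): returns WKT strings, not geometries."""
    coordinates = np.asarray(coordinates, dtype=float)
    if coordinates.ndim < 1 or coordinates.shape[-1] not in (2, 3):
        raise ValueError("last axis must have length 2 or 3")
    prefix = "POINT Z" if coordinates.shape[-1] == 3 else "POINT"
    # The output drops the trailing coordinate axis.
    out = np.empty(coordinates.shape[:-1], dtype=object)
    for idx in np.ndindex(out.shape):
        xyz = " ".join(f"{c:g}" for c in coordinates[idx])
        out[idx] = f"{prefix} ({xyz})"
    return out


grid = points_wkt(np.zeros((2, 3, 2)))
print(grid.shape)  # (2, 3)
print(points_wkt([[0, 0, 0], [0, 1, 1]]).tolist())
```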
Create an n-dimensional array of mixed points from lists
Call signature: points_from_lists(coordinates : ndarray[dtype=list])
>>> points_from_lists([[0, 0], [0, 1, 1], []])
ndarray([<Geometry POINT (0 0)>, <Geometry POINT Z (0 1 1)>, <Geometry POINT EMPTY>])
Linestrings and linearrings
Create a 1-dimensional array of 2D/3D linestrings:
Call signature: linestrings_1d(coordinates : ndarray[N, 2/3, dtype=float], indices : ndarray[N, dtype=int])
>>> linestrings_1d(np.array([[0, 0], [0, 1], [0, 0], [0, 2]]), np.array([0, 0, 1, 1]))
ndarray([<Geometry LINESTRING (0 0, 0 1)>, <Geometry LINESTRING (0 0, 0 2)>])
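A function taking (coordinates, indices) would need to validate its input before building geometries. The sketch below shows one possible set of checks using NumPy only. The helper name check_linestring_indices is hypothetical, and the requirements that indices be sorted and that a single-point linestring be rejected are assumptions of this sketch, not decisions made in this RFC.

```python
import numpy as np


def check_linestring_indices(coordinates, indices):
    """Validate (coordinates, indices) and return the point count per linestring."""
    coordinates = np.asarray(coordinates, dtype=float)
    indices = np.asarray(indices)
    if coordinates.ndim != 2 or coordinates.shape[1] not in (2, 3):
        raise ValueError("coordinates must have shape (N, 2) or (N, 3)")
    if indices.shape != (coordinates.shape[0],):
        raise ValueError("indices must have one entry per coordinate")
    if not np.issubdtype(indices.dtype, np.integer):
        raise ValueError("indices must be integers")
    # Assumption of this sketch: coordinates are grouped in index order.
    if np.any(np.diff(indices) < 0):
        raise ValueError("indices must be sorted")
    counts = np.bincount(indices)
    # A count of 0 could map to an empty linestring; a count of 1 cannot be built.
    if np.any(counts == 1):
        raise ValueError("a linestring needs at least two points")
    return counts


counts = check_linestring_indices(
    np.array([[0, 0], [0, 1], [0, 0], [0, 2]]), np.array([0, 0, 1, 1])
)
print(counts)  # [2 2]
```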
Create an n-dimensional array of 2D/3D linestrings of fixed length M:
Call signature: linestrings_equal_size(coordinates : ndarray[..., M, 2/3, dtype=float])
>>> linestrings_equal_size(np.array([[[0, 0], [0, 1]], [[0, 0], [0, 2]]]))
ndarray([<Geometry LINESTRING (0 0, 0 1)>, <Geometry LINESTRING (0 0, 0 2)>])
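The equal-size form is a special case of the (coordinates, indices) form, so one implementation could in principle serve both. A minimal sketch of that reduction, using a hypothetical helper equal_size_to_flat and NumPy only:

```python
import numpy as np


def equal_size_to_flat(coordinates):
    """Reduce a (..., M, 2/3) array to flat (coordinates, indices) plus the outer shape."""
    coordinates = np.asarray(coordinates, dtype=float)
    if coordinates.ndim < 2:
        raise ValueError("expected an array of shape (..., M, 2/3)")
    *outer_shape, m, ndim = coordinates.shape
    n_geoms = int(np.prod(outer_shape))  # 1 when there are no outer axes
    flat = coordinates.reshape(n_geoms * m, ndim)
    # Every geometry contributes exactly m consecutive coordinates.
    indices = np.repeat(np.arange(n_geoms), m)
    return flat, indices, tuple(outer_shape)


flat, indices, outer_shape = equal_size_to_flat(
    np.array([[[0, 0], [0, 1]], [[0, 0], [0, 2]]])
)
print(flat.shape, indices, outer_shape)
```

The 1-dimensional result built from (flat, indices) could then be reshaped back to outer_shape to recover the n-dimensional output.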
Create an n-dimensional array of mixed linestrings from lists of points:
Call signature: linestrings_from_points(coordinates : ndarray[N, dtype=list of Points], ndim : int = 2)
>>> linestrings_from_points(ndarray([[<Geometry POINT (0 0)>, <Geometry POINT Z (0 1 1)>]]))
ndarray([<Geometry LINESTRING (0 0, 0 1)>])
>>> linestrings_from_points(ndarray([[<Geometry POINT (0 0)>, <Geometry POINT Z (0 1 1)>]]), ndim=3)
ndarray([<Geometry LINESTRING Z (0 0 nan, 0 1 1)>])
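The ndim behaviour in the two examples above (dropping Z for ndim=2, filling a missing Z with nan for ndim=3) can be sketched on raw coordinate tuples. stack_mixed_points is a hypothetical helper, not a pygeos function, and it works on tuples rather than Point geometries.

```python
import numpy as np


def stack_mixed_points(points, ndim=2):
    """Stack mixed 2D/3D coordinate tuples into one (N, ndim) float array."""
    if ndim not in (2, 3):
        raise ValueError("ndim must be 2 or 3")
    # Start from nan so that a missing Z stays nan when ndim=3.
    out = np.full((len(points), ndim), np.nan)
    for i, p in enumerate(points):
        k = min(len(p), ndim)  # drop Z when ndim=2
        out[i, :k] = p[:k]
    return out


mixed = [(0, 0), (0, 1, 1)]
print(stack_mixed_points(mixed, ndim=2).tolist())  # [[0.0, 0.0], [0.0, 1.0]]
print(stack_mixed_points(mixed, ndim=3))
```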
Polygons
Without holes: same as linestrings / linearrings
With holes: same as collections
Collections
Create a 1-dimensional array of collections
Call signature: collections_1d(geometries : ndarray[N, dtype=Geometry], indices : ndarray[N, dtype=int], collection_type : GeometryType = GeometryType.GEOMETRYCOLLECTION, ndim : int = 2)
>>> collections_1d(np.array([<Geometry POINT (0 0)>, <Geometry LINESTRING (0 0, 0 1)>, <Geometry LINESTRING (0 0, 0 2)>]), np.array([0, 1, 1]))
ndarray([<Geometry GEOMETRYCOLLECTION (POINT (0 0))>, <Geometry GEOMETRYCOLLECTION (LINESTRING (0 0, 0 1), LINESTRING (0 0, 0 2))>])
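Grouping a flat array of geometries by indices is the core step behind a function like collections_1d. The sketch below uses WKT strings as stand-ins for geometries and a hypothetical helper group_by_indices. That a skipped index yields an empty group (and hence presumably an empty collection) is an assumption of this sketch, as is the requirement that indices be sorted.

```python
import numpy as np


def group_by_indices(items, indices, n_groups=None):
    """Group items by sorted integer indices; skipped indices give empty groups."""
    items = np.asarray(items, dtype=object)
    indices = np.asarray(indices)
    if n_groups is None:
        n_groups = int(indices.max()) + 1 if indices.size else 0
    # boundaries[k] is the first position whose index is >= k.
    boundaries = np.searchsorted(indices, np.arange(n_groups + 1))
    return [
        list(items[boundaries[k]:boundaries[k + 1]]) for k in range(n_groups)
    ]


groups = group_by_indices(
    ["POINT (0 0)", "LINESTRING (0 0, 0 1)", "LINESTRING (0 0, 0 2)"],
    [0, 2, 2],
)
print(groups)
```

With the indices [0, 2, 2], the middle group comes out empty, which is where empty geometry creation and the indices approach meet.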
Create an n-dimensional array of collections of size M
Call signature: collections_equal_size(geometries : ndarray[..., M, dtype=Geometry], collection_type : GeometryType = GeometryType.GEOMETRYCOLLECTION, ndim : int = 2)
>>> collections_equal_size(np.array([[<Geometry POINT (0 0)>, <Geometry POINT (0 1)>], [<Geometry POINT (1 0)>, <Geometry POINT (1 1)>]]))
ndarray([<Geometry GEOMETRYCOLLECTION (POINT (0 0), POINT (0 1))>, <Geometry GEOMETRYCOLLECTION (POINT (1 0), POINT (1 1))>])
Create an n-dimensional array of collections from lists of geometries
Call signature: collections_from_lists(geometries : ndarray[..., dtype=list of Geometry], collection_type : GeometryType = GeometryType.GEOMETRYCOLLECTION, ndim : int = 2)
>>> collections_from_lists([[<Geometry POINT (0 0)>, <Geometry POINT (0 1)>], [<Geometry POINT (1 0)>], []])
ndarray([<Geometry GEOMETRYCOLLECTION (POINT (0 0), POINT (0 1))>, <Geometry GEOMETRYCOLLECTION (POINT (1 0))>, <Geometry GEOMETRYCOLLECTION EMPTY>])
Empty geometries
Create an n-dimensional array of empty geometries (points by default)
Call signature: empty(shape : tuple or None = None, geom_type : enum = GeometryType.POINT, ndim : int = 2)
>>> empty()
<Geometry POINT EMPTY>
>>> empty(ndim=3)
<Geometry POINT Z EMPTY>
>>> empty(geom_type=GeometryType.LINESTRING)
<Geometry LINESTRING EMPTY>
>>> empty((2, ))
ndarray([<Geometry POINT EMPTY>, <Geometry POINT EMPTY>])
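The shape handling in the proposed signature (a scalar when shape is None, otherwise an array filled with the same empty geometry) could look roughly like this. empty_wkt is a hypothetical stand-in that returns WKT strings and takes the geometry type as a plain string instead of a GeometryType enum.

```python
import numpy as np


def empty_wkt(shape=None, geom_type="POINT", ndim=2):
    """Stand-in for the proposed empty(): WKT strings instead of geometries."""
    if ndim not in (2, 3):
        raise ValueError("ndim must be 2 or 3")
    wkt = f"{geom_type} Z EMPTY" if ndim == 3 else f"{geom_type} EMPTY"
    if shape is None:
        return wkt  # scalar result, as in empty()
    # Object dtype mirrors how geometry arrays are stored.
    return np.full(shape, wkt, dtype=object)


print(empty_wkt())                         # POINT EMPTY
print(empty_wkt(ndim=3))                   # POINT Z EMPTY
print(empty_wkt(geom_type="LINESTRING"))   # LINESTRING EMPTY
print(empty_wkt((2,)).tolist())            # ['POINT EMPTY', 'POINT EMPTY']
```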
@jorisvandenbossche @brendan-ward What do you think about these proposals?
Top GitHub Comments
@Aniwax I am certainly positive on the implementation of pygeos.empty(shape: tuple, geom_type: GeometryType). In the meantime you could use pygeos.polygons(None). As of pygeos 0.10, this will yield an empty polygon.
The remaining work on pygeos.empty is covered by #149 - closing.