ENH: Build line topology in dataframe for simple integration with NetworkX (+ potentially other graph systems)
Hello there,
I’ve been using geopandas for some time now, as well as the NetworkX library for graph operations. NetworkX has long had its own `nx.read_shp` function for reading shapefiles and creating graphs from them, but not much geospatial functionality beyond that. It is also built directly on GDAL, which is apparently a painful dependency for them to maintain.
They also have a function called `nx.from_pandas_edgelist`, which can build a graph from a pandas DataFrame as long as it has columns indicating the source and target of each edge. I made a pull request to their repo with code that builds a graph from a geopandas GeoDataFrame, provided it contains only LineString geometries. In the end, however, we realized that since `gpd.GeoDataFrame` inherits from `pd.DataFrame`, it is already technically compatible with `nx.from_pandas_edgelist`; it just needs the source and target columns pre-computed.
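To make the “it just needs the columns” point concrete, here is a rough, stdlib-only sketch of what an edge-list reader does with those two columns. It is an illustration, not NetworkX’s actual implementation, and the helper name `adjacency_from_edgelist` is hypothetical; with real data the equivalent call would be `nx.from_pandas_edgelist(gdf, source="source", target="target", edge_attr=True)`.

```python
from collections import defaultdict


def adjacency_from_edgelist(rows, source="source", target="target"):
    """Build an undirected adjacency mapping {u: {v: attrs}} from edge records.

    Each row is a dict (think: one DataFrame row). Only the `source` and
    `target` keys are required; every other key becomes an edge attribute.
    """
    adj = defaultdict(dict)
    for row in rows:
        u, v = row[source], row[target]
        attrs = {k: val for k, val in row.items() if k not in (source, target)}
        # Undirected: store the same attribute dict under both directions
        adj[u][v] = attrs
        adj[v][u] = attrs
    return dict(adj)


rows = [
    {"source": 0, "target": 1, "length": 2.5},
    {"source": 1, "target": 2, "length": 1.0},
]
adj = adjacency_from_edgelist(rows)
print(adj[1])  # {0: {'length': 2.5}, 2: {'length': 1.0}}
```

The geometry column would simply ride along as one more edge attribute.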
Some NetworkX maintainers have agreed that shedding responsibility for geospatial I/O would be a relief for their project, and I think geopandas is in a good position to fill this void, given that it can already read and write just about every GIS format and is already compatible with NetworkX’s support for pandas DataFrames.
So I propose adding a method called `make_topology` to [geodataframe.py](https://github.com/geopandas/geopandas/blob/master/geopandas/geodataframe.py), which would construct `source` and `target` columns holding unique ids for the nodes at each line’s end points. The column names would be customizable, and a spatial tolerance would be a parameter. It might also be useful to extract a point-based “nodes” GeoDataFrame for visualizing the identified nodes, but I’m not sure what the best way to do that would be. So the signature would look something like:
```python
gdf.make_topology(source_col="source", target_col="target", precision=0.005, inplace=False)
```
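One way to interpret `precision` as a spatial tolerance is to snap each coordinate to a grid whose cell size is `precision` and compare the resulting integer cell indices. The sketch below (the `snap` helper is hypothetical) shows both what this buys and its main caveat: two points closer than `precision` can still land in different cells if they straddle a cell boundary, so grid snapping only approximates a true distance tolerance.

```python
def snap(coord, precision):
    """Snap a coordinate tuple to a grid with cell size `precision`.

    Returns integer grid indices, which compare exactly and are safe as dict keys.
    """
    return tuple(round(c / precision) for c in coord)


# Points 0.0008 apart in x snap to the same grid cell...
print(snap((1.0004, 2.0), 0.001) == snap((0.9996, 2.0), 0.001))  # True
# ...but points only 0.0002 apart can straddle a cell boundary (here x = 0.0025).
print(snap((0.0024, 0.0), 0.005) == snap((0.0026, 0.0), 0.005))  # False
```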
I don’t plan to do any topological cleaning or modification of the geometry (i.e., the network is assumed to be clean already), although that could be an interesting set of new functions for the future.
I think this functionality would be quite useful for topological analysis of geographic networks in general, as the creation of the source & target columns would effectively make the GeoDataFrame ready for integration into NetworkX or indeed any other library or system which wants topological edge representations rather than geometric representations of features.
Here is the core of the code I wrote for the original NetworkX proposal, before we considered putting it in geopandas instead. It’s not very complex: we look at each LineString geometry’s first and last points and check whether they are unique (within the specified precision). Note that this is not the final proposal, just a quickly extracted and slightly modified version of what I originally proposed for the NetworkX codebase:
```python
def make_topology(
    gdf, source_col="source", target_col="target", precision=0.001, geometry="geometry"
):
    # Standalone function for now; as a GeoDataFrame method `gdf` would be `self`.
    # Work on a copy so the input is left untouched (i.e. inplace=False behaviour).
    gdf = gdf.copy()
    # Determine number of dimensions of geometry by checking the first row (i.e. 2D or 3D lines?)
    dims = range(len(gdf[geometry].iloc[0].coords[0]))

    def snap(coord):
        # round() needs an integer number of digits, so a float tolerance such as
        # 0.001 is applied by snapping to a grid with cell size `precision`
        return tuple(round(coord[i] / precision) for i in dims)

    # Snap every line's first and last point
    source_coords = gdf[geometry].apply(lambda geom: snap(geom.coords[0]))
    target_coords = gdf[geometry].apply(lambda geom: snap(geom.coords[-1]))

    # Find all unique start & end points and assign each an id in order of appearance
    node_ids = {}
    for node_1, node_2 in zip(source_coords, target_coords):
        for node in (node_1, node_2):
            if node not in node_ids:
                node_ids[node] = len(node_ids)

    # Assign the unique id to each edge's end points
    gdf[source_col] = [node_ids[c] for c in source_coords]
    gdf[target_col] = [node_ids[c] for c in target_coords]
    return gdf
```
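For readers without geopandas at hand, the same end-point logic can be sketched with plain coordinate lists standing in for LineStrings. This version (the name `endpoint_topology` and its return shape are hypothetical) also keeps the first original coordinate seen for each node, which is one possible answer to the “nodes” GeoDataFrame question above: those coordinates could be wrapped in Point geometries for plotting.

```python
def endpoint_topology(lines, precision=0.001):
    """Assign node ids to the end points of each line.

    `lines` is a list of coordinate sequences (stand-ins for LineStrings).
    Returns (edges, nodes): edges is a list of (source_id, target_id) pairs,
    nodes maps each id to the first original coordinate seen for that node.
    """
    key_to_id = {}
    nodes = {}
    edges = []
    for coords in lines:
        ends = []
        for point in (coords[0], coords[-1]):
            # Snap to a grid of cell size `precision` so near-identical points match
            key = tuple(round(c / precision) for c in point)
            if key not in key_to_id:
                key_to_id[key] = len(key_to_id)
                nodes[key_to_id[key]] = tuple(point)
            ends.append(key_to_id[key])
        edges.append(tuple(ends))
    return edges, nodes


lines = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(1.0, 0.0), (1.0, 1.0)],
    [(1.0004, 1.0), (2.0, 1.0)],  # starts within the tolerance of (1.0, 1.0)
]
edges, nodes = endpoint_topology(lines)
print(edges)  # [(0, 1), (1, 2), (2, 3)]
print(nodes)  # {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (2.0, 1.0)}
```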
Top GitHub Comments
Building topologies from geographic data stored in `GeoDataFrames` is indeed of big interest to many. As @martinfleis mentions above, the `pysal` library has also focused on providing these kinds of fast algorithms for graph/topology construction with no GEOS dependency, though in econometrics these are called “spatial weights” rather than “graphs” for historical reasons. We cover many kinds of distance-based and contiguity-based weights. I asked about these interfaces to networkx a while ago, and that stalled, so we went ahead and did it in `libpysal`. We have converters `to_networkx()` and `from_networkx()`, as well as `to_adjlist()` and `from_adjlist()`, and access to the underlying sparse matrix representation.

But… it’d be great if we could have a single library where this kind of graph building is fast and performant. Ours is usually fast because “sharing a vertex” is slightly different from “touches at a point,” but building on top of pygeos might open up new performance benefits.
Regardless, coordinating to build a single topology builder, with a generic path for anything that has a `__geo_interface__` and possibly faster implementations for specific input types, would be really great, and I think it would consolidate a ton of duplicated effort in the ecosystem.

For those not familiar with `pysal`, an example task might be:

`edges` is now a GeoDataFrame of line segments, as if read in from file. In our lexicon, we’d use “queen contiguity” to refer to neighboring geometries that share a single vertex:
This works for polygons, too:
And, for distance-based graphs, we have a similar API:
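For distance-based graphs, the core idea can be sketched without any geometry engine. The helper below is hypothetical and not libpysal’s API: it links every pair of 2D points within `threshold` of each other, bucketing points into square cells of side `threshold` so that only the 3x3 block of cells around each point needs checking instead of every pair.

```python
import math
from collections import defaultdict
from itertools import product


def distance_band_neighbors(points, threshold):
    """Return {i: set of j} for all pairs of 2D points within `threshold` of each other."""
    # Bucket points into square cells of side `threshold`
    cells = defaultdict(list)
    for i, (x, y) in enumerate(points):
        cells[(math.floor(x / threshold), math.floor(y / threshold))].append(i)
    neighbors = {i: set() for i in range(len(points))}
    for (cx, cy), members in cells.items():
        # Any point within `threshold` must lie in this cell or one of its 8 neighbors
        for dx, dy in product((-1, 0, 1), repeat=2):
            for j in cells.get((cx + dx, cy + dy), ()):
                for i in members:
                    if i != j and math.dist(points[i], points[j]) <= threshold:
                        neighbors[i].add(j)
    return neighbors


points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (3.5, 0.0)]
print(distance_band_neighbors(points, threshold=1.0))  # {0: {1}, 1: {0}, 2: {3}, 3: {2}}
```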
It may be of interest how OSMnx handles NetworkX MultiDiGraph to/from GeoPandas GeoDataFrame conversions with its `graph_to_gdfs` and `graph_from_gdfs` functions.
The package underwent a major renovation over the summer and has shifted somewhat away from its original mission of being a pure OpenStreetMap -> NetworkX graph utility, towards working more flexibly with OSM spatial networks in NetworkX and with OSM non-networked geometries in GeoPandas (particularly with the new `geometries` module).
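The reverse direction, from a graph back to tabular nodes and edges, can be sketched in the same spirit. This is a hypothetical, stdlib-only illustration, not OSMnx’s implementation: node coordinates become a node table, and edges that lack their own geometry get a straight two-point line between their end nodes (OSMnx’s `graph_to_gdfs` has a `fill_edge_geometry` option for a similar purpose).

```python
def graph_to_tables(node_xy, edges):
    """Split a graph into node rows and edge rows (lists of dicts).

    node_xy: {node_id: (x, y)}
    edges: iterable of (u, v, attrs) tuples, where attrs is a dict.
    """
    node_rows = [{"id": n, "x": x, "y": y} for n, (x, y) in node_xy.items()]
    edge_rows = []
    for u, v, attrs in edges:
        row = {"source": u, "target": v, **attrs}
        # Fall back to a straight segment between the two end nodes
        row.setdefault("geometry", [node_xy[u], node_xy[v]])
        edge_rows.append(row)
    return node_rows, edge_rows


node_xy = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0)}
edges = [
    (0, 1, {"length": 1.0}),
    (1, 2, {"length": 1.2, "geometry": [(1.0, 0.0), (1.1, 0.5), (1.0, 1.0)]}),
]
node_rows, edge_rows = graph_to_tables(node_xy, edges)
print(edge_rows[0])  # {'source': 0, 'target': 1, 'length': 1.0, 'geometry': [(0.0, 0.0), (1.0, 0.0)]}
```

Because the edge rows carry `source` and `target`, they round-trip straight back into an edge-list reader.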