Improved reading and writing from/to PostGIS (SQL in general?) support
Currently we only have a very basic read_postgis function, and we certainly want a write function as well (https://github.com/geopandas/geopandas/issues/189). But we also have several open (overlapping) PRs and issues related to improving the IO support for PostGIS, so I thought I would open a new general issue to get an overview.
Open PRs:
- https://github.com/geopandas/geopandas/pull/440 PR adding to_postgis
- https://github.com/geopandas/geopandas/pull/457 PR with both read/write for postgis
- https://github.com/geopandas/geopandas/pull/546 PR to use geoalchemy in from_postgis
- one not related to postgis: https://github.com/geopandas/geopandas/pull/101 PR to add support for sqlite
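Since #101 is about SQLite, here is a minimal stdlib-only sketch of what a generic SQL round trip could look like, assuming geometries are stored as plain OGC WKB blobs next to an SRID column. This is not SpatiaLite's or GeoPackage's native geometry blob format, and the table and helper names are hypothetical.

```python
import sqlite3
import struct


def point_to_wkb(x: float, y: float) -> bytes:
    # OGC WKB for a 2D Point: byte-order flag (1 = little endian),
    # uint32 geometry type (1 = Point), then two float64 coordinates.
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(blob: bytes) -> tuple:
    byte_order, geom_type, x, y = struct.unpack("<BIdd", blob)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("this sketch only decodes little-endian 2D Points")
    return (x, y)


# Hypothetical table layout: attribute column, SRID column, WKB blob column.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE places (name TEXT, srid INTEGER, geom BLOB)")
con.execute(
    "INSERT INTO places VALUES (?, ?, ?)", ("a", 4326, point_to_wkb(1.0, 2.0))
)
rows = [
    (name, srid, wkb_to_point(geom))
    for name, srid, geom in con.execute("SELECT name, srid, geom FROM places")
]
```

The point of the sketch is that WKB plus an SRID is database-agnostic; what differs between backends is mostly how the blob is turned back into a native geometry type on the server side.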
Does anybody have insight into the main differences between the PostGIS PRs? How should we proceed with those?
Some questions related to this that we might need to answer:
- Do we want to use GeoAlchemy (https://geoalchemy-2.readthedocs.io/en/latest/), and thus add it as an optional requirement? What does it bring?
- Can we actually support more than PostGIS, i.e. more general SQL support? (https://github.com/geopandas/geopandas/issues/490) E.g. MySQL also has spatial data types (https://dev.mysql.com/doc/refman/5.7/en/spatial-datatypes.html), but GeoAlchemy does not seem to support those.
- Naming of the functions (https://github.com/geopandas/geopandas/issues/161): currently we have
GeoDataFrame.from_postgis and read_postgis. Depending on the question above, we might want to make them more general (read_sql, to_sql). Personally, I would retire 'from_postgis' in favour of read_postgis (or read_sql) anyhow.
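Retiring `from_postgis` is usually done with a deprecation period: the old entry point keeps working but forwards to the new function and emits a warning. A minimal sketch of that pattern follows; `read_postgis` and `GeoDataFrame` here are toy, hypothetical stand-ins, not the real geopandas objects or signatures.

```python
import warnings


def read_postgis(sql, con, geom_col="geom", crs=None):
    # Toy stand-in for the real reader: it just echoes its arguments
    # instead of running the query and decoding the geometry column.
    return {"sql": sql, "con": con, "geom_col": geom_col, "crs": crs}


class GeoDataFrame:
    # Toy class used only to show where the deprecated alias would live.
    @classmethod
    def from_postgis(cls, sql, con, geom_col="geom", crs=None):
        # Keep the old entry point working, but point users to read_postgis.
        warnings.warn(
            "GeoDataFrame.from_postgis is deprecated; use read_postgis instead",
            FutureWarning,
            stacklevel=2,
        )
        return read_postgis(sql, con, geom_col=geom_col, crs=crs)
```

If the functions were later generalized to `read_sql`, the same forwarding trick would let both old names survive for a release or two.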
There is some relevant discussion in https://github.com/geopandas/geopandas/issues/161 as well.
Other related issues: https://github.com/geopandas/geopandas/issues/451 on adding SRID support in read_postgis
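For the SRID question in #451: PostGIS typically hands geometry values to the client as hex-encoded EWKB, where bit 0x20000000 of the geometry type word signals that a 4-byte SRID follows. A minimal stdlib sketch of reading that SRID, assuming hex EWKB input; the helper name `ewkb_srid` is hypothetical and this is not what read_postgis currently does.

```python
import struct

# PostGIS EWKB extension bit: when set in the type word, an SRID follows it.
EWKB_SRID_FLAG = 0x20000000


def ewkb_srid(ewkb_hex: str):
    """Return the SRID embedded in a hex EWKB string, or None if absent."""
    data = bytes.fromhex(ewkb_hex)
    # First byte is the byte-order flag: 1 = little endian, 0 = big endian.
    endian = "<" if data[0] == 1 else ">"
    (geom_type,) = struct.unpack(endian + "I", data[1:5])
    if not geom_type & EWKB_SRID_FLAG:
        return None
    (srid,) = struct.unpack(endian + "I", data[5:9])
    return srid
```

Reading the SRID from the data itself avoids a separate `ST_SRID` query, though a real implementation would still need to decide what to do when rows carry different SRIDs.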
cc @jdmcbr @dimitri-justeau @showjackyang @adamboche @kuanb @emiliom @perrygeo @carsonfarmer
Hi @Sangarshanan and thanks for reviving this and your contributions! 👍
I finally had time to get back to this, and I went through your edits @Sangarshanan and included them in this Gist: https://gist.github.com/HTenkanen/3b214be899f0d3885bad48577de48150
I left some of the earlier parts as they were, so that the function can automatically handle a mix of single and multi-geometries (e.g. a mix of Polygon and MultiPolygon).
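Handling a Polygon/MultiPolygon mix amounts to picking one column type for the table: promote to the Multi variant when both appear, and fall back to the generic GEOMETRY type for anything more heterogeneous. A minimal sketch of that decision, assuming Shapely-style type names as input; `target_geometry_type` is a hypothetical helper, and the single geometries themselves would still need upcasting (e.g. with PostGIS's `ST_Multi` on insert).

```python
def target_geometry_type(geom_types):
    """Pick one PostGIS column type for a collection of geometry type names.

    Mixed single/multi variants of one family (e.g. Polygon and MultiPolygon)
    are promoted to the Multi type; anything else mixed becomes GEOMETRY.
    """
    geom_types = set(geom_types)
    # Strip the "Multi" prefix to group e.g. Polygon with MultiPolygon.
    families = {t[5:] if t.startswith("Multi") else t for t in geom_types}
    if len(families) != 1:
        return "GEOMETRY"
    family = families.pop()
    has_multi = any(t.startswith("Multi") for t in geom_types)
    return ("MULTI" + family if has_multi else family).upper()
```

For example, `["Polygon", "MultiPolygon"]` maps to `MULTIPOLYGON`, while `["Point", "Polygon"]` maps to `GEOMETRY`.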
@jorisvandenbossche: I also updated the CRS reading to use the new pyproj CRS class, so it should now work quite nicely with different kinds of CRS information. In addition, I tested swapping shapely.wkb for pygeos.wkb, as that also improved performance.
I profiled the different parts (see the Gist), and the current performance is as follows:
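The CRS handling mentioned above boils down to turning whatever CRS the GeoDataFrame carries into an integer SRID for PostGIS. With pyproj this is `pyproj.CRS.from_user_input(crs).to_epsg()`. The stdlib-only stand-in below (hypothetical name `crs_to_srid`) handles only a few common spellings, to show the shape of the problem; it is not a replacement for pyproj.

```python
def crs_to_srid(crs):
    """Map a few common CRS spellings to an integer EPSG code, else None."""
    if crs is None:
        return None
    if isinstance(crs, int):
        return crs
    if isinstance(crs, dict):
        # Old-style proj4 dict such as {"init": "epsg:4326"}.
        crs = crs.get("init", "")
    if isinstance(crs, str):
        # Accept "EPSG:4326" / "epsg:4326"; anything else is unknown here.
        authority, _, code = crs.partition(":")
        if authority.strip().lower() == "epsg" and code.strip().isdigit():
            return int(code)
    return None
```

Returning None for unrecognized input mirrors pyproj's `to_epsg()`, which also returns None when no EPSG code matches.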
With Pygeos WKB:
And this is how long different parts take:
With Shapely WKB
And this is how long different parts take:
As we can see, using PyGEOS instead of Shapely is about twice as fast, and most of the time now goes to actually writing the data into the database, which is expected.
A couple of questions for @jorisvandenbossche:
I understand that PyGEOS will eventually be integrated into Shapely, but I guess that might still take some time. So should we continue with this PyGEOS approach, i.e. converting geometries from Shapely to PyGEOS and then those to WKB? Or should we stick with plain Shapely WKB dumps for now, since the PyGEOS approach would naturally bring a new dependency to GeoPandas? What are the current thoughts about integrating PyGEOS into GeoPandas?
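Per-step timings like the ones above can be collected with a small context manager around each stage. A minimal sketch; the stage names and the work inside them are placeholders, not the actual steps from the Gist.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock seconds per named stage.
timings = {}


@contextmanager
def timed(stage):
    start = time.perf_counter()
    try:
        yield
    finally:
        # Add to any earlier time for the same stage, so loops accumulate.
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start


# Placeholder stages standing in for "convert to WKB" and "write to DB".
with timed("to_wkb"):
    payload = [bytes(21) for _ in range(1000)]
with timed("write"):
    time.sleep(0.01)
```

Using `time.perf_counter` rather than `time.time` gives a monotonic, high-resolution clock, which matters when comparing stages that each take only milliseconds.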
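If PyGEOS stays optional, one common pattern is to use the faster library only when it is installed and fall back to Shapely otherwise. The sketch below only detects which library is importable; the function name is hypothetical, and `find_spec` only checks that a package can be located, not that it imports cleanly.

```python
import importlib.util


def available_wkb_backend():
    """Return the name of the preferred installed geometry library, or None."""
    # Order expresses preference: the faster backend first.
    for name in ("pygeos", "shapely"):
        if importlib.util.find_spec(name) is not None:
            return name
    return None
```

The WKB conversion code would then branch once on this value instead of importing PyGEOS unconditionally, so GeoPandas would not gain a hard dependency.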
I guess the most logical place to add this to_postgis() functionality would be the ..geopandas/io/sql.py file, would you agree?
Any recommendations or ideas about how we should test this functionality? I see that for testing reading from PostGIS you have the create_postgis() function that populates a test_geopandas database. I guess we could take a similar approach here and test that writing the nybb data with the to_postgis() function works?
@HTenkanen Great job! A few notes from me.
pyproj.CRS will be used as GeoDataFrame.crs from the next release (#1101), so we will be able to clean up those conditions then.
I am fine with GeoAlchemy2; it is a pure-Python package installable from PyPI. Recent GeoPandas is not available on the conda defaults channel either. @jorisvandenbossche can tell you more about channel support.
The plan was to use pygeos under the hood within geopandas anyway (#1155), but I am not sure what the current situation is after the decision to merge pygeos with shapely. I am not very keen on using the logic you implemented, but once the pygeos/shapely/geopandas relationship is clearer we might come up with a simpler way.
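As a rough picture of what a to_postgis() in geopandas/io/sql.py would have to emit, here is a minimal sketch that builds the CREATE TABLE and parameterized INSERT statements, assuming geometries are sent as WKB parameters and rebuilt server-side with PostGIS's `ST_GeomFromWKB(wkb, srid)`. The function name is hypothetical, every attribute column is typed as text for brevity, and identifiers are not quoted, so real code should use something like `psycopg2.sql.Identifier` or SQLAlchemy instead of f-strings.

```python
def build_postgis_sql(table, geom_col, geom_type, srid, columns):
    """Return (create_sql, insert_sql) for a hypothetical to_postgis()."""
    srid = int(srid)
    names = list(columns) + [geom_col]
    # Simplification: all attribute columns are stored as text.
    col_defs = [f"{c} text" for c in columns]
    col_defs.append(f"{geom_col} geometry({geom_type}, {srid})")
    # psycopg2-style %s placeholders; the last one receives the WKB bytes.
    values = ["%s"] * len(columns) + [f"ST_GeomFromWKB(%s, {srid})"]
    create = f"CREATE TABLE {table} ({', '.join(col_defs)})"
    insert = f"INSERT INTO {table} ({', '.join(names)}) VALUES ({', '.join(values)})"
    return create, insert
```

A test in the spirit of create_postgis() could run these statements against the test_geopandas database, write the nybb data, and read it back with read_postgis to compare the geometries.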