Get warning even though exporting non-geodata to parquet

I use both geodata and regular tabular data in the same Python file.

Context: Because I work with point data, I convert the data set to a plain pandas DataFrame by dropping the geometry column and keeping LAT/LON.

e.g.

out = out.drop("geometry", axis=1)
out.to_parquet("bla.parquet")

Even though this is no longer a GeoDataFrame, I receive the warning:

C:\Users\rados\Documents\adv\point_linkage\src\data_management\merge_shapefiles.py:249: UserWarning: this is an initial implementation of Parquet/Feather file support and associated metadata.  This is tracking version 0.1.0 of the metadata specification at https://github.com/geopandas/geo-arrow-spec

This metadata specification does not yet make stability promises.  We do not yet recommend using this in a production setting unless you are able to rewrite your Parquet/Feather files.

Does this have any implication for my case?


Top GitHub Comments

jorisvandenbossche commented, Oct 12, 2020

@raholler Thanks a lot for your report!

I moved your issue to the geopandas repo, as it’s related to the implementation of geopandas.

What is happening is that the drop method removes the geometry column, but the result is (incorrectly) still a GeoDataFrame, so to_parquet uses the GeoPandas implementation instead of the pandas one.

And it seems that our implementation doesn’t correctly handle the case where there is no geometry column:

In [29]: df = geopandas.read_file(geopandas.datasets.get_path("naturalearth_cities"))

In [30]: df = df.drop("geometry", axis=1)

In [31]: type(df)
Out[31]: geopandas.geodataframe.GeoDataFrame

In [32]: df.to_parquet("test_no_geometry_column.parquet")
UserWarning: this is an initial implementation ....

In [33]: import pyarrow.parquet as pq

In [34]: meta = pq.read_metadata("test_no_geometry_column.parquet")

In [36]: meta.metadata[b'geo']
Out[36]: b'{"primary_column": "geometry", "columns": {}, "schema_version": "0.1.0", "creator": {"library": "geopandas", "version": "0.8.0+48.g1e975ab"}}'

So we included incorrect metadata in the parquet file in this case.

@raholler As a short-term workaround, explicitly convert the result of drop() to a DataFrame (with pd.DataFrame(df.drop(...))).
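A minimal sketch of that workaround follows. To keep it runnable with pandas alone, it uses TaggedFrame, a hypothetical DataFrame subclass standing in for GeoDataFrame, and to_plain_frame, a hypothetical helper name; the column values are made up. The point it illustrates is only that wrapping the result in pd.DataFrame(...) yields the base pandas type, so pandas' own to_parquet would be used.

```python
import pandas as pd


class TaggedFrame(pd.DataFrame):
    """Hypothetical stand-in for GeoDataFrame: a DataFrame subclass whose
    type is preserved by pandas operations such as drop()."""

    @property
    def _constructor(self):
        return TaggedFrame


def to_plain_frame(frame, geometry_column="geometry"):
    """Drop the geometry column and return a plain pandas DataFrame."""
    return pd.DataFrame(frame.drop(columns=[geometry_column]))


# Made-up point data; the geometry values are plain strings here.
points = TaggedFrame(
    {
        "name": ["a", "b"],
        "LAT": [48.1, 52.5],
        "LON": [11.6, 13.4],
        "geometry": ["POINT (11.6 48.1)", "POINT (13.4 52.5)"],
    }
)

dropped = points.drop(columns=["geometry"])  # still the subclass
plain = to_plain_frame(points)               # base pandas DataFrame

print(type(dropped).__name__, type(plain).__name__)
```

With a real GeoDataFrame the call would look the same: to_plain_frame(gdf).to_parquet(...) goes through the pandas writer rather than the GeoPandas one.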

jorisvandenbossche commented, Aug 2, 2022

Given the changes under which the result stays a GeoDataFrame as long as any geometry column remains, and which therefore more explicitly allow a GeoDataFrame without an active geometry column, we should probably still test that case (to ensure we write it correctly and can read the resulting file).
