Get warning even though exporting non-geodata to parquet


I use both geodata and normal data in the same Python file.

Context: Because I work with point data, I convert the data set to a normal pandas DataFrame by dropping the geometry column and saving LAT/LON.

e.g.

out = out.drop("geometry", axis=1)
out.to_parquet("bla.parqet")

Even though this is no longer a GeoDataFrame, I receive the warning:

C:\Users\rados\Documents\adv\point_linkage\src\data_management\merge_shapefiles.py:249: UserWarning: this is an initial implementation of Parquet/Feather file support and associated metadata.  This is tracking version 0.1.0 of the metadata specification at https://github.com/geopandas/geo-arrow-spec

This metadata specification does not yet make stability promises.  We do not yet recommend using this in a production setting unless you are able to rewrite your Parquet/Feather files.

Does this have any implication for my case?

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

2 reactions
jorisvandenbossche commented, Oct 12, 2020

@raholler Thanks a lot for your report!

I moved your issue to the geopandas repo, as it’s related to the implementation of geopandas.

What is happening is that the drop method removes the geometry column, but the result is (incorrectly) still a GeoDataFrame, so it uses the to_parquet implementation of GeoPandas instead of the pandas one.

And it seems that our implementation doesn’t correctly handle the case where there is no geometry column:

In [29]: df = geopandas.read_file(geopandas.datasets.get_path("naturalearth_cities"))

In [30]: df = df.drop("geometry", axis=1)

In [31]: type(df)
Out[31]: geopandas.geodataframe.GeoDataFrame

In [32]: df.to_parquet("test_no_geometry_column.parquet")
UserWarning: this is an initial implementation ....

In [33]: import pyarrow.parquet as pq

In [34]: meta = pq.read_metadata("test_no_geometry_column.parquet")

In [36]: meta.metadata[b'geo']
Out[36]: b'{"primary_column": "geometry", "columns": {}, "schema_version": "0.1.0", "creator": {"library": "geopandas", "version": "0.8.0+48.g1e975ab"}}'

So we included incorrect metadata in the parquet file in this case.

@raholler a short-term workaround is to explicitly convert the result of drop() to a DataFrame (with pd.DataFrame(df.drop(...))).
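
Spelled out, the workaround might look like the sketch below. The sample point data and column names are invented stand-ins for the original poster's data set, and whether `drop()` itself still returns a GeoDataFrame depends on the installed geopandas version; wrapping the result in `pd.DataFrame` should give a plain DataFrame either way.

```python
import geopandas
import pandas as pd

# Hypothetical point data standing in for the poster's data set.
gdf = geopandas.GeoDataFrame(
    {"name": ["a", "b"], "LAT": [52.5, 48.1], "LON": [13.4, 11.6]},
    geometry=geopandas.points_from_xy([13.4, 11.6], [52.5, 48.1]),
)

# Depending on the geopandas version, this may still be a GeoDataFrame.
dropped = gdf.drop("geometry", axis=1)

# Explicit conversion: the result is a plain pandas DataFrame.
plain = pd.DataFrame(dropped)
```

Calling `plain.to_parquet(...)` afterwards goes through the pandas writer, so the GeoPandas warning and the `geo` metadata should not appear.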

0 reactions
jorisvandenbossche commented, Aug 2, 2022

Given the changes whereby we still keep the result as a GeoDataFrame if any geometry column remains, and thus more explicitly allow a GeoDataFrame without an active geometry column, we should probably still test that case (to ensure we write it correctly and can read the resulting file).

