Incorrect roundtrip of index names on filtered dataframe
When saving a filtered dataframe to parquet using Pandas and fastparquet, the index names are round-tripped incorrectly:
import pandas as pd
df = pd._testing.makeMixedDataFrame()
filtered_df = df[df.A>=1]
filtered_df.to_parquet('test.parq', engine='fastparquet')
loaded_df = pd.read_parquet('test.parq')
print(filtered_df.index.names)
print(loaded_df.index.names)
Output:
FrozenList([None])
FrozenList(['index'])
Versions
fastparquet 0.7.2
pandas 1.3.2
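Until the round trip is fixed, one possible workaround is to copy the index names from the original frame back onto the loaded one. The sketch below is only an illustration: `restore_index_names` is a hypothetical helper (not part of pandas or fastparquet), and the loaded frame is simulated with `rename_axis` rather than read from a parquet file, so it runs without fastparquet installed.

```python
import pandas as pd


def restore_index_names(loaded, original):
    """Return a copy of `loaded` whose index names match those of `original`.

    Hypothetical helper for illustration; not part of pandas or fastparquet.
    """
    restored = loaded.copy()
    restored.index = restored.index.set_names(list(original.index.names))
    return restored


# Small stand-in for the frames in the report.
df = pd.DataFrame({"A": [0.0, 1.0, 2.0], "B": ["x", "y", "z"]})
filtered_df = df[df.A >= 1]

# Simulate the frame read back from parquet: same data, but the unnamed
# index has come back named 'index', as in the output above.
loaded_df = filtered_df.rename_axis("index")

fixed_df = restore_index_names(loaded_df, filtered_df)
print(fixed_df.index.names)
```

This only hides the symptom on the reading side and assumes the original index names are still known at load time; it does not change what is written to the file.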
@yohplala, you are probably in a good place to ensure None roundtrips, if you have any interest. I can fix any tests that this causes to fail in Dask. I have the feeling the issue isn’t high priority.

No rush! I might do it myself also, but I have a similar problem with finding time 😃