BUG: New param [use_nullable_dtypes] of pd.read_parquet() can't handle empty parquet file

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example


import pandas as pd
df_pq = pd.read_parquet(x, use_nullable_dtypes=True)  # x: path to the empty parquet file

Problem description

Adding the new parameter use_nullable_dtypes to pd.read_parquet() raises an error. Removing it makes the call work again. Environment: Ubuntu 16, Python 3.8.
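Since the call succeeds without the parameter, one possible stopgap is to fall back to a plain read and convert afterwards with DataFrame.convert_dtypes(). The sketch below is only an illustration: read_parquet_nullable and its reader argument are hypothetical names, catching ValueError this broadly may hide unrelated failures, and convert_dtypes() infers dtypes from the data, so the result may not match what use_nullable_dtypes would produce for every column.

```python
import pandas as pd


def read_parquet_nullable(path, reader=pd.read_parquet):
    """Hypothetical helper: try the nullable read, fall back on ValueError.

    `reader` is injectable only so the fallback path can be exercised
    without a real parquet file; it defaults to pd.read_parquet.
    """
    try:
        return reader(path, use_nullable_dtypes=True)
    except ValueError:
        # Assumption: the plain read works, as the report states.
        # convert_dtypes() then switches columns to nullable dtypes where it can.
        return reader(path).convert_dtypes()
```

Called as read_parquet_nullable(x), this returns the nullable read when it works and the converted plain read otherwise.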

An empty parquet file written by Spark triggers the problem. Its schema is:

Authors,AuthorId,int64
Authors,Rank,int32
Authors,NormalizedName,string
Authors,DisplayName,string
Authors,LastKnownAffiliationId,int64
Authors,PaperCount,int64
Authors,PaperFamilyCount,int64
Authors,CitationCount,int64
Authors,CreatedDate,date32[day]

error msg:

df_pq = pd.read_parquet(x,use_nullable_dtypes = True)
  File "/vjan/lib/python3.8/site-packages/pandas/io/parquet.py", line 459, in read_parquet
    return impl.read(
  File "/vjan/lib/python3.8/site-packages/pandas/io/parquet.py", line 221, in read
    return self.api.parquet.read_table(
  File "pyarrow/array.pxi", line 751, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/table.pxi", line 1668, in pyarrow.lib.Table._to_pandas
  File "/vjan/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 792, in table_to_blockmanager
    blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
  File "/vjan/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 1133, in _table_to_blocks
    return [_reconstruct_block(item, columns, extension_columns)
  File "/vjan/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 1133, in <listcomp>
    return [_reconstruct_block(item, columns, extension_columns)
  File "/vjan/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 751, in _reconstruct_block
    pd_ext_arr = pandas_dtype.__from_arrow__(arr)
  File "/vjan/lib/python3.8/site-packages/pandas/core/arrays/integer.py", line 121, in __from_arrow__
    return IntegerArray._concat_same_type(results)
  File "/vjan/lib/python3.8/site-packages/pandas/core/arrays/masked.py", line 271, in _concat_same_type
    data = np.concatenate([x._data for x in to_concat])
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: need at least one array to concatenate
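The last frames of the traceback show np.concatenate being called with an empty list, which suggests that the integer column arrived with no chunks at all. Below is a minimal sketch of that failure and of one way to guard against it, using NumPy only. concat_chunks is a hypothetical helper for illustration, not pandas' actual code or its eventual fix.

```python
import numpy as np


def concat_chunks(chunks, dtype="int64"):
    """Concatenate per-chunk arrays, tolerating an empty list of chunks."""
    if not chunks:
        # Hypothetical guard: return an empty array of the requested dtype
        # instead of calling np.concatenate([]), which raises ValueError.
        return np.empty(0, dtype=dtype)
    return np.concatenate(chunks)


# Reproduce the error from the traceback in isolation.
try:
    np.concatenate([])
except ValueError as exc:
    message = str(exc)

# With the guard, zero chunks yield an empty array.
empty = concat_chunks([])
```

Here message holds the same text as the final line of the traceback, and empty is a zero-length int64 array.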

Expected Output

Reading the empty parquet file should return an empty DataFrame.
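For the schema listed in the problem description, the expected result would look roughly like the frame below. This is a sketch under assumptions: the mapping of int64, int32 and string to the nullable Int64, Int32 and string dtypes, and of date32 to an object column, describes how the result is expected to look. It is not output captured from a fixed pandas version.

```python
import pandas as pd

# Assumed mapping from the reported parquet types to pandas dtypes.
expected = pd.DataFrame(
    {
        "AuthorId": pd.array([], dtype="Int64"),
        "Rank": pd.array([], dtype="Int32"),
        "NormalizedName": pd.array([], dtype="string"),
        "DisplayName": pd.array([], dtype="string"),
        "LastKnownAffiliationId": pd.array([], dtype="Int64"),
        "PaperCount": pd.array([], dtype="Int64"),
        "PaperFamilyCount": pd.array([], dtype="Int64"),
        "CitationCount": pd.array([], dtype="Int64"),
        # Assumption: date32 values surface as Python date objects (object dtype).
        "CreatedDate": pd.Series([], dtype="object"),
    }
)
```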

Output of `pd.show_versions()`

pandas: 1.2.4

Top GitHub Comments

nakatomotoi commented, Sep 1, 2021

take

simonjayhawkins commented, Aug 25, 2021

Thanks @nakatomotoi. pandas has a test suite that runs on CI whenever a PR is opened. This issue needs a test added to that suite so that it can be closed with confidence that similar regressions are less likely in the future.

See https://github.com/pandas-dev/pandas/issues?q=is%3Aissue+is%3Aclosed+label%3A"Needs+Tests" for similar issues that have been closed, and check out the associated PRs for inspiration.

The developer guide is https://pandas.pydata.org/pandas-docs/dev/development/index.html
