ConversionError: Could not convert DataFrame to Parquet. | After upgrade to 0.16.0


Environment details

  • OS type and version: Windows 10 x64
  • Python version: 3.8.5
  • pip version: 20.2.4
  • pandas-gbq version: 0.16.0

Steps to reproduce

This code ran successfully under the previous version of pandas-gbq (0.15.0); after upgrading to 0.16.0 it fails with the error below.

Code example


import pandas as pd
import pandas_gbq as gbq

# Destination schema; note that "precio" is declared NUMERIC.
table_schema = [
    {"name": "id", "type": "INTEGER"},
    {"name": "nombre", "type": "STRING"},
    {"name": "precio", "type": "NUMERIC"},
    {"name": "fecha", "type": "DATE"},
]

data = pd.DataFrame({
    "id": [123],
    "nombre": ["Anderson"],
    "precio": [1.25],  # plain Python float going into a NUMERIC column
    "fecha": ["2021-12-12"],
})

project_id = "proyecto111"
table_name = "prueba.clientes"

gbq.to_gbq(data, table_name, project_id, if_exists="append", table_schema=table_schema)

Stack trace

---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas_gbq\load.py in load_parquet(client, dataframe, destination_table_ref, location, schema)
     74     try:
---> 75         client.load_table_from_dataframe(
     76             dataframe, destination_table_ref, job_config=job_config, location=location,

~\anaconda3\lib\site-packages\google\cloud\bigquery\client.py in load_table_from_dataframe(self, dataframe, destination, num_retries, job_id, job_id_prefix, location, project, job_config, parquet_compression, timeout)
   2650 
-> 2651                     _pandas_helpers.dataframe_to_parquet(
   2652                         dataframe,

~\anaconda3\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py in dataframe_to_parquet(dataframe, bq_schema, filepath, parquet_compression, parquet_use_compliant_nested_type)
    585     bq_schema = schema._to_schema_fields(bq_schema)
--> 586     arrow_table = dataframe_to_arrow(dataframe, bq_schema)
    587     pyarrow.parquet.write_table(

~\anaconda3\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py in dataframe_to_arrow(dataframe, bq_schema)
    528         arrow_arrays.append(
--> 529             bq_to_arrow_array(get_column_or_index(dataframe, bq_field.name), bq_field)
    530         )

~\anaconda3\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py in bq_to_arrow_array(series, bq_field)
    289         return pyarrow.StructArray.from_pandas(series, type=arrow_type)
--> 290     return pyarrow.Array.from_pandas(series, type=arrow_type)
    291 

~\anaconda3\lib\site-packages\pyarrow\array.pxi in pyarrow.lib.Array.from_pandas()

~\anaconda3\lib\site-packages\pyarrow\array.pxi in pyarrow.lib.array()

~\anaconda3\lib\site-packages\pyarrow\array.pxi in pyarrow.lib._ndarray_to_array()

~\anaconda3\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Got bytestring of length 8 (expected 16)

The above exception was the direct cause of the following exception:

ConversionError                           Traceback (most recent call last)
<ipython-input-5-49cafceeee1e> in <module>
     24 table_name = 'prueba.clientes'
     25 
---> 26 gbq.to_gbq(data, table_name, project_id, if_exists='append', table_schema = table_schema)

~\anaconda3\lib\site-packages\pandas_gbq\gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, reauth, if_exists, auth_local_webserver, table_schema, location, progress_bar, credentials, api_method, verbose, private_key)
   1093         return
   1094 
-> 1095     connector.load_data(
   1096         dataframe,
   1097         destination_table_ref,

~\anaconda3\lib\site-packages\pandas_gbq\gbq.py in load_data(self, dataframe, destination_table_ref, chunksize, schema, progress_bar, api_method)
    544 
    545         try:
--> 546             chunks = load.load_chunks(
    547                 self.client,
    548                 dataframe,

~\anaconda3\lib\site-packages\pandas_gbq\load.py in load_chunks(client, dataframe, destination_table_ref, chunksize, schema, location, api_method)
    164 ):
    165     if api_method == "load_parquet":
--> 166         load_parquet(client, dataframe, destination_table_ref, location, schema)
    167         # TODO: yield progress depending on result() with timeout
    168         return [0]

~\anaconda3\lib\site-packages\pandas_gbq\load.py in load_parquet(client, dataframe, destination_table_ref, location, schema)
     77         ).result()
     78     except pyarrow.lib.ArrowInvalid as exc:
---> 79         raise exceptions.ConversionError(
     80             "Could not convert DataFrame to Parquet."
     81         ) from exc

ConversionError: Could not convert DataFrame to Parquet.
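
The underlying ArrowInvalid is the key clue: Got bytestring of length 8 (expected 16). BigQuery's NUMERIC type is represented in Arrow as a 16-byte decimal128 value, while the "precio" column holds 8-byte float64 values, which lines up with the 8-vs-16 byte mismatch. A minimal sketch of one possible fix, assuming the parquet path accepts decimal.Decimal objects for NUMERIC columns (an assumption, not confirmed in this thread):

from decimal import Decimal

# Assumption: converting the float column to decimal.Decimal lets pyarrow
# build the decimal128 array that the NUMERIC schema field expects.
data["precio"] = data["precio"].apply(lambda v: Decimal(str(v)))

gbq.to_gbq(data, table_name, project_id, if_exists="append",
           table_schema=table_schema)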

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5

Top GitHub Comments

1 reaction
londoso commented, Nov 12, 2021

Hi Tim, as you said, I’m trying to write a float into a NUMERIC BigQuery data type. Using the argument api_method="load_csv" works fine for me.

Thank you.

0 reactions
tswast commented, Nov 17, 2021

Lowering the priority since there’s a workaround of api_method="load_csv".
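
Applied to the reproduction above, the workaround from the thread is a one-argument change (same DataFrame, schema, and table names as the original snippet):

# Workaround confirmed in the comments: route the upload through the
# CSV loader instead of the default parquet loader.
gbq.to_gbq(data, table_name, project_id, if_exists="append",
           table_schema=table_schema, api_method="load_csv")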

