ConversionError: Could not convert DataFrame to Parquet. | After upgrade to 0.16.0


Environment details

  • OS type and version: Windows 10 x64
  • Python version: 3.8.5
  • pip version: 20.2.4
  • pandas-gbq version: 0.16.0

Steps to reproduce

This code ran successfully under the previous version of pandas-gbq (0.15.0); after upgrading to 0.16.0 it fails with the error below.

Code example


import pandas as pd
import pandas_gbq as gbq

# Destination schema; note that "precio" is declared NUMERIC.
table_schema = [
    {"name": "id", "type": "INTEGER"},
    {"name": "nombre", "type": "STRING"},
    {"name": "precio", "type": "NUMERIC"},
    {"name": "fecha", "type": "DATE"},
]

data = pd.DataFrame({
    "id": [123],
    "nombre": ["Anderson"],
    "precio": [1.25],  # plain Python float going into a NUMERIC column
    "fecha": ["2021-12-12"],
})

project_id = "proyecto111"
table_name = "prueba.clientes"

gbq.to_gbq(data, table_name, project_id, if_exists="append", table_schema=table_schema)

Stack trace

---------------------------------------------------------------------------
ArrowInvalid                              Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas_gbq\load.py in load_parquet(client, dataframe, destination_table_ref, location, schema)
     74     try:
---> 75         client.load_table_from_dataframe(
     76             dataframe, destination_table_ref, job_config=job_config, location=location,

~\anaconda3\lib\site-packages\google\cloud\bigquery\client.py in load_table_from_dataframe(self, dataframe, destination, num_retries, job_id, job_id_prefix, location, project, job_config, parquet_compression, timeout)
   2650 
-> 2651                     _pandas_helpers.dataframe_to_parquet(
   2652                         dataframe,

~\anaconda3\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py in dataframe_to_parquet(dataframe, bq_schema, filepath, parquet_compression, parquet_use_compliant_nested_type)
    585     bq_schema = schema._to_schema_fields(bq_schema)
--> 586     arrow_table = dataframe_to_arrow(dataframe, bq_schema)
    587     pyarrow.parquet.write_table(

~\anaconda3\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py in dataframe_to_arrow(dataframe, bq_schema)
    528         arrow_arrays.append(
--> 529             bq_to_arrow_array(get_column_or_index(dataframe, bq_field.name), bq_field)
    530         )

~\anaconda3\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py in bq_to_arrow_array(series, bq_field)
    289         return pyarrow.StructArray.from_pandas(series, type=arrow_type)
--> 290     return pyarrow.Array.from_pandas(series, type=arrow_type)
    291 

~\anaconda3\lib\site-packages\pyarrow\array.pxi in pyarrow.lib.Array.from_pandas()

~\anaconda3\lib\site-packages\pyarrow\array.pxi in pyarrow.lib.array()

~\anaconda3\lib\site-packages\pyarrow\array.pxi in pyarrow.lib._ndarray_to_array()

~\anaconda3\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.check_status()

ArrowInvalid: Got bytestring of length 8 (expected 16)

The above exception was the direct cause of the following exception:

ConversionError                           Traceback (most recent call last)
<ipython-input-5-49cafceeee1e> in <module>
     24 table_name = 'prueba.clientes'
     25 
---> 26 gbq.to_gbq(data, table_name, project_id, if_exists='append', table_schema = table_schema)

~\anaconda3\lib\site-packages\pandas_gbq\gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, reauth, if_exists, auth_local_webserver, table_schema, location, progress_bar, credentials, api_method, verbose, private_key)
   1093         return
   1094 
-> 1095     connector.load_data(
   1096         dataframe,
   1097         destination_table_ref,

~\anaconda3\lib\site-packages\pandas_gbq\gbq.py in load_data(self, dataframe, destination_table_ref, chunksize, schema, progress_bar, api_method)
    544 
    545         try:
--> 546             chunks = load.load_chunks(
    547                 self.client,
    548                 dataframe,

~\anaconda3\lib\site-packages\pandas_gbq\load.py in load_chunks(client, dataframe, destination_table_ref, chunksize, schema, location, api_method)
    164 ):
    165     if api_method == "load_parquet":
--> 166         load_parquet(client, dataframe, destination_table_ref, location, schema)
    167         # TODO: yield progress depending on result() with timeout
    168         return [0]

~\anaconda3\lib\site-packages\pandas_gbq\load.py in load_parquet(client, dataframe, destination_table_ref, location, schema)
     77         ).result()
     78     except pyarrow.lib.ArrowInvalid as exc:
---> 79         raise exceptions.ConversionError(
     80             "Could not convert DataFrame to Parquet."
     81         ) from exc

ConversionError: Could not convert DataFrame to Parquet.
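
The underlying ArrowInvalid is the key clue: Got bytestring of length 8 (expected 16). BigQuery's NUMERIC type is represented in Arrow as a 16-byte decimal128 value, while the "precio" column holds 8-byte float64 values, which lines up with the 8-vs-16 byte mismatch. A minimal sketch of one possible fix, assuming the parquet path accepts decimal.Decimal objects for NUMERIC columns (an assumption, not confirmed in this thread):

from decimal import Decimal

# Assumption: converting the float column to decimal.Decimal lets pyarrow
# build the decimal128 array that the NUMERIC schema field expects.
data["precio"] = data["precio"].apply(lambda v: Decimal(str(v)))

gbq.to_gbq(data, table_name, project_id, if_exists="append",
           table_schema=table_schema)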

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5

Top GitHub Comments

1 reaction
londoso commented, Nov 12, 2021

Hi Tim, as you said, I’m trying to write a float into a NUMERIC BigQuery data type. Using the argument api_method="load_csv" works fine for me.

Thank you.

0 reactions
tswast commented, Nov 17, 2021

Lowering the priority since there’s a workaround of api_method="load_csv".
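
Applied to the reproduction above, the workaround from the thread is a one-argument change (same DataFrame, schema, and table names as the original snippet):

# Workaround confirmed in the comments: route the upload through the
# CSV loader instead of the default parquet loader.
gbq.to_gbq(data, table_name, project_id, if_exists="append",
           table_schema=table_schema, api_method="load_csv")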

