ConversionError: Could not convert DataFrame to Parquet. | After upgrade to 0.16.0
Environment details
- OS type and version: Windows 10 x64
- Python version: 3.8.5
- pip version: 20.2.4
- pandas-gbq version: 0.16.0
Steps to reproduce
This code ran successfully on the previous version of pandas-gbq (0.15.0).
Code example
import os
import pandas as pd
import pandas_gbq as gbq
table_schema = [
    {"name": "id", "type": "INTEGER"},
    {"name": "nombre", "type": "STRING"},
    {"name": "precio", "type": "NUMERIC"},
    {"name": "fecha", "type": "DATE"},
]
data = pd.DataFrame({
    'id': [123],
    'nombre': ['Anderson'],
    'precio': [1.25],
    'fecha': ['2021-12-12'],
})
project_id = 'proyecto111'
table_name = 'prueba.clientes'
gbq.to_gbq(data, table_name, project_id, if_exists='append', table_schema=table_schema)
Stack trace
---------------------------------------------------------------------------
ArrowInvalid Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas_gbq\load.py in load_parquet(client, dataframe, destination_table_ref, location, schema)
74 try:
---> 75 client.load_table_from_dataframe(
76 dataframe, destination_table_ref, job_config=job_config, location=location,
~\anaconda3\lib\site-packages\google\cloud\bigquery\client.py in load_table_from_dataframe(self, dataframe, destination, num_retries, job_id, job_id_prefix, location, project, job_config, parquet_compression, timeout)
2650
-> 2651 _pandas_helpers.dataframe_to_parquet(
2652 dataframe,
~\anaconda3\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py in dataframe_to_parquet(dataframe, bq_schema, filepath, parquet_compression, parquet_use_compliant_nested_type)
585 bq_schema = schema._to_schema_fields(bq_schema)
--> 586 arrow_table = dataframe_to_arrow(dataframe, bq_schema)
587 pyarrow.parquet.write_table(
~\anaconda3\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py in dataframe_to_arrow(dataframe, bq_schema)
528 arrow_arrays.append(
--> 529 bq_to_arrow_array(get_column_or_index(dataframe, bq_field.name), bq_field)
530 )
~\anaconda3\lib\site-packages\google\cloud\bigquery\_pandas_helpers.py in bq_to_arrow_array(series, bq_field)
289 return pyarrow.StructArray.from_pandas(series, type=arrow_type)
--> 290 return pyarrow.Array.from_pandas(series, type=arrow_type)
291
~\anaconda3\lib\site-packages\pyarrow\array.pxi in pyarrow.lib.Array.from_pandas()
~\anaconda3\lib\site-packages\pyarrow\array.pxi in pyarrow.lib.array()
~\anaconda3\lib\site-packages\pyarrow\array.pxi in pyarrow.lib._ndarray_to_array()
~\anaconda3\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.check_status()
ArrowInvalid: Got bytestring of length 8 (expected 16)
The above exception was the direct cause of the following exception:
ConversionError Traceback (most recent call last)
<ipython-input-5-49cafceeee1e> in <module>
24 table_name = 'prueba.clientes'
25
---> 26 gbq.to_gbq(data, table_name, project_id, if_exists='append', table_schema = table_schema)
~\anaconda3\lib\site-packages\pandas_gbq\gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, reauth, if_exists, auth_local_webserver, table_schema, location, progress_bar, credentials, api_method, verbose, private_key)
1093 return
1094
-> 1095 connector.load_data(
1096 dataframe,
1097 destination_table_ref,
~\anaconda3\lib\site-packages\pandas_gbq\gbq.py in load_data(self, dataframe, destination_table_ref, chunksize, schema, progress_bar, api_method)
544
545 try:
--> 546 chunks = load.load_chunks(
547 self.client,
548 dataframe,
~\anaconda3\lib\site-packages\pandas_gbq\load.py in load_chunks(client, dataframe, destination_table_ref, chunksize, schema, location, api_method)
164 ):
165 if api_method == "load_parquet":
--> 166 load_parquet(client, dataframe, destination_table_ref, location, schema)
167 # TODO: yield progress depending on result() with timeout
168 return [0]
~\anaconda3\lib\site-packages\pandas_gbq\load.py in load_parquet(client, dataframe, destination_table_ref, location, schema)
77 ).result()
78 except pyarrow.lib.ArrowInvalid as exc:
---> 79 raise exceptions.ConversionError(
80 "Could not convert DataFrame to Parquet."
81 ) from exc
ConversionError: Could not convert DataFrame to Parquet.
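For context on the ArrowInvalid above: the BigQuery client converts a NUMERIC column to Arrow's decimal128(38, 9), a 16-byte fixed-width type, while a pandas float64 value is only 8 bytes wide, which is exactly what "Got bytestring of length 8 (expected 16)" is complaining about. A minimal sketch of the failing conversion, isolated from pandas-gbq (behavior may differ on newer pyarrow releases, which can cast floats to decimal):

import pandas as pd
import pyarrow as pa

series = pd.Series([1.25])  # dtype float64: 8 bytes per value

# BigQuery NUMERIC is mapped to decimal128(38, 9), a 16-byte Arrow type.
# Converting raw float64 values directly raises:
#   pyarrow.lib.ArrowInvalid: Got bytestring of length 8 (expected 16)
pa.Array.from_pandas(series, type=pa.decimal128(38, 9))

Converting the column to decimal.Decimal objects before calling to_gbq sidesteps the mismatch, since pyarrow knows how to build decimal128 arrays from Decimal values; see the workaround sketch at the end of this page.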
Hi Tim, as you said, I'm trying to write a float into a NUMERIC BigQuery data type. Using the argument api_method="load_csv" works fine for me. Thank you.
Lowering the priority since there's a workaround of api_method="load_csv".
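For reference, here are both workarounds applied to the reproduction code above (the snippet continues from it, so data, table_name, project_id, and table_schema are as defined there; pick one or the other). The load_csv variant is the one confirmed in this thread; the Decimal conversion is an alternative, untested sketch that keeps the Parquet path:

import decimal

# Workaround 1 (confirmed above): skip the Parquet conversion entirely.
gbq.to_gbq(data, table_name, project_id, if_exists='append',
           table_schema=table_schema, api_method='load_csv')

# Workaround 2 (sketch): convert the float column to Decimal so pyarrow
# can build the 16-byte decimal128 values that NUMERIC expects.
data['precio'] = data['precio'].map(lambda v: decimal.Decimal(str(v)))
gbq.to_gbq(data, table_name, project_id, if_exists='append',
           table_schema=table_schema)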