pyarrow.lib.ArrowTypeError: Expected bytes, got a 'dict' object


Hi! There is a problem when using pandas-gbq (which uses pyarrow) to load a column of list (array) or dictionary (JSON) type into a table, even though the BigQuery documentation says that structured types such as arrays and JSON are supported:

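import pandas as pd
import pandas_gbq

df = pd.DataFrame(
    {
        "my_string": ["a", "b", "c"],
        "my_int64": [1, 2, 3],
        "my_float64": [4.0, 5.0, 6.0],
        "my_bool1": [True, False, True],
        "my_bool2": [False, True, False],
        # Column of dicts: this is the one that triggers the error
        "my_struct": [{"test": "str1"}, {"test": "str2"}, {"test": "str3"}],
    }
)
# gbq_params (destination table, project ID, etc.) is not shown in the original report
pandas_gbq.to_gbq(df, **gbq_params)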

As a result, the following stack trace is raised:

  in bq_to_arrow_array
    return pyarrow.Array.from_pandas(series, type=arrow_type)
  File "pyarrow/array.pxi", line 913, in pyarrow.lib.Array.from_pandas
  File "pyarrow/array.pxi", line 311, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Expected bytes, got a 'dict' object

Can anyone help with this, please?
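Since the error names the offending Python type but not the column, a quick pre-flight check can help locate the problem before uploading. The sketch below is a minimal illustration using only the standard library; `find_nested_columns` is a hypothetical helper, not part of pandas-gbq or pyarrow, and it works on a plain dict of column lists shaped like the one passed to `pd.DataFrame` above.

```python
from typing import Any, Dict, List


def find_nested_columns(columns: Dict[str, List[Any]]) -> List[str]:
    """Return the names of columns that hold at least one dict or list value."""
    nested = []
    for name, values in columns.items():
        if any(isinstance(value, (dict, list)) for value in values):
            nested.append(name)
    return nested


# Same shape as the data in the report, trimmed to three columns
columns = {
    "my_string": ["a", "b", "c"],
    "my_int64": [1, 2, 3],
    "my_struct": [{"test": "str1"}, {"test": "str2"}, {"test": "str3"}],
}

nested_columns = find_nested_columns(columns)
print(nested_columns)
```

With an existing DataFrame, `df.to_dict(orient="list")` should produce a dict of this shape that can be passed to the helper.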

Issue Analytics

  • State:open
  • Created 2 years ago
  • Reactions:5
  • Comments:5

Top GitHub Comments

2 reactions
grzesir commented, Jan 18, 2022

Any updates on this? I'm getting the same error. The strange thing is that the code works well locally and on Compute Engine, but fails on Cloud Run (even though the same service account is used for both).
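When the same code behaves differently across environments, one thing worth ruling out is a difference in installed package versions. The thread does not confirm this as the cause, so treat the following as a diagnostic sketch only. `installed_versions` is a hypothetical helper built on the standard library's `importlib.metadata`, and the package list is an assumption about which dependencies are relevant.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed list of packages involved in the upload path
report = installed_versions(
    ["pandas", "pandas-gbq", "pyarrow", "google-cloud-bigquery"]
)
for name, version in report.items():
    print(f"{name}=={version}" if version else f"{name}: not installed")
```

Running this in each environment and comparing the output shows whether they resolve to the same versions.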

0 reactions
nabor-slalom-greenparksports commented, Mar 24, 2022

Has there been any progress on updating this issue? I am seeing the same error message.

Could someone elaborate on this statement:

"I believe we can avoid this problem with https://github.com/googleapis/python-bigquery-pandas/issues/339 where instead of pandas-gbq creating the table, we create the table as part of the load job."

I am seeing the same issue even with an already created table and with if_exists='replace':

pandas_gbq.to_gbq(dataframe, table_id, project_id=project_id, if_exists='replace')

The work-around that helped me to successfully load my table was casting the dataframe column to string data type.

As an example GCP Cloud Function:

import pandas as pd
import pandas_gbq

def gbq_write(request):

  # TODO: Set project_id to your Google Cloud Platform project ID.
  project_id = "project-id"

  # TODO: Set table_id to the full destination table ID (including the dataset ID).
  table_id = 'dataset.table'

  df = pd.DataFrame(
      {
          "my_string": ["a", "b", "c"],
          "my_int64": [1, 2, 3],
          "my_float64": [4.0, 5.0, 6.0],
          "my_bool1": [True, False, True],
          "my_dates": pd.date_range("now", periods=3),
          "my_struct": [{"test":"str1"},{"test":"str2"},{"test":"str3"}],
      }
  )

  pandas_gbq.to_gbq(df, table_id, project_id=project_id, if_exists='replace')

  return 'Successfully Written'

This produces the error mentioned in this thread:

pyarrow.lib.ArrowTypeError: Expected bytes, got a 'dict' object

With requirements.txt as

pandas==1.4.1
pandas-gbq==0.17.4

To apply the column cast, I added a single line and ended up with:

import pandas as pd
import pandas_gbq

def gbq_write(request):

  # TODO: Set project_id to your Google Cloud Platform project ID.
  project_id = "project-id"

  # TODO: Set table_id to the full destination table ID (including the dataset ID).
  table_id = 'dataset.table'

  df = pd.DataFrame(
      {
          "my_string": ["a", "b", "c"],
          "my_int64": [1, 2, 3],
          "my_float64": [4.0, 5.0, 6.0],
          "my_bool1": [True, False, True],
          "my_dates": pd.date_range("now", periods=3),
          "my_struct": [{"test":"str1"},{"test":"str2"},{"test":"str3"}],
      }
  )

  # Column conversion added to load table
  df['my_struct'] = df['my_struct'].astype("string")

  pandas_gbq.to_gbq(df, table_id, project_id=project_id, if_exists='replace')

  return 'Successfully Written'
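One detail of the cast above: it stores Python's own string form of each dict, which uses single quotes (as the query result further down shows), so the stored text is not strictly valid JSON. Whether that matters depends on what reads the column later. As an alternative sketch, not something proposed in the thread, the values could be serialized with `json.dumps` instead. `to_json_string` is a hypothetical helper name.

```python
import json


def to_json_string(value):
    """Serialize dicts and lists as JSON text; pass other values through unchanged."""
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    return value


row = {"test": "str1"}
as_repr = str(row)             # Python string form: single-quoted keys and values
as_json = to_json_string(row)  # JSON text: double-quoted keys and values

print(as_repr)
print(as_json)
print(json.loads(as_json) == row)  # the JSON text round-trips back to the dict
```

Applied to the DataFrame, this would look like `df["my_struct"] = df["my_struct"].map(to_json_string)` in place of the `astype("string")` line; that variant was not tested against BigQuery here.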

This successfully loads the table into BigQuery with the following schema:

Field name   Type
my_string    STRING
my_int64     INTEGER
my_float64   FLOAT
my_bool1     BOOLEAN
my_dates     TIMESTAMP
my_struct    STRING

If you need my_struct to be an actual struct, consider:

SELECT
  *
   # retrieve value from struct
  ,json_value(my_struct, '$.test') AS test
   # recreate struct using value for each row
  ,struct(json_value(my_struct, '$.test') AS test) AS my_created_struct
FROM `project-id.dataset.table` order by my_int64

Query result:

Row  my_string  my_int64  my_float64  my_bool1  my_dates                        my_struct         test  my_created_struct.test
1    a          1         4.0         true      2022-03-24 04:14:28.267319 UTC  {'test': 'str1'}  str1  str1
2    b          2         5.0         false     2022-03-25 04:14:28.267319 UTC  {'test': 'str2'}  str2  str2
3    c          3         6.0         true      2022-03-26 04:14:28.267319 UTC  {'test': 'str3'}  str3  str3