
Pandas should get the schema from bigquery if pushing to a table that already exists


Right now, when pushing new data to an already existing table using to_gbq with if_exists=append but no explicit table_schema, pandas generates a default table schema in which each column's mode (either REQUIRED or NULLABLE) is always NULLABLE.

It would make sense for pandas to fetch the existing table's schema and apply it in the if_exists=append case, instead of always passing NULLABLE.
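Until that happens, a workaround is to fetch the destination table's schema yourself and pass it to to_gbq as table_schema. The helper below is a hypothetical sketch (the function name and dict handling are my own, not part of pandas-gbq); it assumes the schema arrives either as plain dicts or as objects with name/field_type/mode attributes, like google.cloud.bigquery.SchemaField.

```python
# Hypothetical workaround sketch: convert an existing table's schema into the
# list-of-dicts format that pandas_gbq.to_gbq accepts via table_schema, so the
# REQUIRED/NULLABLE modes of the destination table are preserved on append.

def schema_to_table_schema(schema):
    """Accepts plain dicts or SchemaField-like objects
    (attributes: name, field_type, mode)."""
    fields = []
    for field in schema:
        if isinstance(field, dict):
            fields.append({
                "name": field["name"],
                "type": field["type"],
                "mode": field.get("mode", "NULLABLE"),
            })
        else:  # e.g. google.cloud.bigquery.SchemaField
            fields.append({
                "name": field.name,
                "type": field.field_type,
                "mode": field.mode,
            })
    return fields


existing = [
    {"name": "event_ts", "type": "TIMESTAMP", "mode": "REQUIRED"},
    {"name": "event_type", "type": "STRING", "mode": "NULLABLE"},
]
print(schema_to_table_schema(existing))
```

The result can then be passed along, e.g. to_gbq(df, destination, project_id=..., if_exists="append", table_schema=schema_to_table_schema(client.get_table(destination).schema)).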

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

ShantanuKumar commented, Apr 30, 2020 (1 reaction)

It actually works for TIMESTAMP as well! We just needed to cast the string timestamps to datetime:

# Imports assumed from the pandas-gbq test-suite context:
import pandas
from pandas import DataFrame
from pandas_gbq import gbq

def test_to_gbq_does_not_override_type(gbq_table, gbq_connector):
    table_id = "test_to_gbq_does_not_override_type"
    table_schema = {
        "fields": [
            {
                "name": "event_ts",
                "type": "TIMESTAMP",
                "mode": "REQUIRED",
                "description": "event_ts",
            },
            {
                "name": "event_type",
                "type": "STRING",
                "mode": "NULLABLE",
                "description": "event_type",
            },
        ]
    }
    df = DataFrame({
        "event_ts": [pandas.to_datetime("2020-03-03 01:00:00"),
                     pandas.to_datetime("2020-03-03 01:00:00")],
        "event_type": ["buy", "sell"]
    })

    gbq_table.create(table_id, table_schema)
    gbq.to_gbq(
        df,
        "{0}.{1}".format(gbq_table.dataset_id, table_id),
        project_id=gbq_connector.project_id,
        if_exists="append",
    )

    actual = gbq_table.schema(gbq_table.dataset_id, table_id)
    assert table_schema["fields"] == actual
tswast commented, Apr 29, 2020 (0 reactions)

I made a test corresponding to the example you provided in https://github.com/pydata/pandas-gbq/issues/315#issuecomment-597145800, but it still fails due to differing types. It is expected that the pandas.Timestamp dtype is used for uploading to TIMESTAMP columns. Please open a separate feature request if the type difference is a problem for you.

Let's use this issue to track the problem of differing modes (REQUIRED vs. NULLABLE).

def test_to_gbq_does_not_override_type(gbq_table, gbq_connector):
    table_id = "test_to_gbq_does_not_override_type"
    table_schema = {
        "fields": [
            {
                "name": "event_ts",
                "type": "TIMESTAMP",
            },
            {
                "name": "event_type",
                "type": "STRING",
            },
        ]
    }
    df = DataFrame({
        "event_ts": ["2020-03-03 01:00:00", "2020-03-03 02:00:00"],
        "event_type": ["buy", "sell"]
    })

    gbq_table.create(table_id, table_schema)
    gbq.to_gbq(
        df,
        "{0}.{1}".format(gbq_table.dataset_id, table_id),
        project_id=gbq_connector.project_id,
        if_exists="append",
    )

    actual = gbq_table.schema(table_id)
    assert table_schema["fields"] == actual
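To see why the mode assertion fails, consider how the default schema is generated. The sketch below is a simplified illustration (not pandas-gbq's actual implementation; the mapping table and function name are my own): the schema is derived purely from the DataFrame's dtypes, so the mode always defaults to NULLABLE regardless of the destination table.

```python
# Simplified illustration of the reported behavior: when no table_schema is
# passed, the generated schema comes from the DataFrame dtypes, and the mode
# is always NULLABLE -- the destination table's modes are never consulted.
import pandas as pd

_DTYPE_TO_BQ = {
    "datetime64[ns]": "TIMESTAMP",
    "int64": "INTEGER",
    "float64": "FLOAT",
    "bool": "BOOLEAN",
    "object": "STRING",
}

def default_schema(df):
    return {
        "fields": [
            {
                "name": col,
                "type": _DTYPE_TO_BQ.get(str(dtype), "STRING"),
                "mode": "NULLABLE",  # never fetched from the existing table
            }
            for col, dtype in df.dtypes.items()
        ]
    }

df = pd.DataFrame({
    "event_ts": pd.to_datetime(["2020-03-03 01:00:00", "2020-03-03 02:00:00"]),
    "event_type": ["buy", "sell"],
})
print(default_schema(df))
```

Comparing this output against the REQUIRED mode of event_ts in the destination table is exactly the mismatch the assertion above catches.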