Option to provide partition column and partition expiry time


When creating a new table with pandas, it would be nice to be able to partition the table and set a partition expiry time. The Python BigQuery library already supports this:

from google.cloud import bigquery

client = bigquery.Client()
dataset_ref = client.dataset("my_dataset")

table_ref = dataset_ref.table("my_partitioned_table")
schema = [
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("post_abbr", "STRING"),
    bigquery.SchemaField("date", "DATE"),
]
table = bigquery.Table(table_ref, schema=schema)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="date",  # name of column to use for partitioning
    expiration_ms=7776000000,  # 90 days
)

table = client.create_table(table)

print(
    "Created table {}, partitioned on column {}".format(
        table.table_id, table.time_partitioning.field
    )
)

https://cloud.google.com/bigquery/docs/creating-column-partitions
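
The `expiration_ms` value in the snippet above is 90 days expressed in milliseconds. As a small sketch of that arithmetic, the helper below (the name `days_to_ms` is hypothetical, not part of any library) derives the value instead of hard-coding it:

```python
from datetime import timedelta


def days_to_ms(days: int) -> int:
    """Hypothetical helper: convert a number of days to whole milliseconds."""
    # Dividing one timedelta by another with // yields an exact integer.
    return timedelta(days=days) // timedelta(milliseconds=1)


ninety_days_ms = days_to_ms(90)
thirty_days_ms = days_to_ms(30)
```

The result could be passed as `expiration_ms=days_to_ms(90)` when building `bigquery.TimePartitioning`.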

I can create a pull request if people think this would be useful. At least in my work, we create a lot of monitoring tables in BigQuery using pandas and push data to them. These tables keep growing, and since partitioning can't be set once a table has been created, they become too big and too expensive.

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 11
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

5 reactions
tswast commented, Feb 14, 2020

I’m open to a pull request that adds this.

Ideally, I’d like to see to_gbq gain a configuration parameter that takes a load job JSON configuration (see the BigQuery job configuration resource and load job configuration resource).

Example:

pandas_gbq.to_gbq(
  df,
  configuration={
    "load": {
      "timePartitioning": {
        "type": "DAY",
        "expirationMs": str(1000*60*60*24*30),  # 30 days
        "field": "my_timestamp_col"
      }
    }
  }
)

One problem with the configuration proposal is that currently pandas-gbq creates tables with calls to create_table if the table doesn’t exist, rather than letting the load job create it. I’d like to refactor to_gbq to avoid this (unnecessary, IMO) step. Open to PRs to do that refactoring, but if you’d like to extract options from configuration when creating the table, that might be simpler short-term.
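
As a rough illustration of the short-term option, the sketch below pulls the partitioning options out of a `configuration` dictionary shaped like the example above. The function name `time_partitioning_kwargs` is hypothetical and is not part of pandas-gbq; the returned keys are chosen to match the `type_`, `field`, and `expiration_ms` arguments used with `bigquery.TimePartitioning` in the issue description.

```python
from typing import Any, Dict, Optional


def time_partitioning_kwargs(
    configuration: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Hypothetical helper: map load.timePartitioning to keyword arguments."""
    load_config = (configuration or {}).get("load", {})
    partitioning = load_config.get("timePartitioning")
    if not partitioning:
        return {}  # no partitioning requested

    kwargs: Dict[str, Any] = {}
    if "type" in partitioning:
        kwargs["type_"] = partitioning["type"]
    if "field" in partitioning:
        kwargs["field"] = partitioning["field"]
    if "expirationMs" in partitioning:
        # The REST representation carries this as a string; convert to int.
        kwargs["expiration_ms"] = int(partitioning["expirationMs"])
    return kwargs


example_configuration = {
    "load": {
        "timePartitioning": {
            "type": "DAY",
            "expirationMs": str(1000 * 60 * 60 * 24 * 30),  # 30 days
            "field": "my_timestamp_col",
        }
    }
}
kwargs = time_partitioning_kwargs(example_configuration)
```

Assuming this shape, the table-creation step could set `table.time_partitioning = bigquery.TimePartitioning(**kwargs)` before calling `client.create_table(table)` whenever `kwargs` is non-empty.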

1 reaction
tswast commented, Feb 18, 2020

Why does the expirationMs need to be passed as a string though?

It’s a historical artifact of the BigQuery REST endpoint using an older JSON parsing implementation that only had JavaScript Number (floating point) available. Encoding it as a string allows the BigQuery REST endpoint to interpret the value as a 64-bit integer without loss of precision.

I believe an integer will be accepted, but you might lose precision for large values.
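
The precision concern can be illustrated in plain Python, whose `float` is the same 64-bit double as a JavaScript Number. This is only a sketch of the general effect, not a reproduction of how the BigQuery endpoint parses requests; the variable names are illustrative.

```python
import json

# 2**53 + 1 is the smallest positive integer a 64-bit float cannot hold exactly.
large_value = 2**53 + 1

# Passing through a double silently rounds the value.
precision_lost = int(float(large_value)) != large_value

# Encoding the integer as a string survives a JSON round trip unchanged.
payload = json.dumps({"expirationMs": str(large_value)})
restored = int(json.loads(payload)["expirationMs"])

# A 30-day expiration is far below 2**53, so it is exact either way.
thirty_days_ms = 1000 * 60 * 60 * 24 * 30
small_value_exact = int(float(thirty_days_ms)) == thirty_days_ms
```

For realistic expiration times the integer form is therefore safe; the string form only matters for values beyond 2**53.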
