Option to provide partition column and partition expiry time
While creating a new table using pandas, it would be nice if it could partition the table and set a partition expiry time. The Python BigQuery library already supports this:
from google.cloud import bigquery

client = bigquery.Client()
dataset_ref = client.dataset("my_dataset")
table_ref = dataset_ref.table("my_partitioned_table")

schema = [
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("post_abbr", "STRING"),
    bigquery.SchemaField("date", "DATE"),
]

table = bigquery.Table(table_ref, schema=schema)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="date",  # name of column to use for partitioning
    expiration_ms=7776000000,  # 90 days
)
table = client.create_table(table)

print(
    "Created table {}, partitioned on column {}".format(
        table.table_id, table.time_partitioning.field
    )
)
https://cloud.google.com/bigquery/docs/creating-column-partitions
I can create a pull request if people feel it’s something they would find useful. At least in my work, we create a lot of monitoring tables on BigQuery using pandas and push data to them. These tables keep growing, and since partitioning can’t be set once a table has already been created, they just become too big, and expensive.
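For context, the workaround available today is to create the partitioned table up front with the google-cloud-bigquery snippet above, then push data to it from pandas. A minimal sketch of that second step (project and table names are assumed for illustration):

import pandas as pd
import pandas_gbq

# Assumes my_dataset.my_partitioned_table was already created, with
# partitioning and expiry configured, as in the snippet above.
df = pd.DataFrame({
    "name": ["Washington"],
    "post_abbr": ["WA"],
    "date": [pd.Timestamp("2021-01-01").date()],
})

pandas_gbq.to_gbq(
    df,
    "my_dataset.my_partitioned_table",
    project_id="my-project",  # assumed project id
    if_exists="append",       # load into the pre-created, partitioned table
)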

I’m open to a pull request that adds this.

Ideally, I’d like to see to_gbq gain a configuration parameter that takes a load job JSON configuration (see the job configuration resource and load job configuration resource in the REST API). Example:
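(The original example was not captured here; below is a sketch of what such a call could look like, assuming the proposed configuration keyword passes a REST-style load job configuration through unchanged. The parameter and its behavior are the proposal, not an existing API.)

import pandas as pd
import pandas_gbq

df = pd.DataFrame({
    "name": ["Washington"],
    "post_abbr": ["WA"],
    "date": [pd.Timestamp("2021-01-01").date()],
})

# Hypothetical `configuration` argument: a REST load job configuration
# resource. Note that expirationMs is a string-encoded int64 in the
# REST representation.
pandas_gbq.to_gbq(
    df,
    "my_dataset.my_partitioned_table",
    project_id="my-project",  # assumed project id
    configuration={
        "load": {
            "timePartitioning": {
                "type": "DAY",
                "field": "date",
                "expirationMs": "7776000000",  # 90 days
            }
        }
    },
)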
One problem with the configuration proposal is that currently pandas-gbq creates tables with calls to create_table if the table doesn’t exist, rather than letting the load job create it. I’d like to refactor to_gbq to avoid this (unnecessary, IMO) step. Open to PRs to do that refactoring, but if you’d like to extract options from configuration when creating the table, that might be simpler short-term.

It’s a historical artifact of the BigQuery REST endpoint using an older JSON parsing implementation that only had JavaScript Number (floating point) available. Encoding the value as a string allows the BigQuery REST endpoint to interpret it as a 64-bit integer without loss of precision.
I believe an integer will be accepted, but you might lose precision for large values.
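A quick illustration of the precision concern, in plain Python (float64 is the same representation as a JavaScript Number):

# int64 values above 2**53 cannot be represented exactly as float64,
# which is why the REST API encodes int64 fields as strings.
big = 2**53 + 1
print(big)              # 9007199254740993
print(int(float(big)))  # 9007199254740992 -- precision lost

# String encoding round-trips exactly:
print(int(str(big)))    # 9007199254740993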