
Dynamic overwrite of partitions does not work as expected

See original GitHub issue

In Spark, when you set spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic") and then insert into a partitioned table in overwrite mode, only the partitions present in the incoming data are overwritten; partitions that were not part of that write are left untouched. When writing to BigQuery with this connector, however, the entire table gets wiped out and only the newly inserted partitions show up. Can the connector be updated to support dynamic partition overwrite?
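To make the expected semantics concrete, here is a minimal pure-Python sketch (no Spark required; the dict-of-partitions model and function names are illustrative, not connector APIs) contrasting static overwrite, which is what the connector currently does to the BigQuery table, with dynamic overwrite, which is what Spark does natively:

```python
# Model a partitioned table as {partition_key: rows}.

def static_overwrite(table, new_data):
    # Default overwrite: drop everything, keep only the new partitions.
    return dict(new_data)

def dynamic_overwrite(table, new_data):
    # partitionOverwriteMode=dynamic: replace only the partitions
    # present in the incoming data; leave all others untouched.
    result = dict(table)
    result.update(new_data)
    return result

table = {"2020-06-13": ["a", "b"], "2020-06-14": ["c"]}
incoming = {"2020-06-14": ["c2"]}

print(static_overwrite(table, incoming))   # only 2020-06-14 survives
print(dynamic_overwrite(table, incoming))  # 2020-06-13 kept, 2020-06-14 replaced
```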

I am testing with gs://spark-lib/bigquery/spark-bigquery-latest.jar.

Thanks!

Example setup of this scenario:

Ran this on BigQuery directly:

CREATE OR REPLACE TABLE `gcp-project.dev.wiki_page_views_spark_write` (
  wiki_project STRING,
  wiki_page STRING,
  wiki_page_views INT64,
  date DATE
)
PARTITION BY date
OPTIONS (
  partition_expiration_days = 999999
)

spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

Saving the data to BigQuery

(wiki.write.format('bigquery')
    .option('table', 'gcp-project.dev.wiki_page_views_spark_write')
    .option('project', 'gcp-project')
    .option('temporaryGcsBucket', 'gcp-project/tmp/bq_staging')
    .mode('overwrite')
    .save())

Issue Analytics

  • State:open
  • Created 4 years ago
  • Comments:13 (2 by maintainers)

Top GitHub Comments

1 reaction
AmineSagaama commented, Jun 13, 2020

As a workaround, I set the write mode to "append", so BigQuery adds new partitions to the table without deleting the old ones. If I need to delete a partition, e.g. in case of a reset, I can use bq rm 'dataset.table$20200614' to delete that specific partition of the table.
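The append-plus-manual-delete workaround can be modeled the same way (pure Python, illustrative names only; in practice the delete step is the bq rm command above and the append step is .mode('append') on the DataFrame writer):

```python
# Model a date-partitioned table as {partition_key: rows}.

def append(table, new_data):
    # mode('append'): add incoming rows without touching existing partitions.
    result = {k: list(v) for k, v in table.items()}
    for part, rows in new_data.items():
        result.setdefault(part, []).extend(rows)
    return result

def drop_partition(table, part):
    # Stand-in for: bq rm 'dataset.table$YYYYMMDD'
    return {k: v for k, v in table.items() if k != part}

table = {"20200613": ["a"], "20200614": ["b"]}
# To "reset" one partition: drop it, then re-append its new rows.
table = drop_partition(table, "20200614")
table = append(table, {"20200614": ["b2"]})
print(table)  # {'20200613': ['a'], '20200614': ['b2']}
```

Together the two steps reproduce a per-partition overwrite while leaving every other partition intact, which is exactly what dynamic overwrite would do in one call.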

0 reactions
gitmstoute commented, Oct 13, 2022

+1 for the request

Read more comments on GitHub >

