Add upsert/delete support via periodic MERGE statements
Although it’s not a common use case for BigQuery, it’s come up that it’d be nice to support upserts and deletes with this connector. I’d like to propose the following way to let the connector optionally support that functionality, without negatively impacting existing users of the connector in append-only mode:
- Derive the key fields for the upsert and/or delete from the key of the message in Kafka, which will be required to be either a map or a struct. This will allow us to use the field names from that key and map them to columns in the destination BigQuery table.
- Insert all records received by the connector into an intermediate table, possibly named something like `${table}_tmp`, where `${table}` is the name of the table that the records are destined for.
- Periodically perform a `MERGE` operation that uses the intermediate table as its source and includes logic for upsert (if the value of the record is non-null) and/or delete (if the value of the record is null) based on the columns present in the record key.
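As a rough sketch of what the periodic step above might produce, the snippet below generates a `MERGE` statement that upserts non-null records and deletes tombstones. This is not connector code; the table and field names are made up, and `value_is_null` stands in for however the connector would flag null-value (tombstone) records in the intermediate table:

```python
def build_merge_statement(table, key_fields, value_fields):
    """Build a BigQuery MERGE that upserts rows whose Kafka record value is
    non-null and deletes rows whose value is null (tombstones), using a
    hypothetical `<table>_tmp` intermediate table as the source."""
    # Join on the columns derived from the Kafka record key.
    on_clause = " AND ".join(f"dst.{f} = src.{f}" for f in key_fields)
    set_clause = ", ".join(f"dst.{f} = src.{f}" for f in value_fields)
    all_fields = key_fields + value_fields
    insert_cols = ", ".join(all_fields)
    insert_vals = ", ".join(f"src.{f}" for f in all_fields)
    return (
        f"MERGE `{table}` dst\n"
        f"USING `{table}_tmp` src\n"
        f"ON {on_clause}\n"
        # Tombstone (null-value) record for an existing row: delete it.
        f"WHEN MATCHED AND src.value_is_null THEN DELETE\n"
        # Non-null value for an existing row: update it.
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        # Non-null value for a new row: insert it.
        f"WHEN NOT MATCHED AND NOT src.value_is_null THEN\n"
        f"  INSERT ({insert_cols}) VALUES ({insert_vals})"
    )

# Example with a single-field key:
print(build_merge_statement("orders", ["order_id"], ["status", "amount"]))
```

The `key_fields` list is what would come out of the map/struct record key described in the first bullet; a real implementation would also need to deduplicate the intermediate table so only the latest record per key feeds the `MERGE`.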
This should only take place if the user enables upserts or deletes; otherwise, the connector will continue streaming directly to the intended table (or GCS bucket) as it does now.
There are definitely some details that need to be ironed out here, such as what will happen if the fields in the record key change over time, how this would interact with table partitioning, and how/where the `MERGE` would be performed, but this is the overall idea.
@mtagle, @criccomini, what are your thoughts on this?
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 3
- Comments: 12 (6 by maintainers)
Top GitHub Comments
Alright, design doc is up and anyone can comment on it here: https://docs.google.com/document/d/1p8_rLQqR9GIALIruB3-MjqR8EgYdaEw2rlFF1fxRJf0/edit
Yep. I suggest a more detailed design document as a next step. That way, we can all get on the same page about implementation details.