Duplicate records when mirroring data to BigQuery
- Extension name: firestore-bigquery-export
- Extension version: v0.1.12
It was running perfectly fine on v0.1.5, but after upgrading to v0.1.12 the extension started exporting duplicate records to BigQuery.
The only difference between two duplicate records is the document_id field: one of the records has a null document_id and the other has a non-null document_id.
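To gauge how widespread this pattern is, here is a minimal Python sketch. It assumes exported rows are available as dicts whose keys mirror the changelog table columns; document_name and document_id come from this report, while event_id, operation, the sample values, and the function name find_null_id_duplicates are hypothetical. It groups rows by every column except document_id and reports groups that contain both a null and a non-null document_id.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical row shape: keys mirror the changelog columns; values must be hashable.
Row = Dict[str, Any]


def _key_without_id(row: Row) -> Tuple[Tuple[str, Any], ...]:
    # Every column except document_id, sorted so dict key order does not matter.
    return tuple(sorted((k, v) for k, v in row.items() if k != "document_id"))


def find_null_id_duplicates(rows: Iterable[Row]) -> List[List[Row]]:
    """Return groups of rows that are identical except for document_id,
    where at least one copy has a null document_id and another does not."""
    groups: Dict[Tuple[Tuple[str, Any], ...], List[Row]] = defaultdict(list)
    for row in rows:
        groups[_key_without_id(row)].append(row)

    result: List[List[Row]] = []
    for group in groups.values():
        ids = [r.get("document_id") for r in group]
        if any(i is None for i in ids) and any(i is not None for i in ids):
            result.append(group)
    return result


# Illustrative data only (hypothetical project, collection, and IDs).
name = "projects/demo/databases/(default)/documents/orders/AbCdEfGhIjKlMnOpQrSt"
sample_rows = [
    {"event_id": "e1", "operation": "CREATE", "document_name": name, "document_id": None},
    {"event_id": "e1", "operation": "CREATE", "document_name": name,
     "document_id": "AbCdEfGhIjKlMnOpQrSt"},
    {"event_id": "e2", "operation": "UPDATE", "document_name": name,
     "document_id": "AbCdEfGhIjKlMnOpQrSt"},
]
dupes = find_null_id_duplicates(sample_rows)
print(len(dupes))  # 1: only the e1 pair differs solely in document_id
```

A single row with only a non-null document_id (the e2 row here) is not reported, which matches the observation that some records are not duplicated at all.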
This is a very severe problem, since it has compromised many of my reports.
Weirdly, not all records are getting duplicated. Sometimes they are and sometimes they are not; when they are not, there is only one record: the one with the non-null document_id. Still, this is a very frequent issue, and since the upgrade I already have thousands of duplicates.
It is also worth mentioning that all of my extensions were upgraded at almost the same time, and some of them are working perfectly.
- Created 3 years ago
- Comments: 10 (4 by maintainers)
@dackers86 Just as a follow-up: yes, I backfilled the entries to fix this discrepancy.
For anyone else running into this issue: I backfilled using the query below, but be very careful when doing this. The query assumes that the last 20 characters of the document_name column are the document ID, and there could be cases where that is not a safe assumption. So for me it looks something like:
Hope this is helpful!
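The caveat about the "last 20 characters" assumption can be illustrated with a minimal Python sketch. It is not the commenter's query; it runs on in-memory rows and contrasts that assumption (which holds for Firestore's 20-character auto-generated IDs) with taking the last path segment of document_name, which relies on Firestore document IDs never containing "/". The function names id_from_suffix, id_from_last_segment, and backfill_and_dedupe, the event_id column, and the sample values are hypothetical; a real backfill would have to be written as a BigQuery statement against the changelog table.

```python
from typing import Any, Dict, List

# Hypothetical row shape: keys mirror the changelog columns; values must be hashable.
Row = Dict[str, Any]


def id_from_suffix(document_name: str, length: int = 20) -> str:
    # The assumption from the comment: take the last 20 characters.
    # Only correct when the ID is a 20-character auto-generated one.
    return document_name[-length:]


def id_from_last_segment(document_name: str) -> str:
    # Document IDs cannot contain "/", so the text after the final "/"
    # is the document ID, whatever its length.
    return document_name.rsplit("/", 1)[-1]


def backfill_and_dedupe(rows: List[Row]) -> List[Row]:
    """Fill null document_id values from document_name, then drop rows
    that became exact copies of an earlier row."""
    seen = set()
    result: List[Row] = []
    for row in rows:
        fixed = dict(row)  # do not mutate the caller's rows
        if fixed.get("document_id") is None:
            fixed["document_id"] = id_from_last_segment(fixed["document_name"])
        key = tuple(sorted(fixed.items()))
        if key not in seen:
            seen.add(key)
            result.append(fixed)
    return result


# Illustrative data with a custom (non-auto-generated) document ID.
custom_name = "projects/demo/databases/(default)/documents/users/alice"
sample_rows = [
    {"event_id": "e1", "document_name": custom_name, "document_id": None},
    {"event_id": "e1", "document_name": custom_name, "document_id": "alice"},
    {"event_id": "e2", "document_name": custom_name, "document_id": "alice"},
]
cleaned = backfill_and_dedupe(sample_rows)
print(id_from_suffix(custom_name))        # ocuments/users/alice (wrong ID)
print(id_from_last_segment(custom_name))  # alice
print(len(cleaned))                       # 2
```

With a custom ID such as "alice", the suffix approach returns part of the path rather than the ID, which is exactly the unsafe case the comment warns about; splitting on the final "/" avoids depending on ID length.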
Closing as this appears to be resolved.