[BigQuery] Streaming insert drops records?
I'm facing an issue with BigQuery streaming inserts (`Table.insert(...)`, specifically `insert(Iterable<InsertAllRequest.RowToInsert> rows, boolean skipInvalidRows, boolean ignoreUnknownValues)` with `skipInvalidRows = false` and `ignoreUnknownValues = false`) where records sometimes don't seem to be available after one or more insert requests. The `InsertAllRequest`s complete successfully, i.e. no exceptions are thrown and no errors are reported (`InsertAllResponse.hasErrors` returns `false`). I checked the availability of the streamed data in the BigQuery Web UI and via the `Table.list(...)` API. According to https://cloud.google.com/bigquery/streaming-data-into-bigquery I would expect streamed data to be available for query within a few seconds of insertion. In cases where some records were missing after the initial check, I tried again after 10s, 30s, 60s, 1h, … but to no avail. So it looks like the records have been dropped for some reason.
@andreamlin Thank you for your reply. `InsertAllResponse` does not expose a status code, and as such I would expect any non-200 status to be turned into an appropriate `BigQueryException` when calling `table.insert(Iterable<InsertAllRequest.RowToInsert> rows, boolean skipInvalidRows, boolean ignoreUnknownValues)`.
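To spell out that expectation: per-row failures are reported in the response, while a non-2xx HTTP status would be expected to surface as a `BigQueryException`. A sketch under that assumption (the `insertWithErrorHandling` wrapper is hypothetical, not part of the client library):

```java
import com.google.cloud.bigquery.BigQueryError;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.InsertAllRequest.RowToInsert;
import com.google.cloud.bigquery.InsertAllResponse;
import com.google.cloud.bigquery.Table;

import java.util.List;
import java.util.Map;

public class InsertErrorHandling {
  static void insertWithErrorHandling(Table table, List<RowToInsert> rows) {
    try {
      InsertAllResponse response = table.insert(rows, false, false);
      if (response.hasErrors()) {
        // Per-row failures arrive in the response body, not as an exception.
        for (Map.Entry<Long, List<BigQueryError>> entry : response.getInsertErrors().entrySet()) {
          System.err.println("Row " + entry.getKey() + ": " + entry.getValue());
        }
      }
    } catch (BigQueryException e) {
      // A non-2xx HTTP status would be expected to land here; getCode() carries it.
      System.err.println("HTTP " + e.getCode() + ": " + e.getMessage());
    }
  }
}
```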
Is anyone experiencing this issue now? I am using version 1.88.0 for some testing and have seen missing records a few times: roughly 1 or 2 out of every 10,000 API calls lose records, and the API does not return any error. PS: it happens when I insert the data in batches (100 rows per request); my recent 11 runs of 10,000 single-row requests (1 row per request) worked fine with the same test data, and I am not using a row id. So maybe it is not an API-side problem but a server-buffer-related issue?
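One thing that might help narrow this down: supplying an explicit per-row insert ID via `RowToInsert.of(id, content)` enables BigQuery's best-effort deduplication and makes individual rows traceable on the server side. A sketch of the 100-rows-per-request batching described above with UUID insert IDs (`insertInBatches` and the `name` column are placeholders):

```java
import com.google.cloud.bigquery.InsertAllRequest.RowToInsert;
import com.google.cloud.bigquery.InsertAllResponse;
import com.google.cloud.bigquery.Table;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;

public class BatchedInsert {
  static void insertInBatches(Table table, List<String> values, int batchSize) {
    for (int start = 0; start < values.size(); start += batchSize) {
      List<RowToInsert> batch = new ArrayList<>();
      for (String value : values.subList(start, Math.min(start + batchSize, values.size()))) {
        // An explicit insertId lets BigQuery deduplicate retries of the same row
        // and gives each row a stable identity to trace if it goes missing.
        batch.add(RowToInsert.of(UUID.randomUUID().toString(),
            Collections.singletonMap("name", value)));
      }
      InsertAllResponse response = table.insert(batch, false, false);
      if (response.hasErrors()) {
        System.err.println("Batch at offset " + start + " had errors: "
            + response.getInsertErrors());
      }
    }
  }
}
```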