Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

[BigQuery] Streaming insert drops records?

See original GitHub issue

I’m facing an issue with BigQuery streaming inserts (Table.insert(...), specifically insert(Iterable&lt;InsertAllRequest.RowToInsert&gt; rows, boolean skipInvalidRows, boolean ignoreUnknownValues) with skipInvalidRows = false and ignoreUnknownValues = false) where records sometimes don’t seem to be available after one or more insert requests. The InsertAllRequests complete successfully, i.e. no exceptions are thrown and no errors are reported (InsertAllResponse.hasErrors returns false).

I checked the availability of the streamed data in the BigQuery Web UI and via the Table.list(...) API. According to https://cloud.google.com/bigquery/streaming-data-into-bigquery, I would expect streamed data to be available for query within a few seconds of insertion. In cases where some records were missing after the initial check, I tried again after 10s, 30s, 60s, 1h, … but to no avail. So it looks like the records have been dropped for some reason.
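
For reference, here is a minimal sketch of the call pattern described above, using the google-cloud-bigquery Java client. The dataset/table names and row contents are placeholders, not taken from the report:

    import com.google.cloud.bigquery.BigQuery;
    import com.google.cloud.bigquery.BigQueryOptions;
    import com.google.cloud.bigquery.InsertAllRequest.RowToInsert;
    import com.google.cloud.bigquery.InsertAllResponse;
    import com.google.cloud.bigquery.Table;

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class StreamingInsertRepro {
      public static void main(String[] args) {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        // "my_dataset" / "my_table" are placeholders for an existing table.
        Table table = bigquery.getTable("my_dataset", "my_table");

        Map<String, Object> content = new HashMap<>();
        content.put("name", "example");
        content.put("value", 42L);
        List<RowToInsert> rows = Arrays.asList(RowToInsert.of(content));

        // Same call as in the report: skipInvalidRows = false,
        // ignoreUnknownValues = false.
        InsertAllResponse response = table.insert(rows, false, false);

        // Per the report: no exception is thrown and hasErrors() is false,
        // yet the rows were sometimes never queryable afterwards.
        if (response.hasErrors()) {
          response.getInsertErrors().forEach((index, errors) ->
              System.err.printf("row %d rejected: %s%n", index, errors));
        } else {
          System.out.println("insert reported success");
        }
      }
    }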

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 4
  • Comments: 10

Top GitHub Comments

1 reaction
martinstuder commented, Jun 7, 2018

@andreamlin Thank you for your reply. InsertAllResponse does not expose a status code, and as such I would expect any non-200 status to be turned into an appropriate BigQueryException when calling table.insert(Iterable&lt;InsertAllRequest.RowToInsert&gt; rows, boolean skipInvalidRows, boolean ignoreUnknownValues).
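
To illustrate the two failure channels the commenter is distinguishing: in the Java client, request-level failures (non-2xx responses) surface as a BigQueryException, while row-level failures appear only in InsertAllResponse.getInsertErrors(). A sketch (method and names are illustrative):

    import com.google.cloud.bigquery.BigQueryError;
    import com.google.cloud.bigquery.BigQueryException;
    import com.google.cloud.bigquery.InsertAllRequest.RowToInsert;
    import com.google.cloud.bigquery.InsertAllResponse;
    import com.google.cloud.bigquery.Table;

    import java.util.List;
    import java.util.Map;

    public class InsertErrorHandling {
      static void insertWithChecks(Table table, List<RowToInsert> rows) {
        try {
          InsertAllResponse response = table.insert(rows, false, false);
          if (response.hasErrors()) {
            // Row-level failures: reported in the response body,
            // not thrown as exceptions.
            for (Map.Entry<Long, List<BigQueryError>> e
                : response.getInsertErrors().entrySet()) {
              System.err.printf("row %d rejected: %s%n", e.getKey(), e.getValue());
            }
          }
        } catch (BigQueryException e) {
          // Request-level failures: non-2xx HTTP responses surface here.
          System.err.println("insertAll request failed: " + e.getMessage());
        }
      }
    }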

0 reactions
zmm021 commented, Aug 26, 2019

Is anyone experiencing this issue now? I am using version 1.88.0 for some testing and have seen records go missing a few times: roughly 1-2 occurrences per 10K API calls, with no error returned by the API. PS: it happens when I insert the data in batches (100 rows per request); my recent 11 runs of 10,000 requests (1 row per request) worked fine, my test data is the same in both cases, and I am not using row IDs. So maybe it is not the API side but a server-side buffer issue?
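
Since the commenter mentions not using row IDs: in the Java client, RowToInsert.of(id, content) passes the id to BigQuery as the row's insertId, which the streaming backend uses for best-effort de-duplication across retries. A sketch of attaching per-row ids to a batched insert (names here are illustrative, not from the thread):

    import com.google.cloud.bigquery.InsertAllRequest.RowToInsert;
    import com.google.cloud.bigquery.InsertAllResponse;
    import com.google.cloud.bigquery.Table;

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.UUID;

    public class BatchedInsertWithIds {
      static InsertAllResponse insertBatch(Table table,
                                           List<Map<String, Object>> contents) {
        List<RowToInsert> batch = new ArrayList<>();
        for (Map<String, Object> content : contents) {
          // The first argument becomes the row's insertId, used by the
          // streaming backend for best-effort de-duplication.
          batch.add(RowToInsert.of(UUID.randomUUID().toString(), content));
        }
        return table.insert(batch, false, false);
      }
    }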

Read more comments on GitHub >

Top Results From Across the Web

Quotas and limits | BigQuery - Google Cloud
The following limits apply to jobs that export data from BigQuery by using the bq command-line tool, Google Cloud console, or the export-type...
Read more >
Google BigQuery: Data drop when streaming insert right after ...
The problem is, there are random data drops when doing streaming inserts after the copy operation. I found this link: After recreating BigQuery...
Read more >
Streaming Data Into BigQuery - huihoo
The first time a streaming insert occurs, the streamed data is inaccessible for ... The unreconciled data table might include duplicates or dropped...
Read more >
Google BigQuery Bulk Load (Streaming) - Confluence
Extract: The Google BigQuery Execute Snap extracts the records inserted into the destination table by the Google BigQuery Bulk Load (Streaming) Snap. The ......
Read more >
Chapter 4. Loading Data into BigQuery - O'Reilly
These tools can do change data capture (CDC) to allow you to stream changes from a database to a BigQuery table. External query...
Read more >
