Make Cloud Storage client retry on backend error
We're operating at scale on GCS and regularly experience transient HTTP 410 status codes when accessing Cloud Storage. The 410 status codes returned by Cloud Storage are bogus, though: they effectively just hide an internal backend error on GCS, which is reflected in the error details:
Caused by: com.google.api.client.googleapis.json.GoogleJsonResponseException:
410 Gone { "code" : 503, "errors" : [ { "domain" : "global", "message" : "Backend Error", "reason" : "backendError" } ], "message" : "Backend Error" }
The google-cloud-storage client does not treat the 410 status code as retryable, understandably so. It should, however, retry on backend errors, which are typically exposed with status code 500 or 503. I suggest treating backend errors in the client the same way it treats internal errors, namely by matching on reason == backendError independently of the HTTP status code.
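To make the suggestion concrete, here is a minimal sketch of such a check. It is not the client's actual code: the class `BackendErrorRetry`, its method, the list of retryable status codes, and the reason string `internalError` are all assumptions made for illustration. Only `backendError` and the 410/503 pairing come from the error shown above.

```java
import java.util.Set;

/** Hypothetical helper: decides retryability from the error reason, not only the HTTP status. */
final class BackendErrorRetry {
    // Reasons treated as transient in this sketch. "internalError" is an assumed
    // name for the internal-error case; "backendError" is the proposed addition.
    private static final Set<String> RETRYABLE_REASONS = Set.of("internalError", "backendError");

    // Status codes assumed retryable regardless of reason (illustrative list only).
    private static final Set<Integer> RETRYABLE_CODES = Set.of(500, 502, 503, 504);

    private BackendErrorRetry() {}

    static boolean isRetryable(int httpStatus, String reason) {
        if (reason != null && RETRYABLE_REASONS.contains(reason)) {
            return true; // matches even when the HTTP status is a misleading 410
        }
        return RETRYABLE_CODES.contains(httpStatus);
    }
}
```

With this check, `isRetryable(410, "backendError")` is true, while a 410 with any other reason stays non-retryable.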
Note that we’re not the first ones to experience this, and the client should be resilient against these transient GCS errors.
- https://stackoverflow.com/questions/33056415/uploading-files-into-google-cloud-storage-500-backend-error
- https://stackoverflow.com/questions/35125891/google-cloud-dataflow-jobs-failing-inaccessible-jars-410-gone-errors
- https://stackoverflow.com/questions/41215541/dataflow-jobs-fail-after-a-few-410-errors-while-writing-to-gcs
Based on the discussion with the storage backend team, the 410 happens during a JSON API resumable upload session. The error likely indicates that the upload session has already been terminated, so retrying the individual HTTP request would not work (the entire upload session has to be restarted). An internal bug has been filed and the storage team is actively working on it.
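As a rough illustration of what restarting the entire session means for a caller, the sketch below reruns a whole upload attempt when it fails with a 410, rather than retrying a single request. All names here (`ResumableUploadRestart`, `UploadFailedException`, `UploadAttempt`) are hypothetical and not part of the google-cloud-storage client. The sketch assumes each call to `run()` opens a fresh upload session and sends the data from the beginning.

```java
final class ResumableUploadRestart {
    /** Hypothetical exception standing in for a failed request inside an upload session. */
    static final class UploadFailedException extends RuntimeException {
        final int httpStatus;

        UploadFailedException(int httpStatus, String message) {
            super(message);
            this.httpStatus = httpStatus;
        }
    }

    /** Hypothetical: one complete upload attempt; assumed to open a new session on every call. */
    interface UploadAttempt<T> {
        T run();
    }

    private ResumableUploadRestart() {}

    static <T> T uploadWithSessionRestart(UploadAttempt<T> attempt, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        UploadFailedException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.run(); // a fresh session each time, never a resumed one
            } catch (UploadFailedException e) {
                if (e.httpStatus != 410) {
                    throw e; // other failures are left to the caller
                }
                last = e; // session is gone: start over with a new session
            }
        }
        throw last;
    }
}
```

A real implementation would probably add backoff between attempts and make sure the data source can be re-read from the start. Both are omitted here to keep the sketch short.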
I’ve contacted the storage backend team, and if they aren’t against it, I’ll add the retry logic.