Retry after HttpError code 400
Google Cloud Storage occasionally throws an HTTP Error 400 (which is nominally a ‘bad request’; see the Google Cloud docs on HTTP response 400). But this happens on requests that have worked previously and that work again after retrying. I’ve seen these spurious HTTP Error 400s when calling gcs.du and when using dask to read data from Google Cloud.
The error message from GCP is: Error 400 (Bad Request)! That's an error. Your client has issued a malformed or illegal request. That's all we know.
Monkey-patching gcsfs.utils.is_retriable fixes the issue for me:
import gcsfs

# Override is_retriable. Google Cloud sometimes throws
# a HttpError code 400. gcsfs considers this to not be retriable.
# But it is retriable!
def is_retriable(exception):
    """Returns True if this exception is retriable."""
    errs = list(range(500, 505)) + [
        # Jack's addition. Google Cloud occasionally throws Bad Requests for no apparent reason.
        400,
        # Request Timeout
        408,
        # Too Many Requests
        429,
    ]
    errs += [str(e) for e in errs]
    if isinstance(exception, gcsfs.utils.HttpError):
        return exception.code in errs
    return isinstance(exception, gcsfs.utils.RETRIABLE_EXCEPTIONS)

gcsfs.utils.is_retriable = is_retriable
In a perfect world, I guess the best solution would be to ask Google Cloud not to throw spurious HTTP Error 400s. But perhaps a pragmatic approach is to modify gcsfs to retry after HTTP Error 400s 😃
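For anyone who would rather not monkey-patch the library, here is a minimal sketch of the same idea applied at the call site. It is not part of gcsfs: call_with_retries and its parameters are hypothetical names, and it assumes only that the raised exception exposes a .code attribute, as the HttpError check in the snippet above does. The code list is the same one used in the patch.

```python
import time


def call_with_retries(
    func,
    retriable_codes=(400, 408, 429, 500, 501, 502, 503, 504),
    max_attempts=5,
    base_delay=0.5,
    sleep=time.sleep,
):
    """Call func(), retrying when the raised exception carries a retriable code.

    Hypothetical helper, not a gcsfs API. It assumes the exception has a
    `.code` attribute holding the HTTP status as an int or a str.
    """
    # Compare as strings so that 400 and "400" are treated the same way.
    codes = {str(c) for c in retriable_codes}
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            code = getattr(exc, "code", None)
            # Give up on non-retriable errors, or once attempts are exhausted.
            if str(code) not in codes or attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Usage would look something like call_with_retries(lambda: gcs.du(path)), where gcs is the filesystem object and path is whatever you were listing. The sleep parameter is only there so the backoff can be swapped out in tests.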
Environment:
- Dask version: 2.28.0
- Python version: 3.8.5
- Operating System: Ubuntu 20.04 on a Google Cloud VM
- Install method: conda
Top GitHub Comments
I think I fixed it! Was a tricky bug to find.
Thanks loads for the replies! I don’t have logs to hand right now, but if I come across this problem again then I’ll be sure to follow up here with more details (including logs and details of the project).