Large file timeout error in azure-storage-file-datalake
- Package: azure-storage-file-datalake
- Package version: 12.9.1
- Operating system: Debian Bullseye
- Python version: 3.10.5
Describe the bug: No matter what I've tried, I get failed uploads with an unexplained timeout error when sending large files to the data lake.
To Reproduce: I uploaded a 109 MB zip file using the `upload_data` method on an instance of the `DataLakeFileClient` class with a very generous timeout argument: `instance.upload_data(local_file, overwrite=True, timeout=3000)`. The upload succeeds for small files but fails for files this large.
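For reference, a minimal sketch of the failing call as described above; the account URL, file system, path, and credential are placeholders, not the reporter's actual values.

```python
# Minimal repro sketch of the failing upload. The account URL, file system,
# path, and credential below are placeholders, not the reporter's values.
from azure.storage.filedatalake import DataLakeFileClient

file_client = DataLakeFileClient(
    account_url="https://<account>.dfs.core.windows.net",
    file_system_name="<file-system>",
    file_path="uploads/archive.zip",
    credential="<account-key>",
)

with open("archive.zip", "rb") as local_file:
    # timeout=3000 is the server-side operation timeout; the error in the
    # traceback below comes from a different, client-side timeout.
    file_client.upload_data(local_file, overwrite=True, timeout=3000)
```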
Expected behavior: The code does not fail with a timeout error.
Screenshots: This is the output from the traceback.
return upload_datalake_file(**options)
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/storage/filedatalake/_upload_helper.py", line 76, in upload_datalake_file
upload_data_chunks(
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/storage/filedatalake/_shared/uploads.py", line 79, in upload_data_chunks
range_ids = [uploader.process_chunk(result) for result in uploader.get_chunk_streams()]
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/storage/filedatalake/_shared/uploads.py", line 79, in <listcomp>
range_ids = [uploader.process_chunk(result) for result in uploader.get_chunk_streams()]
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/storage/filedatalake/_shared/uploads.py", line 195, in process_chunk
return self._upload_chunk_with_progress(chunk_offset, chunk_bytes)
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/storage/filedatalake/_shared/uploads.py", line 211, in _upload_chunk_with_progress
range_id = self._upload_chunk(chunk_offset, chunk_data)
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/storage/filedatalake/_shared/uploads.py", line 354, in _upload_chunk
self.response_headers = self.service.append_data(
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/tracing/decorator.py", line 78, in wrapper_use_tracer
return func(*args, **kwargs)
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/storage/filedatalake/_generated/operations/_path_operations.py", line 2500, in append_data
pipeline_response = self._client._pipeline.run( # type: ignore # pylint: disable=protected-access
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 211, in run
return first_node.send(pipeline_request) # type: ignore
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
response = self.next.send(request)
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
response = self.next.send(request)
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
response = self.next.send(request)
[Previous line repeated 2 more times]
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/policies/_redirect.py", line 158, in send
response = self.next.send(request)
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
response = self.next.send(request)
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/storage/filedatalake/_shared/policies.py", line 532, in send
raise err
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/storage/filedatalake/_shared/policies.py", line 506, in send
response = self.next.send(request)
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
response = self.next.send(request)
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
response = self.next.send(request)
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
response = self.next.send(request)
[Previous line repeated 1 more time]
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/storage/filedatalake/_shared/policies.py", line 304, in send
response = self.next.send(request)
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
response = self.next.send(request)
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
response = self.next.send(request)
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 103, in send
self._sender.send(request.http_request, **request.context.options),
File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/transport/_requests_basic.py", line 361, in send
raise error
azure.core.exceptions.ServiceResponseError: ('Connection aborted.', TimeoutError('The write operation timed out'))
Additional context: I've seen hidden timeout settings on other APIs, e.g. https://github.com/Azure/azure-sdk-for-java/issues/30406. Is there a similar fix for the azure-storage-file-datalake API?
Top GitHub Comments
Setting the other timeout value did the trick. Thank you!!!
@coreyryanhanson Corey, to provide some additional information: an upload through the `upload_data` API will split the upload into 100 MiB chunks by default. Then there is a client-side timeout, `connection_timeout`, that defaults to 20 seconds. If you are unable to upload 100 MiB in 20 seconds, you will get the client-side timeout error you are seeing. The `timeout` keyword you set is the server-side timeout for the operation. We realize the documentation around these different timeouts could be better and we will work on that in the future. So, there are two options to avoid the timeout here (or use a combination of both), as sketched in the example below:
- Set `connection_timeout` higher, as Vincent has suggested. You will want to set it to however long you think it will take your system to upload 100 MiB of data.
- Lower the `chunk_size` keyword arg on `upload_data`. The chunk size can be whatever you want/need for your environment; just be aware the SDK will make a separate network call for each chunk.
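Putting both suggestions together, a sketch of the call with the two keywords; the 600-second timeout and 4 MiB chunk size are illustrative values, not recommendations from the maintainers, and the client setup uses the same placeholder names as above.

```python
# Sketch of the two mitigations described above; the 600-second timeout and
# 4 MiB chunk size are illustrative values to tune for your environment.
from azure.storage.filedatalake import DataLakeFileClient

file_client = DataLakeFileClient(
    account_url="https://<account>.dfs.core.windows.net",
    file_system_name="<file-system>",
    file_path="uploads/archive.zip",
    credential="<account-key>",
)

with open("archive.zip", "rb") as local_file:
    file_client.upload_data(
        local_file,
        overwrite=True,
        # Option 1: raise the client-side connection timeout (in seconds) so a
        # full chunk can finish uploading before the socket write times out.
        connection_timeout=600,
        # Option 2: shrink each chunk (default is 100 MiB) so it uploads within
        # the timeout; each chunk is sent as a separate network call.
        chunk_size=4 * 1024 * 1024,
    )
```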