
Large file timeout error in azure-storage-file-datalake

See original GitHub issue
  • Package: azure-storage-file-datalake
  • Version: 12.9.1
  • Operating system: Debian Bullseye
  • Python version: 3.10.5

Describe the bug
No matter what I’ve tried, uploads of large files to the data lake fail with an unexplained timeout.

To Reproduce
Steps to reproduce the behavior: I uploaded a 109 MB zip file using the upload_data method on an instance of the DataLakeFileClient class with a very generous timeout argument:

    instance.upload_data(local_file, overwrite=True, timeout=3000)

The upload succeeds for small files but fails on ones this large.
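
For context, here is a minimal sketch of the failing call. The client setup, account URL, credential, and file names are assumptions for illustration; only the upload_data call and its arguments come from the issue text:

    # Hypothetical setup; only the upload_data call and its arguments
    # are taken from the issue text.
    from azure.storage.filedatalake import DataLakeFileClient

    instance = DataLakeFileClient(
        account_url="https://<account>.dfs.core.windows.net",  # placeholder
        file_system_name="<filesystem>",                       # placeholder
        file_path="archive.zip",                               # placeholder
        credential="<credential>",                             # placeholder
    )

    with open("archive.zip", "rb") as local_file:
        # Succeeds for small files; for a 109 MB zip it raises
        # ServiceResponseError ('The write operation timed out').
        instance.upload_data(local_file, overwrite=True, timeout=3000)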

Expected behavior
The upload completes without a timeout error.

Screenshots
This is the output from the traceback:

    return upload_datalake_file(**options)
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/storage/filedatalake/_upload_helper.py", line 76, in upload_datalake_file
    upload_data_chunks(
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/storage/filedatalake/_shared/uploads.py", line 79, in upload_data_chunks
    range_ids = [uploader.process_chunk(result) for result in uploader.get_chunk_streams()]
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/storage/filedatalake/_shared/uploads.py", line 79, in <listcomp>
    range_ids = [uploader.process_chunk(result) for result in uploader.get_chunk_streams()]
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/storage/filedatalake/_shared/uploads.py", line 195, in process_chunk
    return self._upload_chunk_with_progress(chunk_offset, chunk_bytes)
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/storage/filedatalake/_shared/uploads.py", line 211, in _upload_chunk_with_progress
    range_id = self._upload_chunk(chunk_offset, chunk_data)
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/storage/filedatalake/_shared/uploads.py", line 354, in _upload_chunk
    self.response_headers = self.service.append_data(
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/tracing/decorator.py", line 78, in wrapper_use_tracer
    return func(*args, **kwargs)
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/storage/filedatalake/_generated/operations/_path_operations.py", line 2500, in append_data
    pipeline_response = self._client._pipeline.run(  # type: ignore # pylint: disable=protected-access
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 211, in run
    return first_node.send(pipeline_request)  # type: ignore
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
  [Previous line repeated 2 more times]
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/policies/_redirect.py", line 158, in send
    response = self.next.send(request)
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/storage/filedatalake/_shared/policies.py", line 532, in send
    raise err
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/storage/filedatalake/_shared/policies.py", line 506, in send
    response = self.next.send(request)
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
  [Previous line repeated 1 more time]
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/storage/filedatalake/_shared/policies.py", line 304, in send
    response = self.next.send(request)
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/_base.py", line 103, in send
    self._sender.send(request.http_request, **request.context.options),
  File "/opt/miniconda3/envs/vs-access/lib/python3.10/site-packages/azure/core/pipeline/transport/_requests_basic.py", line 361, in send
    raise error
azure.core.exceptions.ServiceResponseError: ('Connection aborted.', TimeoutError('The write operation timed out'))

Additional context
I’ve seen hidden settings on other APIs, like this one: https://github.com/Azure/azure-sdk-for-java/issues/30406. Is there a similar fix for the azure-storage-file-datalake API?

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
coreyryanhanson commented, Oct 25, 2022

Setting the other timeout value did the trick. Thank you!!!

1 reaction
jalauzon-msft commented, Oct 25, 2022

@coreyryanhanson Corey, to provide some additional information: an upload through the upload_data API is split into 100 MiB chunks by default. There is also a client-side timeout, connection_timeout, which defaults to 20 seconds. If your system cannot upload 100 MiB within 20 seconds, you get the client-side timeout error you saw. The timeout keyword you set is the server-side timeout for the operation. We realize the documentation around these different timeouts could be better, and we will work on that in the future.

So, there are two options to avoid the timeout here, or a combination of both (a sketch follows the list):

  • Set connection_timeout higher, as Vincent suggested. Set it to roughly how long you expect your system to take to upload 100 MiB of data.
  • Change the chunk size used for the upload via the chunk_size keyword arg on upload_data. The chunk size can be whatever your environment needs; just be aware the SDK makes a separate network call for each chunk.
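
A minimal sketch combining both options, based on the maintainer’s description above. The client setup and the specific values are assumptions; connection_timeout and chunk_size are the keywords named in the comment:

    # Hypothetical setup; connection_timeout and chunk_size are the
    # keywords described by the maintainer above.
    from azure.storage.filedatalake import DataLakeFileClient

    file_client = DataLakeFileClient(
        account_url="https://<account>.dfs.core.windows.net",  # placeholder
        file_system_name="<filesystem>",                       # placeholder
        file_path="archive.zip",                               # placeholder
        credential="<credential>",                             # placeholder
    )

    with open("archive.zip", "rb") as local_file:
        file_client.upload_data(
            local_file,
            overwrite=True,
            connection_timeout=600,       # client-side timeout per request, instead of the 20 s default
            chunk_size=25 * 1024 * 1024,  # 25 MiB chunks, instead of the 100 MiB default
        )

Raising connection_timeout keeps the 100 MiB chunks but gives each request longer to complete; shrinking chunk_size keeps the default timeout but makes each request smaller, at the cost of more network calls.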
Read more comments on GitHub

Top Results From Across the Web

  • [BUG] DataLakeFileClient.uploadFromFile() times out at 60 ...
    File should upload to blob storage in the data lake container. Smaller files make it no problem, but files above 500 MB throw...
  • Known issues with Azure Data Lake Storage Gen2
    If you write to a file by using Data Lake Storage Gen2 APIs or NFS 3.0, then that file's blocks won't be visible...
  • Solve timeout errors on file uploads with new azure.storage ...
    upload() fails on larger files with a timeout error that completely ignores the timeout parameter of the function. I get a ServiceResponseError ...
  • azure.storage.filedatalake package - NET
    timeout (int) – The timeout parameter is expressed in seconds. Return type. FileSystemClient. Example: Creating a file system in the datalake service.
  • azure-storage-file-datalake - PyPI
    Microsoft Azure File DataLake Storage Client Library for Python. ... from azure.storage.filedatalake import DataLakeServiceClient service ...
