Retry not working correctly for large files
See original GitHub issue
Which service (blob, file, queue) does this issue concern?
Blob
What problem was encountered?
During the upload of a large file, one block upload failed. According to the Azure Storage client log the block was re-uploaded successfully after a retry, but Storage Analytics shows that this did not actually happen. Client settings (a sketch of the presumed upload call follows the size details below):
<Agent> = Azure-Storage/1.1.0-1.1.0 (Python CPython 2.7.9; Linux 3.16.0-5-amd64)
MAX_BLOCK_SIZE = 100000000 (= 100 MB)
max_connections = 8
timeout = 30
- Total blob size: 4294903296 (≈ 4 GiB)
- Only 4249585152 bytes were uploaded (checked with az storage blob show)
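For context, here is a minimal sketch of how an upload with these settings would typically be issued against azure-storage 1.1.0. The account, SAS token, container, blob, and file path are placeholders, not values taken from the issue:

```python
from azure.storage.blob import BlockBlobService

# Raise the block size from the SDK default to the 100 MB reported above.
BlockBlobService.MAX_BLOCK_SIZE = 100000000

# Placeholder account / SAS token.
service = BlockBlobService(account_name='<account>', sas_token='<SAS>')

# create_blob_from_path splits the local file into blocks (PutBlock) and
# commits them with a final PutBlockList once all blocks are uploaded.
service.create_blob_from_path(
    '<container>',
    '<blob>',
    '/path/to/large-file.bin',  # ~4 GiB source file (placeholder path)
    max_connections=8,          # parallel block uploads, as reported
    timeout=30,                 # per-request timeout in seconds, as reported
)
```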
Azure Storage client log for the particular block:
2018-04-18 16:33:04,807 <az> INFO Client-Request-ID=2b3d9568-4326-11e8-8ef0-000d3a29b453 Outgoing request: Method=PUT, Path=/<container>/<blob>?<SAS>, Query={'comp': 'block', 'blockid': u'QmxvY2tJZDAwMDAx', 'timeout': '30'}, Headers={'Content-Length': '100000000', 'x-ms-client-request-id': '2b3d9568-4326-11e8-8ef0-000d3a29b453', 'User-Agent': '<Agent>', 'x-ms-version': '2017-07-29', 'x-ms-lease-id': None, 'x-ms-date': 'Wed, 18 Apr 2018 16:33:04 GMT'}.
2018-04-18 16:33:38,838 <az> INFO Client-Request-ID=2b3d9568-4326-11e8-8ef0-000d3a29b453 Operation failed: checking if the operation should be retried. Current retry count=0, , HTTP status code=Unknown, Exception=SSLError: HTTPSConnectionPool(host='<account>.blob.core.windows.net', port=443): Max retries exceeded with url: /<container>/<blob>?<SAS>&comp=block&blockid=QmxvY2tJZDAwMDAx&timeout=30 (Caused by SSLError(SSLError('The write operation timed out',),)).
2018-04-18 16:33:53,265 <az> INFO Client-Request-ID=2b3d9568-4326-11e8-8ef0-000d3a29b453 Outgoing request: Method=PUT, Path=/<container>/<blob>?<SAS>, Query={'comp': 'block', 'blockid': u'QmxvY2tJZDAwMDAx', 'timeout': '30'}, Headers={'Content-Length': '100000000', 'x-ms-client-request-id': '2b3d9568-4326-11e8-8ef0-000d3a29b453', 'User-Agent': '<Agent>', 'x-ms-version': '2017-07-29', 'x-ms-lease-id': None, 'x-ms-date': 'Wed, 18 Apr 2018 16:33:53 GMT'}.
2018-04-18 16:33:59,569 <az> INFO Client-Request-ID=2b3d9568-4326-11e8-8ef0-000d3a29b453 Receiving Response: Server-Timestamp=Wed, 18 Apr 2018 16:33:58 GMT, Server-Request-ID=2c9b33ae-701e-0109-6133-d7286a000000, HTTP Status Code=201, Message=Created, Headers={'Content-Length': '100000000', 'x-ms-client-request-id': '2b3d9568-4326-11e8-8ef0-000d3a29b453', 'User-Agent': '<Agent>', 'x-ms-version': '2017-07-29', 'x-ms-lease-id': None, 'x-ms-date': 'Wed, 18 Apr 2018 16:33:53 GMT'}.
Storage Analytics entry for the particular block:
1.0;2018-04-18T16:33:53.3070387Z;PutBlock;SASSuccess;201;6260;5669;sas;;<account>;blob;"https://<account>.blob.core.windows.net:443/<container>/<blob>?<SAS>&comp=block&blockid=QmxvY2tJZDAwMDAx&timeout=30";"/<account>/<container>/<blob>";2c9b33ae-701e-0109-6133-d7286a000000;0;<internal IP address>:53156;2017-07-29;567;54681856;193;0;54681856;;"zNZ1j2PNDLDEV5szpt0DKg==";;;;"<Agent>";;"2b3d9568-4326-11e8-8ef0-000d3a29b453"
- Azure Storage sees 'Content-Length': '100000000', while Storage Analytics says it only received 54681856 bytes. 100000000 - 54681856 = 45318144 and 4294903296 - 4249585152 = 45318144, i.e. the bytes missing from the block exactly account for the bytes missing from the blob.
- Storage Analytics only registers the block after the retry started.
- Storage Analytics shows no data for the allegedly successful upload.
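To narrow down which committed block ended up short, one option is to compare the committed block sizes against the expected block size. This is only a diagnostic sketch using get_block_list from the 1.x SDK as I understand it; the names are the same placeholders as above:

```python
from azure.storage.blob import BlockBlobService

EXPECTED_BLOCK_SIZE = 100000000  # the MAX_BLOCK_SIZE used for the upload

service = BlockBlobService(account_name='<account>', sas_token='<SAS>')

# Fetch both committed and uncommitted blocks together with their sizes.
blocks = service.get_block_list('<container>', '<blob>', block_list_type='all')

for block in blocks.committed_blocks:
    # Every block except the last one should be exactly EXPECTED_BLOCK_SIZE;
    # a shorter block in the middle points at the truncated retry.
    if block.size != EXPECTED_BLOCK_SIZE:
        print('committed block %s has size %d' % (block.id, block.size))
```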
I can also provide the logs of the successfully uploaded blocks, etc., but they contain little information: Content-Length is consistent, return codes are OK (201), timestamps are reasonable, the block list/head upload happens in the correct order, etc.
I cannot reliably reproduce the error, but I see it occasionally when uploading several TiB from on-premises machines as well as from Azure VMs.
Have you found a mitigation/solution?
Not yet, but completely re-uploading the block after the block list has been committed seems sensible.
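Until the SDK fix is released, one possible stop-gap along those lines is to check the committed blob size against the local file after each upload and redo the whole upload on a mismatch. The helper below is hypothetical and assumes the same BlockBlobService instance and placeholder names as in the earlier sketch:

```python
import os

def upload_with_size_check(service, container, blob, path, attempts=3, **kwargs):
    """Hypothetical helper: upload a file and redo the upload if the
    committed blob size does not match the local file size."""
    local_size = os.path.getsize(path)
    for _ in range(attempts):
        service.create_blob_from_path(container, blob, path, **kwargs)
        props = service.get_blob_properties(container, blob)
        if props.properties.content_length == local_size:
            return
    raise RuntimeError('blob size still differs from %s after %d attempts'
                       % (path, attempts))
```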
Issue Analytics
- Created 5 years ago
- Comments: 7 (5 by maintainers)
Top GitHub Comments
Hi @Fra-nk, I’m glad to hear that!
The release is going out in 1-2 weeks, as we have a few more changes to merge in. Thank you!
Hi @Fra-nk, calling
create_blob_from_path
is perfectly fine, I was just making sure that the issue occurred when a seekable stream was used as body. Basically the bug was that we weren’t rewinding the body (seekable stream) properly in the case of a retry. And with larger files, it is more likely that we have retries occurring and thus encountering this bug. It should now work properly after the fix on dev branch.Please let me know if you encounter any other problem or have any question. I’ll keep this issue open until the fix gets released. Thank you!