Retry not working correctly for large files
See original GitHub issue
Which service (blob, file, queue) does this issue concern?
Blob
What problem was encountered?
During the upload of a large file, one block upload failed. According to the Azure Storage client log the block was re-uploaded successfully after a retry, but Storage Analytics shows that this did not actually happen. Client settings (a sketch of the presumed upload call follows the size details below):
<Agent> = Azure-Storage/1.1.0-1.1.0 (Python CPython 2.7.9; Linux 3.16.0-5-amd64)
MAX_BLOCK_SIZE = 100000000 (= 100 MB)
max_connections = 8
timeout = 30
- Total blob size: 4294903296 (≈ 4 GiB)
- Only 4249585152 bytes were uploaded (checked with az storage blob show)
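For context, here is a minimal sketch of how an upload with these settings would typically be issued against azure-storage 1.1.0. The account, SAS token, container, blob, and file path are placeholders, not values taken from the issue:

```python
from azure.storage.blob import BlockBlobService

# Raise the block size from the SDK default to the 100 MB reported above.
BlockBlobService.MAX_BLOCK_SIZE = 100000000

# Placeholder account / SAS token.
service = BlockBlobService(account_name='<account>', sas_token='<SAS>')

# create_blob_from_path splits the local file into blocks (PutBlock) and
# commits them with a final PutBlockList once all blocks are uploaded.
service.create_blob_from_path(
    '<container>',
    '<blob>',
    '/path/to/large-file.bin',  # ~4 GiB source file (placeholder path)
    max_connections=8,          # parallel block uploads, as reported
    timeout=30,                 # per-request timeout in seconds, as reported
)
```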
Azure Storage client log for the particular block:
2018-04-18 16:33:04,807 <az> INFO Client-Request-ID=2b3d9568-4326-11e8-8ef0-000d3a29b453 Outgoing request: Method=PUT, Path=/<container>/<blob>?<SAS>, Query={'comp': 'block', 'blockid': u'QmxvY2tJZDAwMDAx', 'timeout': '30'}, Headers={'Content-Length': '100000000', 'x-ms-client-request-id': '2b3d9568-4326-11e8-8ef0-000d3a29b453', 'User-Agent': '<Agent>', 'x-ms-version': '2017-07-29', 'x-ms-lease-id': None, 'x-ms-date': 'Wed, 18 Apr 2018 16:33:04 GMT'}.
2018-04-18 16:33:38,838 <az> INFO Client-Request-ID=2b3d9568-4326-11e8-8ef0-000d3a29b453 Operation failed: checking if the operation should be retried. Current retry count=0, , HTTP status code=Unknown, Exception=SSLError: HTTPSConnectionPool(host='<account>.blob.core.windows.net', port=443): Max retries exceeded with url: /<container>/<blob>?<SAS>&comp=block&blockid=QmxvY2tJZDAwMDAx&timeout=30 (Caused by SSLError(SSLError('The write operation timed out',),)).
2018-04-18 16:33:53,265 <az> INFO Client-Request-ID=2b3d9568-4326-11e8-8ef0-000d3a29b453 Outgoing request: Method=PUT, Path=/<container>/<blob>?<SAS>, Query={'comp': 'block', 'blockid': u'QmxvY2tJZDAwMDAx', 'timeout': '30'}, Headers={'Content-Length': '100000000', 'x-ms-client-request-id': '2b3d9568-4326-11e8-8ef0-000d3a29b453', 'User-Agent': '<Agent>', 'x-ms-version': '2017-07-29', 'x-ms-lease-id': None, 'x-ms-date': 'Wed, 18 Apr 2018 16:33:53 GMT'}.
2018-04-18 16:33:59,569 <az> INFO Client-Request-ID=2b3d9568-4326-11e8-8ef0-000d3a29b453 Receiving Response: Server-Timestamp=Wed, 18 Apr 2018 16:33:58 GMT, Server-Request-ID=2c9b33ae-701e-0109-6133-d7286a000000, HTTP Status Code=201, Message=Created, Headers={'Content-Length': '100000000', 'x-ms-client-request-id': '2b3d9568-4326-11e8-8ef0-000d3a29b453', 'User-Agent': '<Agent>', 'x-ms-version': '2017-07-29', 'x-ms-lease-id': None, 'x-ms-date': 'Wed, 18 Apr 2018 16:33:53 GMT'}.
Storage Analytics entry for the particular block:
1.0;2018-04-18T16:33:53.3070387Z;PutBlock;SASSuccess;201;6260;5669;sas;;<account>;blob;"https://<account>.blob.core.windows.net:443/<container>/<blob>?<SAS>&comp=block&blockid=QmxvY2tJZDAwMDAx&timeout=30";"/<account>/<container>/<blob>";2c9b33ae-701e-0109-6133-d7286a000000;0;<internal IP address>:53156;2017-07-29;567;54681856;193;0;54681856;;"zNZ1j2PNDLDEV5szpt0DKg==";;;;"<Agent>";;"2b3d9568-4326-11e8-8ef0-000d3a29b453"
- Azure Storage sees 'Content-Length': '100000000', while Storage Analytics says it only received 54681856 bytes. 100000000 - 54681856 = 45318144 and 4294903296 - 4249585152 = 45318144, i.e. the bytes missing from the block exactly account for the bytes missing from the blob.
- Storage Analytics only registers the block after the retry started.
- Storage Analytics shows no data for the allegedly successful upload.
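To narrow down which committed block ended up short, one option is to compare the committed block sizes against the expected block size. This is only a diagnostic sketch using get_block_list from the 1.x SDK as I understand it; the names are the same placeholders as above:

```python
from azure.storage.blob import BlockBlobService

EXPECTED_BLOCK_SIZE = 100000000  # the MAX_BLOCK_SIZE used for the upload

service = BlockBlobService(account_name='<account>', sas_token='<SAS>')

# Fetch both committed and uncommitted blocks together with their sizes.
blocks = service.get_block_list('<container>', '<blob>', block_list_type='all')

for block in blocks.committed_blocks:
    # Every block except the last one should be exactly EXPECTED_BLOCK_SIZE;
    # a shorter block in the middle points at the truncated retry.
    if block.size != EXPECTED_BLOCK_SIZE:
        print('committed block %s has size %d' % (block.id, block.size))
```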
I can also provide the logs of the successfully uploaded blocks, etc., but they contain little information: Content-Length is consistent, return codes are OK (201), timestamps are reasonable, the block list/head upload happens in the correct order, etc.
I cannot reliably reproduce the error, but I see it occasionally when uploading several TiB from on-premises machines as well as from Azure VMs.
Have you found a mitigation/solution?
Not yet, but completely re-uploading the block after the block list has been committed seems sensible.
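Until the SDK fix is released, one possible stop-gap along those lines is to check the committed blob size against the local file after each upload and redo the whole upload on a mismatch. The helper below is hypothetical and assumes the same BlockBlobService instance and placeholder names as in the earlier sketch:

```python
import os

def upload_with_size_check(service, container, blob, path, attempts=3, **kwargs):
    """Hypothetical helper: upload a file and redo the upload if the
    committed blob size does not match the local file size."""
    local_size = os.path.getsize(path)
    for _ in range(attempts):
        service.create_blob_from_path(container, blob, path, **kwargs)
        props = service.get_blob_properties(container, blob)
        if props.properties.content_length == local_size:
            return
    raise RuntimeError('blob size still differs from %s after %d attempts'
                       % (path, attempts))
```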
Issue Analytics
- Created 5 years ago
- Comments: 7 (5 by maintainers)
Top GitHub Comments
Hi @Fra-nk, I’m glad to hear that!
The release is going out in 1-2 weeks, as we have a few more changes to merge in. Thank you!
Hi @Fra-nk, calling
create_blob_from_path
is perfectly fine, I was just making sure that the issue occurred when a seekable stream was used as body. Basically the bug was that we weren’t rewinding the body (seekable stream) properly in the case of a retry. And with larger files, it is more likely that we have retries occurring and thus encountering this bug. It should now work properly after the fix on dev branch.Please let me know if you encounter any other problem or have any question. I’ll keep this issue open until the fix gets released. Thank you!