Partial download reported as success in SDK API get_blob_to_path()
Which service (blob, file, queue) does this issue concern?
We are downloading an Azure blob using the Azure Python SDK API get_blob_to_path(). The API reports success, but the downloaded data is partial.
What problem was encountered?
Below is the code snippet. Note: this is custom code; the relevant part is the call to the SDK API get_blob_to_path().
```python
import os
import traceback


def download_file(self, f_context):
    src_file_path = f_context.src_path
    dest_file_path = f_context.dest_path
    az_uri = AzureUri(src_file_path)  # AzureUri is our own helper, not part of the SDK

    # Temporarily disable client-side encryption for this download;
    # the original settings are restored in the finally block.
    kek = self._blob_service.key_encryption_key
    require_encryption = self._blob_service.require_encryption
    self._blob_service.key_encryption_key = None
    self._blob_service.require_encryption = False
    try:
        blob = self._blob_service.get_blob_to_path(container_name=az_uri.container(),
                                                   blob_name=az_uri.blob_path(),
                                                   file_path=dest_file_path,
                                                   timeout=self.op_timeout)
        props = blob.properties
        # Log the length reported by the service and the bytes actually written to disk.
        self.logger.debug('download_file content_length:%s. content_range:%s',
                          props.content_length, props.content_range)
        self.logger.debug('download_file file size:%d', os.path.getsize(dest_file_path))
    except Exception as err:
        self.logger.error('download_file failure. Error:%s \n StackTrace:{%s}',
                          str(err), traceback.format_exc())
        raise
    finally:
        self._blob_service.key_encryption_key = kek
        self._blob_service.require_encryption = require_encryption
```
Below are the logs from the code path above:
10294: 2018-01-25 11:04:15,503 - DEBUG - lib.ms_azure.azure_file_handler:download_file - download_file content_length:171490. content_range:bytes 0-171489/171490
10294: 2018-01-25 11:04:15,504 - DEBUG - lib.ms_azure.azure_file_handler:download_file - download_file file size:48588
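The two debug lines above already contain enough information to detect the truncation: the total in `content_range` (171490) does not match the size of the file on disk (48588). A minimal sketch of that comparison, assuming the value has the `bytes START-END/TOTAL` form seen in the log; `parse_content_range` and `download_is_complete` are hypothetical helper names, not SDK functions:

```python
import re

# Matches the "bytes START-END/TOTAL" form seen in the debug log above.
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+)$")


def parse_content_range(header):
    """Return (start, end, total) from a value such as 'bytes 0-171489/171490'."""
    match = _CONTENT_RANGE.match(header.strip())
    if match is None:
        raise ValueError("unexpected Content-Range: %r" % header)
    start, end, total = (int(g) for g in match.groups())
    return start, end, total


def download_is_complete(content_range, file_size):
    """True only if the file on disk holds every byte the service reported."""
    _, _, total = parse_content_range(content_range)
    return file_size == total


# Values taken from the log lines above.
print(download_is_complete("bytes 0-171489/171490", 48588))   # False
print(download_is_complete("bytes 0-171489/171490", 171490))  # True
```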
Below are the Storage Analytics logs, with account-specific details removed:
1.0;2018-01-25T05:33:36.3654235Z;GetBlob;NetworkError;206;38480;20;authenticated;xxxxxx....
Question: The actual blob content length is 171490. As the analytics logs indicate, the request returned 206 (Partial Content); however, the Python SDK API get_blob_to_path() reported success.
Only part of the data is downloaded while the SDK reports success. This results in data corruption on our side.
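For context, 206 (Partial Content) is also the standard status for a successful response to a ranged GET (RFC 7233), so the status code alone does not signal the failure; the same analytics line records `NetworkError` as the request status. A client that downloads in ranges therefore has to compare each received body with the range it asked for. A minimal sketch, assuming inclusive byte ranges as used by the HTTP `Range` header; `plan_ranges` and `check_chunk` are hypothetical helpers, and the 64 KiB chunk size is arbitrary:

```python
def plan_ranges(total_length, chunk_size):
    """Split [0, total_length) into inclusive (start, end) byte ranges."""
    if total_length < 0 or chunk_size <= 0:
        raise ValueError("total_length must be >= 0 and chunk_size > 0")
    ranges = []
    start = 0
    while start < total_length:
        end = min(start + chunk_size, total_length) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def check_chunk(start, end, data):
    """Raise if a ranged response body is shorter or longer than requested."""
    expected = end - start + 1
    if len(data) != expected:
        raise IOError("range %d-%d: expected %d bytes, got %d"
                      % (start, end, expected, len(data)))


print(plan_ranges(171490, 65536))
# [(0, 65535), (65536, 131071), (131072, 171489)]
```

A truncated body such as the 48588 bytes seen here would make `check_chunk(0, 171489, body)` raise instead of passing silently.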
Expectations: when get_blob_to_path() is used to request the complete blob, the SDK API should either:
- return the complete blob, i.e. all of the blob's data; or
- fail, i.e. raise an exception, regardless of what caused the underlying behaviour (a network error or any other transient issue).
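Until the SDK guarantees this, the same contract can be enforced on the caller's side by checking the file size after the call and then retrying or raising. A minimal sketch, assuming `download` is any zero-argument callable that writes the destination file and returns the expected length (for example a closure around `get_blob_to_path()` returning `blob.properties.content_length`); `download_or_raise` and `IncompleteDownloadError` are hypothetical names:

```python
import os
import tempfile


class IncompleteDownloadError(IOError):
    """Raised when the file on disk does not match the expected blob length."""


def download_or_raise(download, dest_path, attempts=3):
    """Call `download` until dest_path has the expected size, else raise.

    `download` must write dest_path and return the expected byte count.
    Exceptions from `download` propagate unchanged. A size mismatch deletes
    the partial file, so a truncated file is never reported as success.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for _ in range(attempts):
        expected = download()
        actual = os.path.getsize(dest_path)
        if actual == expected:
            return actual
        os.remove(dest_path)  # discard the partial file before retrying
    raise IncompleteDownloadError(
        "%s: expected %d bytes, got %d after %d attempts"
        % (dest_path, expected, actual, attempts))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "blob.bin")
    calls = []

    def flaky_download():
        # Hypothetical stand-in: the first call writes a truncated file.
        calls.append(1)
        size = 48588 if len(calls) == 1 else 171490
        with open(path, "wb") as f:
            f.write(b"\0" * size)
        return 171490

    print(download_or_raise(flaky_download, path))  # 171490, after one retry
```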
Top GitHub Comments
During a download, if a network error occurs, the underlying libcurl library throws, and the error is propagated through the Python SDK to the client.
In your test, however, the TCP stack appears to succeed because you are successfully manipulating the response body with Fiddler, so no error is thrown or propagated to the client. That is the problem here: the test does not simulate a real-life failure scenario.
Not able to repro. Closing the issue.
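The case discussed in this thread, where the body is altered or cut short without any transport-level error, can also be caught by an end-to-end content check. A minimal sketch, assuming the blob has a stored Content-MD5 (a base64-encoded MD5 digest; it may be absent depending on how the blob was uploaded); `file_md5_base64` and `file_matches_md5` are hypothetical helpers:

```python
import base64
import hashlib


def file_md5_base64(path, block_size=1 << 20):
    """Base64-encoded MD5 of a file, the encoding used by the Content-MD5 header."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in blocks so large downloads are not loaded into memory at once.
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return base64.b64encode(digest.digest()).decode("ascii")


def file_matches_md5(path, expected_md5_b64):
    """Compare a downloaded file against the blob's stored Content-MD5."""
    if not expected_md5_b64:
        raise ValueError("blob has no stored Content-MD5 to compare against")
    return file_md5_base64(path) == expected_md5_b64
```

With the legacy SDK the expected value would typically come from the blob's properties (for example `blob.properties.content_settings.content_md5`); treat that attribute path as an assumption to verify against the SDK version in use.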