Better S3 download_fileobj docs, note potential need to call flush() with threaded transfers
Threaded transfers using the S3 download_fileobj method will leave the file position in a nondeterministic state.
The example from the function’s docstring is:
import boto3
s3 = boto3.client('s3')
with open('filename', 'wb') as data:
    s3.download_fileobj('mybucket', 'mykey', data)
Inside the with clause, a data.tell() call will behave differently if threaded transfers were used. This is made worse by the size threshold that gates threaded transfers: for small files, which fall below it, the file position will always appear to be deterministic.
If the same approach is used with an open context (e.g. a named temporary file), the download could appear to be incomplete:
import boto3
import tempfile
s3 = boto3.client('s3')
with tempfile.NamedTemporaryFile(mode='wb') as data:
    s3.download_fileobj('mybucket', 'mykey', data)
    # do something with data before it's closed and removed
Noting this behavior and recommending the file is flushed prior to use would help catch downloads that appear to be incomplete.
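To make the recommendation concrete, here is a minimal sketch of flushing right after the download and before the file is used. The helper name download_and_flush and the FakeS3Client class are hypothetical and not part of boto3. The fake client only imitates, as an assumption, a transfer that writes chunks out of order with seek() and write(), so the sketch can run without AWS credentials. With a real client you would pass boto3.client('s3') instead.

```python
import tempfile


def download_and_flush(client, bucket, key, fileobj):
    """Download into fileobj, then flush Python's write buffer to the OS."""
    client.download_fileobj(bucket, key, fileobj)
    # Without this, bytes may still sit in the buffered writer, so anything
    # that reads the file by name could see an incomplete download.
    fileobj.flush()


class FakeS3Client:
    """Hypothetical stand-in for boto3.client('s3'); illustration only."""

    def __init__(self, payload):
        self.payload = payload

    def download_fileobj(self, Bucket, Key, Fileobj):
        # Assumption: write the second half first, then the first half,
        # roughly the way chunks could land out of order.
        half = len(self.payload) // 2
        Fileobj.seek(half)
        Fileobj.write(self.payload[half:])
        Fileobj.seek(0)
        Fileobj.write(self.payload[:half])


def demo():
    payload = b"0123456789"
    with tempfile.NamedTemporaryFile(mode="wb") as data:
        download_and_flush(FakeS3Client(payload), "mybucket", "mykey", data)
        position = data.tell()
        # Reopening a NamedTemporaryFile by name works on POSIX systems.
        with open(data.name, "rb") as reader:
            on_disk = reader.read()
    return position, on_disk
```

In this sketch, demo() returns the complete payload from disk, while the reported position is wherever the last write ended rather than the end of the file. Code that needs a known position should call seek() explicitly instead of relying on tell().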
def download_fileobj(self, Bucket, Key, Fileobj, ExtraArgs=None,
                     Callback=None, Config=None):
    """Download an object from S3 to a file-like object.

    The file-like object must be in binary mode.

    This is a managed transfer which will perform a multipart download in
    multiple threads if necessary. This behavior may leave the file position
    in an unexpected state. A call to `flush` may be required.

    Usage::
        import boto3
Issue Analytics
- State:
- Created 6 years ago
- Reactions:9
- Comments:8 (1 by maintainers)
Top GitHub Comments
It is.
Using the latest boto3 version:
Greetings! It looks like this issue hasn’t been active in longer than one year. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.