Better S3 download_fileobj docs, note potential need to call flush() with threaded transfers

Threaded transfers using the S3 download_fileobj will leave the file position in a nondeterministic state.

The example from the function’s docstring is:

import boto3
s3 = boto3.client('s3')

with open('filename', 'wb') as data:
    s3.download_fileobj('mybucket', 'mykey', data)

Inside the with block, a data.tell() call returns a different value from run to run when a threaded transfer was used. The size threshold that gates threaded transfers makes this harder to notice: small files are downloaded without threads, so their file position always appears deterministic, and the problem only surfaces with larger objects.
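
The varying position can be reproduced without S3. The following is a minimal sketch, assuming the transfer manager writes each downloaded chunk with a seek to its offset followed by a write, in whatever order the chunks complete. The helper write_chunks and the tiny 4-byte chunks are hypothetical, chosen only for illustration; no boto3 call is made.

```python
import io


def write_chunks(fileobj, chunks, order):
    """Write (offset, data) chunks in the given completion order.

    Hypothetical stand-in for a threaded transfer: each chunk is placed
    with seek + write, so the final position is the end of whichever
    chunk happened to be written last.
    """
    for index in order:
        offset, data = chunks[index]
        fileobj.seek(offset)
        fileobj.write(data)
    return fileobj.tell()


payload = b"AAAABBBBCCCC"
chunk_size = 4
chunks = [
    (offset, payload[offset:offset + chunk_size])
    for offset in range(0, len(payload), chunk_size)
]

# Chunks complete in order: the position ends up at the end of the file.
in_order = io.BytesIO()
pos_in_order = write_chunks(in_order, chunks, [0, 1, 2])

# The middle chunk completes last: the position ends up mid-file.
out_of_order = io.BytesIO()
pos_out_of_order = write_chunks(out_of_order, chunks, [0, 2, 1])

print(pos_in_order, pos_out_of_order)
print(in_order.getvalue() == out_of_order.getvalue())
```

Both buffers hold the same bytes, yet tell() differs because it only reflects where the last write ended. If the position matters after a download, seeking explicitly (for example data.seek(0) or data.seek(0, os.SEEK_END)) avoids relying on it.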

If the file is then used while it is still open (e.g. a named temporary file, which is removed when closed), the download could appear to be incomplete:

import boto3
import tempfile
s3 = boto3.client('s3')

with tempfile.NamedTemporaryFile(mode='wb') as data:
    s3.download_fileobj('mybucket', 'mykey', data)
    # do something with data before it's closed and removed

Noting this behavior and recommending that the file be flushed prior to use would help catch downloads that appear to be incomplete. For example, the docstring could read:

def download_fileobj(self, Bucket, Key, Fileobj, ExtraArgs=None,
                     Callback=None, Config=None):
    """Download an object from S3 to a file-like object.

    The file-like object must be in binary mode.

    This is a managed transfer which will perform a multipart download in
    multiple threads if necessary. This behavior may leave the file position
    in an unexpected state. A call to `flush` may be required.

    Usage::

        import boto3
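
The flush recommendation in the docstring above can be illustrated offline. This is a minimal sketch, not the library's behavior verbatim: fake_download is a hypothetical stand-in for s3.download_fileobj that writes bytes into the file object and returns without flushing. It also assumes a POSIX system, since reopening a NamedTemporaryFile by name while it is still open may fail on Windows.

```python
import os
import tempfile


def fake_download(fileobj, payload):
    """Hypothetical stand-in for s3.download_fileobj.

    Writes the payload into the file object and returns without
    flushing, leaving data in the Python-level write buffer.
    """
    fileobj.write(payload)


payload = b"x" * 1024

with tempfile.NamedTemporaryFile(mode="wb") as data:
    fake_download(data, payload)

    # Anything that reads the file by name at this point may see fewer
    # bytes than were written, because some are still buffered.
    size_before_flush = os.path.getsize(data.name)

    # Flush before handing the path to any other reader.
    data.flush()
    size_after_flush = os.path.getsize(data.name)

    with open(data.name, "rb") as reader:
        contents = reader.read()

print(size_before_flush, size_after_flush)
```

After the flush, the size on disk matches the payload and a second reader sees the complete content. The size before the flush depends on the buffer size and the amount written, so it should not be relied on.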

Issue Analytics

  • State: open
  • Created 6 years ago
  • Reactions: 9
  • Comments: 8 (1 by maintainers)

Top GitHub Comments

nbargnesi commented, Mar 14, 2022 (3 reactions)

“We encourage you to check if this is still an issue in the latest release.”

It is.

for _ in range(2):
    with open('filename', 'wb') as data:
        s3.download_fileobj(bucket, key, data)
        print(data.tell())

Output:

33554432
25165824

Using the latest boto3 version:

print(boto3.__version__)

Output:

1.21.18

github-actions[bot] commented, Mar 13, 2022 (1 reaction)

Greetings! It looks like this issue hasn’t been active in longer than one year. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.
