Streaming Uploads?

Hey,

Sorry for treating this as a mailing list, I didn’t see any other method for contact, so I went ahead and opened an issue.

I’m trying to use boto3 to upload files uploaded to PyPI to S3. The majority of these files will be < 60MB but a handful of them will be larger (up to a few hundred MB in size). I’m trying to figure out what the right interface to use to do this is. Right now, in PyPI we have a streaming upload from the client along with the expected MD5 hash of the entire file once it’s been uploaded. I’m wondering if I can do something like:


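import base64
import hashlib

class HashingFileWrapper:
    """File-like wrapper that verifies an MD5 hash once the stream is exhausted."""

    def __init__(self, wrapped, md5_hash):
        self.wrapped = wrapped
        self.md5_hash = md5_hash  # expected hex digest of the whole file
        self.hash_ctx = hashlib.md5()

    def read(self, *args, **kwargs):
        # Delegate to the wrapped object; calling self.read() here would recurse forever.
        chunk = self.wrapped.read(*args, **kwargs)
        self.hash_ctx.update(chunk)
        if not chunk:
            # An empty chunk marks end of stream, so the digest is complete.
            if self.hash_ctx.hexdigest() != self.md5_hash:
                raise ValueError("Hash Does Not Match")
        return chunk


my_s3_object.put(
    Body=HashingFileWrapper(file_like_object, md5_hash),
    ContentLength=file_size,
    # S3 expects Content-MD5 as the base64-encoded digest, not the hex string.
    ContentMD5=base64.b64encode(bytes.fromhex(md5_hash)).decode("ascii"),
)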

Will that stream it up to S3 without buffering the whole file in memory? If not, is my only option to buffer the data to a temporary file and then use the my_s3_object.upload_file() interface?

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Comments: 13 (8 by maintainers)

Top GitHub Comments

10 reactions
rayluo commented, Sep 11, 2015

Yes it does.

Although it is not (yet?) mentioned in the documentation for Botocore’s S3.Client.put_object(), that method does accept a file-like object. There is even a test case to ensure that. You won’t find the streaming implementation in the code base here, because it is actually provided by the underlying library, requests.

Both Boto 3’s Object.put() and Bucket.put_object() are calling Botocore’s put_object(), so they support streaming as well. It is mentioned here.
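To see what "accepting a file-like object" means in practice, here is a minimal local sketch that needs no S3 access. It assumes the consumer pulls the body in fixed-size chunks, roughly as an HTTP client would; `VerifyingReader` and `drain` are hypothetical names made up for this illustration and are not part of boto3, Botocore, or requests.

```python
import hashlib
import io


class VerifyingReader:
    """File-like wrapper that hashes data as it is read (hypothetical helper)."""

    def __init__(self, wrapped, expected_hex_md5):
        self.wrapped = wrapped
        self.expected = expected_hex_md5
        self.ctx = hashlib.md5()

    def read(self, size=-1):
        chunk = self.wrapped.read(size)
        self.ctx.update(chunk)
        # An empty chunk signals end of stream: verify the digest now.
        if not chunk and self.ctx.hexdigest() != self.expected:
            raise ValueError("Hash does not match")
        return chunk


def drain(body, chunk_size=8192):
    """Stand-in for an HTTP client: pull the body chunk by chunk."""
    total = 0
    largest = 0
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            return total, largest
        total += len(chunk)
        largest = max(largest, len(chunk))


payload = b"x" * 100_000
good = hashlib.md5(payload).hexdigest()
total, largest = drain(VerifyingReader(io.BytesIO(payload), good))
print(total, largest)  # 100000 8192
```

The wrapper never holds more than one chunk at a time, and the hash check only fires on the final empty read, so a mismatch surfaces after the data has already been sent.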

The higher-level S3Transfer in Boto3 provides more handy features. Its upload_file() accepts a filename and automatically splits a big file into multiple chunks, with a default chunk size of 8MB and a default concurrency of 10. Each chunk is streamed through the aforementioned low-level APIs.
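To make the chunking concrete, here is a rough sketch of the arithmetic for dividing a file into 8MB parts. It only illustrates the idea and is not how S3Transfer is actually implemented; `part_ranges` is a hypothetical helper, and the 8MB default is taken from the description above.

```python
MB = 1024 * 1024


def part_ranges(file_size, chunk_size=8 * MB):
    """Yield (part_number, offset, length) for each part of a file."""
    part_number = 1
    offset = 0
    while offset < file_size:
        # The last part may be shorter than chunk_size.
        length = min(chunk_size, file_size - offset)
        yield part_number, offset, length
        offset += length
        part_number += 1


parts = list(part_ranges(20 * MB))
for number, offset, length in parts:
    print(number, offset, length)
# 1 0 8388608
# 2 8388608 8388608
# 3 16777216 4194304
```

A 20MB file therefore becomes two full 8MB parts and one 4MB remainder, and each part can be read and sent independently.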

3 reactions
amarpatil5060 commented, Apr 12, 2016

Hi rayluo, can we pass the actual data buffer as a parameter to upload_file() in boto3, instead of a filename?

With put_object(), I am seeing a high memory footprint. Is there some cleanup call that I am missing after the put_object() call?
