Add ability to check integrity of uploaded object

Hi Folks,

I’ve been reading https://aws.amazon.com/premiumsupport/knowledge-center/data-integrity-s3/ and I’m wondering whether s3fs could/does already support this functionality?

I’ve got a situation where I request a file from a data provider and in a separate request I can get the checksum of the file.

What I’d like to do is use s3fs to write the file and verify the integrity of the uploaded object by providing the checksum the data provider gives me, something like:

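import shutil
import requests
import s3fs

fs = s3fs.S3FileSystem()
r = requests.get("a-url", stream=True)
if r.status_code == 200:
    try:
        # md5checksum and IntegrityError are the proposed additions, not existing s3fs API
        with fs.open("<bucket-name>/object.type", "wb", md5checksum="ACHECKSUM") as file_out:
            shutil.copyfileobj(r.raw, file_out)
    except IntegrityError as ex:
        ...  # Tidy up here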

I’ve tried using the .checksum() function, but it doesn’t return the checksum I expect. If I download the file and compute hashlib.md5().hexdigest() on its contents, I get the correct checksum, so I know the upload itself is fine…

Appreciate any pointers you might have on this!
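
For reference, the download-and-hash check described above can be sketched with the standard library alone. This is a minimal sketch, not part of s3fs: the helper name md5_digests and the chunk size are assumptions made for illustration. It returns both the hex digest (the form hashlib.md5().hexdigest() gives) and the base64 form of the same 128-bit digest, which is the encoding the Content-MD5 header expects.

```python
import base64
import hashlib
import io


def md5_digests(fileobj, chunk_size=1024 * 1024):
    """Return (hex, base64) MD5 digests of a binary file-like object.

    Hypothetical helper: reads in chunks so large objects are not
    loaded into memory at once.
    """
    md5 = hashlib.md5()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        md5.update(chunk)
    raw = md5.digest()
    return raw.hex(), base64.b64encode(raw).decode("ascii")


# Stand-in for a downloaded object. In practice this could be any binary
# file-like object, for example one returned by a filesystem's open(path, "rb").
hex_digest, b64_digest = md5_digests(io.BytesIO(b"hello world"))
```

Comparing hex_digest with the checksum from the data provider confirms the object after the fact, at the cost of reading it back once.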

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
martindurant commented, Feb 26, 2021

In gcsfs, this is handled explicitly. See, for example, the nice refactor by @nbren12, grouping the integrity checkers in https://github.com/dask/gcsfs/blob/main/gcsfs/checkers.py . That code could be upstreamed to fsspec and made available to other libraries such as this one. However, the case there is a bit different: it checks that the hash of the data being sent is equal to the hash reported back by the server, rather than to a user-provided one. It doesn’t seem like a bad idea, though!

0 reactions
martindurant commented, Feb 26, 2021

I wonder if there is an abstraction suitable for fsspec.

The code that takes the uploaded data and forms a checksum would be the same, but how that checksum is used would be different.
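
A rough sketch of what such an abstraction might look like, using only the standard library. Every name here (IntegrityError, MD5Accumulator, CheckedWriter) is hypothetical and is not taken from fsspec, s3fs, or gcsfs. The accumulator is the shared part that forms the checksum from the uploaded data; the verify call is where the two use cases diverge, since the expected value can come either from the server response or from the user.

```python
import hashlib
import io
import shutil


class IntegrityError(Exception):
    """Hypothetical error raised when digests do not match."""


class MD5Accumulator:
    """Shared part: fold every uploaded chunk into a running MD5."""

    def __init__(self):
        self._md5 = hashlib.md5()

    def update(self, data):
        self._md5.update(data)

    def hexdigest(self):
        return self._md5.hexdigest()

    def verify(self, expected_hex, source):
        # `source` only labels where the expected value came from,
        # e.g. "server response" or "user-provided checksum".
        actual = self.hexdigest()
        if actual != expected_hex.lower():
            raise IntegrityError(
                f"MD5 mismatch against {source}: "
                f"expected {expected_hex}, got {actual}"
            )


class CheckedWriter:
    """Wrap a writable binary file object and hash everything written."""

    def __init__(self, raw, checker):
        self.raw = raw
        self.checker = checker

    def write(self, data):
        self.checker.update(data)
        return self.raw.write(data)


# In-memory stand-ins for the remote source and the upload target.
sink = io.BytesIO()
checker = MD5Accumulator()
shutil.copyfileobj(io.BytesIO(b"payload"), CheckedWriter(sink, checker))
checker.verify(hashlib.md5(b"payload").hexdigest(), source="user-provided checksum")
```

In a real backend the verification would have to run when the upload is finalised, and what to do on a mismatch (for example deleting the object) would be up to the implementation.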
