
S3FileSystem.checksum always returns cached value

Original GitHub issue: #293

Hi,

I encountered this unexpected behavior while using the checksum method to monitor changes to a file in a long-running app.

Description

S3FileSystem.checksum returns the same value for an object even after the object has changed.

Steps to reproduce

  1. Create the object s3://MY_BUCKET/MY_OBJECT on S3.
  2. Run:
       import s3fs
       fs = s3fs.S3FileSystem()
       fs.checksum("s3://MY_BUCKET/MY_OBJECT")
  3. Overwrite the object with new contents without going through fs (e.g. with the AWS CLI).
  4. In the same Python process as above, run fs.checksum("s3://MY_BUCKET/MY_OBJECT") again.

The results from steps 2 and 4 are identical, even though the object has changed.

Cause

It seems that checksum uses the default implementation from fsspec.AbstractFileSystem, which in turn derives the checksum from the output of fs.ls(detail=True). Because S3FileSystem.ls is called there with its default refresh=False, the checksum is always computed from cached listing results rather than from the current state of the object.
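To make the described mechanism concrete, here is a toy, self-contained model of it. The class ToyCachedFS and its methods are hypothetical and greatly simplified. They are not the fsspec or s3fs implementation; they only reproduce the pattern described above: a checksum derived from a listing that is served from a cache unless refresh=True.

```python
import hashlib
import json


class ToyCachedFS:
    """Hypothetical, simplified model (NOT fsspec/s3fs code).

    checksum() is derived from ls(detail=True), and ls() returns cached
    entries unless it is called with refresh=True.
    """

    def __init__(self):
        self.backend = {}   # stands in for the real bucket: path -> bytes
        self.dircache = {}  # cached listing: path -> info dict

    def _fetch_info(self, path):
        data = self.backend[path]
        # "ETag" here is just a content hash standing in for S3's ETag.
        return {"name": path, "size": len(data),
                "ETag": hashlib.sha256(data).hexdigest()}

    def ls(self, path, detail=True, refresh=False):
        # Default refresh=False: reuse the cached entry if one exists.
        if refresh or path not in self.dircache:
            self.dircache[path] = self._fetch_info(path)
        return [self.dircache[path]]

    def checksum(self, path):
        # Checksum built from the (possibly stale) listing output.
        info = self.ls(path, detail=True)[0]
        digest = hashlib.sha256(json.dumps(info, sort_keys=True).encode())
        return int(digest.hexdigest(), 16)


fs = ToyCachedFS()
fs.backend["MY_BUCKET/MY_OBJECT"] = b"version 1"
before = fs.checksum("MY_BUCKET/MY_OBJECT")

# Overwrite "out of band", like the AWS CLI step in the report.
fs.backend["MY_BUCKET/MY_OBJECT"] = b"version 2, longer"
after = fs.checksum("MY_BUCKET/MY_OBJECT")
print(before == after)  # True: the cached listing is reused

# Forcing a re-fetch updates the cache, so the checksum changes.
fs.ls("MY_BUCKET/MY_OBJECT", refresh=True)
refreshed = fs.checksum("MY_BUCKET/MY_OBJECT")
print(refreshed == before)  # False
```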

Proposed solution

checksum should be implemented in S3FileSystem, in such a way that it always checks the actual state of the object, or at least provides the option to do so.
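A minimal sketch of a caller-side workaround in this spirit, assuming fs is an fsspec-style filesystem exposing invalidate_cache(path) and checksum(path), both of which exist on fsspec's AbstractFileSystem and on s3fs's S3FileSystem. The helper name fresh_checksum is hypothetical and not part of any library; it discards cached listing data for the path so the next lookup has to query the backend again.

```python
def fresh_checksum(fs, path):
    """Hypothetical helper: checksum of `path` that ignores cached listings.

    Assumes `fs` provides invalidate_cache(path) and checksum(path), as
    fsspec-style filesystems (including s3fs.S3FileSystem) do.
    """
    # Drop cached entries for this path so checksum() must re-query S3.
    fs.invalidate_cache(path)
    return fs.checksum(path)
```

With s3fs this would be called as fresh_checksum(s3fs.S3FileSystem(), "s3://MY_BUCKET/MY_OBJECT"). Whether invalidating just this path is sufficient depends on how the installed s3fs version caches listings, so it is worth verifying against the repro steps above.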

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 8 (6 by maintainers)

Top GitHub Comments

martindurant commented, May 5, 2021 (1 reaction)

I don’t think a directory has a checksum, does it? In what manner are you changing it between calls?

martindurant commented, May 18, 2021 (0 reactions)

Right, if the files change via some other process, s3fs doesn’t know about this. Please do feel free to propose changes to the documentation to clarify things.
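For the monitoring use case in the original report, one approach consistent with this comment is to poll and explicitly discard cached metadata before each comparison. This is a minimal sketch, assuming an fsspec-style fs with invalidate_cache(path) and checksum(path). The names watch_for_changes, interval and max_polls are hypothetical, and max_polls exists only so the loop can terminate.

```python
import time


def watch_for_changes(fs, path, on_change, interval=5.0, max_polls=None):
    """Hypothetical polling loop (not part of s3fs or fsspec).

    s3fs cannot see writes made by other processes, so each poll drops the
    cached entry for `path` before recomputing its checksum. on_change(path)
    is called whenever the checksum differs from the previous poll.
    """
    fs.invalidate_cache(path)
    last = fs.checksum(path)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        polls += 1
        fs.invalidate_cache(path)  # force a fresh lookup on every poll
        current = fs.checksum(path)
        if current != last:
            on_change(path)
            last = current
```

Polling trades extra requests to S3 for freshness, so a longer interval reduces request volume at the cost of slower change detection.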
