If I try to use smart_open to seek to and read part of an S3 file, I get NotImplementedError: seek other than offset=0 not implemented yet.

Arbitrary seeking, especially when the seek is specified relative to the beginning of the file (seek(..., whence=0)), should be possible through the HTTP Range header:

>>> import boto
>>> s3 = boto.connect_s3()
>>> bucket = s3.lookup('bucket')
>>> key = bucket.lookup('key')
>>> parts = key.get_contents_as_string(headers={'Range' : 'bytes=12-24'})

seek could set a pointer to the starting byte, and each subsequent read would define the end of the range.
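To make the idea concrete, here is a minimal sketch of a reader that keeps a position pointer and turns each read into an inclusive byte range. It is not smart_open's implementation. The class name RangeReader and the fetch_range(start, end) callable are hypothetical; fetch_range stands in for whatever issues the request with a Range: bytes=start-end header, and the object size is assumed to be known up front.

```python
import io


class RangeReader:
    """Read-only, seekable view over an object that is fetched by byte range."""

    def __init__(self, fetch_range, size):
        # fetch_range(start, end) is a hypothetical callable that returns the
        # bytes from start to end inclusive, like "Range: bytes=start-end".
        self._fetch_range = fetch_range
        self._size = size  # total object size in bytes, assumed known
        self._pos = 0

    def seek(self, offset, whence=io.SEEK_SET):
        # Seeking only moves the pointer; no data is transferred here.
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence!r}")
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def tell(self):
        return self._pos

    def read(self, size=-1):
        # The pointer gives the start of the range; the read size gives the end.
        if self._pos >= self._size or size == 0:
            return b""
        if size < 0:
            end = self._size - 1
        else:
            end = min(self._pos + size, self._size) - 1
        data = self._fetch_range(self._pos, end)
        self._pos += len(data)
        return data
```

With an in-memory payload standing in for the remote object, RangeReader(lambda s, e: payload[s:e + 1], len(payload)) followed by seek(12) and read(13) requests bytes 12 through 24, the same range as the boto call above.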

Are there any technical limitations or design restrictions that would prevent this?

Issue Analytics

  • State: closed
  • Created 8 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

2 reactions
mpenkov commented, Dec 10, 2017

@menshikh-iv I think this is done. We can seek S3 files now.

2 reactions
perrygeo commented, Nov 27, 2015

get_contents_as_string should respect the HTTP Range header but it doesn’t always behave that way. In particular, I found that the first call read the entire contents while subsequent calls (with the exact same args and kwargs) pulled in only the requested bytes. I believe this to be a bug in boto. Unfortunately I couldn’t find another way to implement this in boto2.

However, after switching to boto3 I was able to put together a working S3 reader using the object abstractions. I wrapped it in a file handle interface that does arbitrary seeks and reads: https://gist.github.com/perrygeo/9239b9ab64731cacbb35#file-s3reader-py . It’s very effective and allowed me to read TIF tags off 2000 x 1.1 GB files stored on S3 in just a few minutes.
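For readers who have not used ranged reads in boto3, a rough sketch follows. It is not the code from the gist. The helper names read_byte_range and open_object are made up for illustration; the only boto3 calls it relies on are boto3.resource("s3").Object(...) and Object.get(Range=...), whose response carries the data in a streaming "Body".

```python
def read_byte_range(s3_object, start, length):
    """Return up to `length` bytes of `s3_object`, starting at byte `start`.

    `s3_object` is expected to behave like a boto3 S3 Object resource.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    end = start + length - 1  # HTTP byte ranges are inclusive at both ends
    response = s3_object.get(Range=f"bytes={start}-{end}")
    return response["Body"].read()


def open_object(bucket_name, key):
    """Hypothetical convenience wrapper; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3

    return boto3.resource("s3").Object(bucket_name, key)
```

For example, read_byte_range(open_object("bucket", "key"), 12, 13) would ask S3 for bytes 12 through 24 only, assuming the bucket and key exist and credentials are configured.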

I haven’t yet considered how something like this could integrate with smart_open but I figured it might be useful.


