
Reading S3 files becomes slow after 1.5.4

See original GitHub issue

As mentioned earlier in #74, it appears that the reading speed is very slow after 1.5.4.

$ pyvenv-3.4 env
$ source env/bin/activate
$ pip install smart_open==1.5.3 tqdm ipython
$ ipython

from tqdm import tqdm
from smart_open import smart_open
for _ in tqdm(smart_open('s3://xxxxx', 'rb')):
    pass

2868923it [00:53, 53888.94it/s]

$ pyvenv-3.4 env
$ source env/bin/activate
$ pip install smart_open==1.5.4 tqdm ipython
$ ipython

from tqdm import tqdm
from smart_open import smart_open
for _ in tqdm(smart_open('s3://xxxxx', 'rb')):
    pass

8401it [00:18, 442.64it/s] (too slow, so I did not wait for it to finish)
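The drop from ~54,000 to ~440 lines per second is consistent with each small read going all the way to the underlying stream instead of being served from a local buffer. As a rough illustration (this is not smart_open's actual code), wrapping an expensive raw stream in io.BufferedReader amortizes many tiny line reads into a handful of large reads:

```python
import io

class CountingRaw(io.RawIOBase):
    """Stand-in for a remote stream: counts how often readinto() is hit."""
    def __init__(self, data):
        self._buf = io.BytesIO(data)
        self.calls = 0
    def readable(self):
        return True
    def readinto(self, b):
        self.calls += 1
        return self._buf.readinto(b)

data = (b"x" * 50 + b"\n") * 2000  # 2000 short lines

# Unbuffered: readline() on a raw stream pulls one byte per call,
# so every single byte costs a readinto() round trip.
raw = CountingRaw(data)
for _ in raw:
    pass
unbuffered_calls = raw.calls

# Buffered: the raw stream is read in 128 KiB chunks; line iteration
# is then served from memory.
raw = CountingRaw(data)
for _ in io.BufferedReader(raw, buffer_size=128 * 1024):
    pass
buffered_calls = raw.calls

print("unbuffered raw reads:", unbuffered_calls)
print("buffered raw reads:", buffered_calls)
```

Replace the readinto() round trip with an HTTP range request to S3 and the same ratio turns into the difference between the two benchmarks above.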

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 26 (4 by maintainers)

Top GitHub Comments

3 reactions · mpenkov commented, Dec 6, 2017

@otamachan Thank you for your suggestion! I had a closer look at your implementation and finally realized what the remaining problem was. It isn’t necessary to go back to boto to achieve the same performance: we can do the same thing with the newer boto3.
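The approach described in the comment can be sketched as follows: expose the boto3 StreamingBody through a raw-IO adapter and let io.BufferedReader do the chunking client-side. This is a sketch under assumptions, not smart_open's actual implementation; the names RawReadAdapter and open_s3_buffered are illustrative:

```python
import io

class RawReadAdapter(io.RawIOBase):
    """Adapt any object exposing read(n) -- e.g. a boto3 StreamingBody --
    to the raw-IO interface that io.BufferedReader expects."""
    def __init__(self, body):
        self._body = body
    def readable(self):
        return True
    def readinto(self, b):
        chunk = self._body.read(len(b))
        b[:len(chunk)] = chunk
        return len(chunk)

def open_s3_buffered(bucket, key, buffer_size=256 * 1024):
    """Hypothetical helper (not smart_open's API): stream an S3 object
    through a client-side buffer so line iteration stays fast."""
    import boto3  # assumes boto3 is installed and credentials are configured
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
    return io.BufferedReader(RawReadAdapter(body), buffer_size=buffer_size)
```

Because the adapter works over any object with a read(n) method, it can be exercised locally with io.BytesIO standing in for the S3 body, no network access required.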

https://github.com/RaRe-Technologies/smart_open/pull/157

Thank you for pointing me in the right direction. Thank you very much!

2 reactions · menshikh-iv commented, Dec 11, 2017

@otamachan I’ll release 1.5.6, but slightly later (after setting up the integration testing infrastructure).
