Reading S3 files becomes slow after 1.5.4

As mentioned earlier in #74, reading appears to have become very slow as of 1.5.4. With 1.5.3 I get about 54,000 lines/s; with 1.5.4 the same loop runs at about 440 lines/s.

$ pyvenv-3.4 env
$ source env/bin/activate
$ pip install smart_open==1.5.3 tqdm ipython
$ ipython

from tqdm import tqdm
from smart_open import smart_open
for _ in tqdm(smart_open('s3://xxxxx', 'rb')):
    pass

2868923it [00:53, 53888.94it/s]

$ pyvenv-3.4 env
$ source env/bin/activate
$ pip install smart_open==1.5.4 tqdm ipython
$ ipython

from tqdm import tqdm
from smart_open import smart_open
for _ in tqdm(smart_open('s3://xxxxx', 'rb')):
    pass

8401it [00:18, 442.64it/s] (too slow, so I did not wait for it to finish)
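
To compare the two versions without depending on tqdm's progress output, the throughput can be measured directly. The following is a minimal sketch, not part of the original report: `measure_line_rate` is a hypothetical helper, and the in-memory stream only stands in for the object returned by `smart_open('s3://xxxxx', 'rb')`.

```python
import io
import time


def measure_line_rate(lines):
    """Iterate over `lines` and return (count, elapsed_seconds, lines_per_second)."""
    start = time.perf_counter()
    count = 0
    for _ in lines:
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval on very small inputs.
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, elapsed, rate


if __name__ == "__main__":
    # Stand-in for the S3 stream; replace with the real reader to benchmark it.
    sample = io.BytesIO(b"line\n" * 100000)
    count, elapsed, rate = measure_line_rate(sample)
    print("{} lines in {:.3f}s ({:.0f} lines/s)".format(count, elapsed, rate))
```

Running the same helper once per installed version (in separate virtual environments, as above) gives two directly comparable numbers.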

Issue created 6 years ago. Comments: 26 (4 by maintainers)

Top GitHub Comments

mpenkov commented, Dec 6, 2017

@otamachan Thank you for your suggestion! I had a closer look at your implementation and finally realized what the remaining problem was. It isn’t necessary to go back to boto to achieve the same performance: we can do the same thing with the newer boto3.

https://github.com/RaRe-Technologies/smart_open/pull/157

Thank you for pointing me in the right direction. Thank you very much!
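
The thread does not spell out what the fix in the pull request changes, so the following is only an illustration of one common cause of slow line iteration over a remote stream: issuing many small reads instead of a few large ones. It is a minimal sketch under that assumption, not the code from the pull request. `iter_lines` and `chunk_size` are hypothetical names, and `raw` can be any object with a `read(n)` method, such as the streaming body of an S3 response.

```python
import io


def iter_lines(raw, chunk_size=128 * 1024):
    """Yield newline-terminated lines from `raw`, reading large chunks at a time."""
    pending = b""
    while True:
        chunk = raw.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        lines = pending.split(b"\n")
        # The last element is an incomplete line (or empty); keep it for later.
        pending = lines.pop()
        for line in lines:
            yield line + b"\n"
    if pending:
        # Final line without a trailing newline.
        yield pending


if __name__ == "__main__":
    # In-memory stand-in for a remote stream.
    stream = io.BytesIO(b"first\nsecond\nthird")
    for line in iter_lines(stream, chunk_size=4):
        print(line)
```

With a large `chunk_size`, the number of calls to `raw.read` depends on the size of the file rather than on the number of lines, which matters when each call may turn into a network round trip.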

menshikh-iv commented, Dec 11, 2017

@otamachan I’ll release 1.5.6, but slightly later (after setting up the integration testing environment).