Reading S3 files becomes slow after 1.5.4
As mentioned earlier in #74, reading appears to have become very slow as of 1.5.4.
$ pyvenv-3.4 env
$ source env/bin/activate
$ pip install smart_open==1.5.3 tqdm ipython
$ ipython
from tqdm import tqdm
from smart_open import smart_open
for _ in tqdm(smart_open('s3://xxxxx', 'rb')):
    pass
2868923it [00:53, 53888.94it/s]
$ pyvenv-3.4 env
$ source env/bin/activate
$ pip install smart_open==1.5.4 tqdm ipython
$ ipython
from tqdm import tqdm
from smart_open import smart_open
for _ in tqdm(smart_open('s3://xxxxx', 'rb')):
    pass
8401it [00:18, 442.64it/s] (too slow, so I did not wait for it to finish)
Issue Analytics
- Created 6 years ago
- Comments: 26 (4 by maintainers)
@otamachan Thank you for your suggestion! I had a closer look at your implementation and finally realized what the remaining problem was. It isn’t necessary to go back to boto to achieve the same performance: we can do the same thing with the newer boto3.
https://github.com/RaRe-Technologies/smart_open/pull/157
Thank you for pointing me in the right direction. Thank you very much!
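For readers following along, here is a minimal sketch of one way line iteration over a boto3 response body can be kept fast. It assumes the slowdown comes from issuing many small reads, and it is not the code from the pull request above. The names `iter_lines` and `iter_s3_lines` and the 256 KiB chunk size are hypothetical choices made for illustration; only `boto3.client("s3").get_object(...)` and the body's `read(amt)` method are real boto3 calls.

```python
import io


def iter_lines(body, chunk_size=256 * 1024):
    """Yield lines from any object that has a read(n) method.

    Reading in large chunks keeps the number of read() calls small,
    which matters when each call may hit the network.
    """
    pending = b""
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        lines = pending.split(b"\n")
        # The last piece has no newline yet; keep it for the next chunk.
        pending = lines.pop()
        for line in lines:
            yield line + b"\n"
    if pending:
        # Final line without a trailing newline.
        yield pending


def iter_s3_lines(bucket, key, chunk_size=256 * 1024):
    """Hypothetical helper: stream an S3 object line by line via boto3."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"]
    return iter_lines(body, chunk_size)


if __name__ == "__main__":
    # Local demonstration with an in-memory stream instead of S3.
    for line in iter_lines(io.BytesIO(b"one\ntwo\nthree"), chunk_size=4):
        print(line)
```

The chunk size trades memory for fewer reads; whether 256 KiB is a good value for a given bucket and network is something to measure, for example with the same `tqdm` loop used in the report.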
@otamachan I’ll release 1.5.6, but slightly later (after setting up the integration testing environment).