GCP download very slow for slightly large files
Problem description
I am trying to download a moderately large file (1.1GB) with smart_open using the code below, and it takes a long time (15m40s), while gsutil cp takes about 25s. Google's own storage.blob API is also fast, comparable to gsutil.
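For scale, the reported timings translate into throughput as follows. This is a back-of-the-envelope sketch that assumes "1.1GB" means 1.1e9 bytes. If it means GiB, the absolute rates shift slightly, but the slowdown ratio does not change.

```python
# Back-of-the-envelope throughput from the timings reported above.
# Assumption: "1.1GB" is taken as 1.1e9 bytes (decimal gigabytes).
FILE_BYTES = 1.1e9

smart_open_secs = 15 * 60 + 40  # 15m40s -> 940 s
gsutil_secs = 25


def mb_per_sec(num_bytes, secs):
    """Throughput in decimal megabytes per second."""
    return num_bytes / secs / 1e6


smart_open_rate = mb_per_sec(FILE_BYTES, smart_open_secs)
gsutil_rate = mb_per_sec(FILE_BYTES, gsutil_secs)
slowdown = smart_open_secs / gsutil_secs

print(f"smart_open: {smart_open_rate:.2f} MB/s")
print(f"gsutil:     {gsutil_rate:.1f} MB/s")
print(f"slowdown:   {slowdown:.1f}x")
```

That works out to roughly 1.17 MB/s for smart_open versus 44 MB/s for gsutil, a slowdown of about 37.6x.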
Steps/code to reproduce the problem
Code used:
```python
import time
import sys
from smart_open import open as cloud_open

gcs_uri = "<redacted file name>"
dl_path = "./test.pkl"
current_secs_func = lambda: int(round(time.time()))
chunk_size = 256 * 1024 * 1024  # 256M
count = 0

with cloud_open(gcs_uri, mode="rb") as cloud_fd:  # Same slowness even with `transport_params={'min_part_size': chunk_size}`
    with open(dl_path, mode="wb+") as local_fd:
        print("Start time: ", current_secs_func())
        sys.stdout.flush()
        while True:
            current = current_secs_func()
            data = cloud_fd.read(chunk_size)
            print("Read chunk [{}] of at most size [{}] from [{}] to [{}] in [{}] secs".format(count, chunk_size, gcs_uri, dl_path, current_secs_func() - current))
            sys.stdout.flush()
            if not data:
                break
            count += 1
            current = current_secs_func()
            local_fd.write(data)
            print("Wrote chunk [{}] of at most size [{}] from [{}] to [{}] in [{}] secs".format(count, chunk_size, gcs_uri, dl_path, current_secs_func() - current))
            sys.stdout.flush()
```
Nearly every chunk read above takes close to 230s. (Writing to the output file on the local filesystem has sub-second latency.)
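The repro times each read with int(round(time.time())), which has one-second resolution and uses a wall clock that can be adjusted while the program runs. Below is a minimal sketch of the same loop that uses time.perf_counter() instead and also reports per-chunk throughput. timed_copy is a hypothetical helper, not part of smart_open. It accepts any binary file-like object, so the object returned by cloud_open(gcs_uri, mode="rb") in the repro could be passed as src.

```python
import io
import time


def timed_copy(src, dst, chunk_size):
    """Copy src to dst in chunks, timing each read with a monotonic clock.

    Hypothetical helper: src and dst are any binary file-like objects.
    Returns a list of (bytes_read, read_seconds) tuples, one per non-empty chunk.
    """
    stats = []
    while True:
        start = time.perf_counter()
        data = src.read(chunk_size)
        elapsed = time.perf_counter() - start
        if not data:
            break
        dst.write(data)
        stats.append((len(data), elapsed))
        # Guard against a zero reading on extremely fast in-memory reads.
        rate = len(data) / elapsed / 1e6 if elapsed > 0 else float("inf")
        print(f"chunk {len(stats) - 1}: {len(data)} bytes in {elapsed:.3f} s ({rate:.1f} MB/s)")
    return stats


# Self-contained demo with in-memory streams. For a real download, pass the
# smart_open file object as src and a local file opened in "wb" mode as dst.
src = io.BytesIO(b"x" * 10_000)
dst = io.BytesIO()
stats = timed_copy(src, dst, chunk_size=4096)
```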
Versions
Please provide the output of:
Python 3.7.7 (default, Apr 18 2020, 02:59:53)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform, sys, smart_open
>>> print(platform.platform())
Linux-5.4.0-1011-gcp-x86_64-with-Ubuntu-20.04-focal
>>> print("Python", sys.version)
Python 3.7.7 (default, Apr 18 2020, 02:59:53)
[GCC 9.3.0]
>>> print("smart_open", smart_open.__version__)
smart_open 3.0.0
Issue Analytics
- Created 3 years ago
- Comments:13 (7 by maintainers)
Yep, this is much slower than it should be. I remember running initial benchmarks during development and seeing numbers much lower than this. I'm not sure whether something has changed in the code or whether my memory is failing me (or I ran improper benchmarks), but these numbers are definitely unacceptable. I can try to do some profiling soon to figure out where the bottlenecks are.
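One way to start that profiling is to run the read loop under the standard library's cProfile and sort by cumulative time, which shows whether the time goes to network calls, buffering, or Python-level overhead. This is a minimal sketch: read_all is a hypothetical stand-in for the loop in the repro, and the in-memory stream stands in for the smart_open file object so the demo runs offline.

```python
import cProfile
import io
import pstats


def read_all(fileobj, chunk_size):
    """Read fileobj to EOF in chunks; return the total number of bytes read."""
    total = 0
    while True:
        data = fileobj.read(chunk_size)
        if not data:
            return total
        total += len(data)


profiler = cProfile.Profile()
profiler.enable()
# Replace io.BytesIO(...) with the smart_open file object to profile a real download.
total = read_all(io.BytesIO(b"\0" * 1_000_000), chunk_size=64 * 1024)
profiler.disable()

# Print the 10 entries with the highest cumulative time.
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats(10)
```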
I have tried various options, including using a transport and reading all 1.1GB at once without chunking. They all land in a similar ballpark and are very slow compared to gsutil. I also initially tried v2.1.0, which took a similar amount of time.
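To compare chunked reads against a single read of the whole object more systematically, a small sweep can open a fresh handle for each setting and time a full read. This is a sketch under stated assumptions: open_source is a hypothetical zero-argument factory (with smart_open it could be lambda: cloud_open(gcs_uri, mode="rb") from the repro), and None stands for one unchunked read() call. Opening a fresh handle per run keeps data buffered during one setting from skewing the next.

```python
import io
import time


def sweep_chunk_sizes(open_source, chunk_sizes):
    """Time a full read of a fresh source for each chunk size.

    open_source: hypothetical zero-argument callable returning a new binary file object.
    chunk_sizes: iterable of ints; None means a single unchunked read().
    Returns {chunk_size: (total_bytes, seconds)}.
    """
    results = {}
    for size in chunk_sizes:
        with open_source() as f:
            start = time.perf_counter()
            if size is None:
                total = len(f.read())
            else:
                total = 0
                while True:
                    data = f.read(size)
                    if not data:
                        break
                    total += len(data)
            results[size] = (total, time.perf_counter() - start)
    return results


# Offline demo with an in-memory payload.
payload = b"\1" * 500_000
results = sweep_chunk_sizes(lambda: io.BytesIO(payload), [64 * 1024, 1024 * 1024, None])
for size, (total, secs) in results.items():
    label = "whole object" if size is None else f"{size} bytes"
    print(f"{label}: {total} bytes in {secs:.4f} s")
```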