gsutil cp hangs on many small files when running in parallel

I have a GCS bucket with millions of small files in different folders. When I run:

$ gsutil -m cp -r gs://my-bucket .

The process eventually hangs before completing, sometimes after 5 minutes and sometimes after several hours. This seems to be 100% reproducible. I’m using version 4.27, but this has happened in older versions as well. As a workaround I have to use:

$ gsutil cp -r gs://my-bucket .

which works, but it takes several days to download everything, so it’s not optimal.

Issue Analytics

  • State: open
  • Created: 6 years ago
  • Reactions: 9
  • Comments: 24

Top GitHub Comments

41 reactions
obrie commented, May 25, 2020

For what it’s worth, I found that using only threads for parallelization (and not child processes) appears to avoid the underlying deadlock here, e.g. -o GSUtil:parallel_process_count=1 -o GSUtil:parallel_thread_count=24
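
As a minimal sketch, the thread-only settings from this comment can be combined with the original command from the issue. The wrapper function `gsutil_threads_only` and the `DRY_RUN` variable are hypothetical helpers added for illustration and are not part of gsutil; the `-m` and `-o` options are gsutil's own top-level options. Whether this avoids the hang in a given environment is not guaranteed; the comment only reports that it appears to.

```shell
#!/bin/sh
# Sketch: run "gsutil cp -r" with a single process and many threads,
# as suggested in the comment above.
# gsutil_threads_only and DRY_RUN are hypothetical names, not gsutil features.
gsutil_threads_only() {
    src="$1"            # source, e.g. gs://my-bucket
    dest="$2"           # destination, e.g. .
    threads="${3:-24}"  # thread count; 24 is the value from the comment

    # When DRY_RUN is set and non-empty, print the command instead of running it.
    ${DRY_RUN:+echo} gsutil -m \
        -o "GSUtil:parallel_process_count=1" \
        -o "GSUtil:parallel_thread_count=${threads}" \
        cp -r "$src" "$dest"
}
```

Calling `gsutil_threads_only gs://my-bucket .` runs the copy, while `DRY_RUN=1 gsutil_threads_only gs://my-bucket .` only prints the command for inspection. The same two settings can also be placed under the `[GSUtil]` section of the boto configuration file instead of being passed with `-o` on every invocation.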

15 reactions
vkaul11 commented, Aug 21, 2020

I am still facing the issue: it reaches 99% of copied files and then terminates.
