gsutil cp hangs on many small files when running in parallel

I have a GCS bucket with millions of small files in different folders. When I run:

$ gsutil -m cp -r gs://my-bucket .

The process eventually hangs before completion, sometimes after 5 minutes and sometimes after several hours. This seems to be 100% reproducible. I'm using version 4.27, but this has happened in older versions as well. As a workaround I have to use:

$ gsutil cp -r gs://my-bucket .

This works, but it takes several days to download everything, so it's not optimal.

Issue Analytics

State:
Created 6 years ago
Reactions: 9
Comments: 24

Top GitHub Comments

41 reactions

obrie commented, May 25, 2020

For what it’s worth, I found that using only threads for parallelization (and not child processes) appears to avoid the underlying deadlock here. e.g. -o GSUtil:parallel_process_count=1 -o GSUtil:parallel_thread_count=24
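
The threads-only workaround from the comment above can be sketched as a full command line. This is a minimal sketch, not a confirmed fix: `threads_only_cp_cmd` is a hypothetical helper name, the bucket and destination are the ones from the original question, and 24 threads is simply the value quoted in the comment. The helper only prints the command so it can be reviewed before running.

```shell
# Hypothetical helper: print (do not run) a gsutil command that keeps -m
# but parallelizes with threads only, using one process.
# Arguments: source URL, destination path, optional thread count (default 24,
# the value quoted in the comment above).
threads_only_cp_cmd() {
  src="$1"
  dst="$2"
  threads="${3:-24}"
  # -o overrides a boto config value for this invocation only.
  printf 'gsutil -m -o GSUtil:parallel_process_count=1 -o GSUtil:parallel_thread_count=%s cp -r %s %s\n' \
    "$threads" "$src" "$dst"
}

# Print the command for the bucket from the question.
threads_only_cp_cmd gs://my-bucket .
```

The same two settings can also be placed in the `[GSUtil]` section of the boto configuration file instead of being passed with `-o` on each run. The right thread count likely depends on the machine and network, so 24 should be treated as a starting point rather than a recommendation.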

15 reactions

vkaul11 commented, Aug 21, 2020

I am still facing the issue: it reaches 99% of copied files and then terminates.

Top Results From Across the Web

gsutil hangs on large file - google cloud platform

Here are some workaround that you can try : You can try uploading it across multiple folders on a single bucket since there...

cp - Copy files and objects | Cloud Storage - Google Cloud

The gsutil cp command allows you to copy data between your local file system ... If you have a large number of files...

Easily parallelize large scale data copies into Google Cloud ...

This command runs pretty quick, 4 million lines takes about 3 seconds. The --number=r means we are round robin'ing the file names into...

Optimize data transfer between Compute Engine and Cloud ...

Useful for transferring a large number of files in parallel, not the upload ... time gsutil cp temp_30GB_file gs://doit-speed-test-bucket/ ...

gsutil cp – Copy and Move Files on Google Cloud

Learn how to use the gsutil cp command to copy files from local to ... -m option to upload large number of files...
