
Replicate directory structure in bucket using gsutil cp with wildcards

See original GitHub issue

This is essentially a feature request for the question I posted on Stack Overflow.

Let’s say I have the following folders in my bucket:

2011/03/11/a/b/c
2012/04/11/c
2013/04/11/f

When I copy directories from my bucket using:

$ gsutil cp -r gs://my-bucket .

it creates the folders (2011/03/11/a/b/c etc.) on my local hard drive and then places the files under these folders (which is what I want). The problem is that I don’t want to copy ALL files in these folders, so I use a wildcard like this instead:

gsutil cp -r "gs://my-bucket/**/name-????.gz" .

gsutil starts copying all the files correctly, but it flattens out the folder structure and places each file directly in my current directory, which is not what I want.

I’d like a flag that allows gsutil to create the directory structure for each file when doing this. Would this be feasible?
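In the meantime, one hedged workaround sketch: since no such flag exists, you can list the matching objects with `gsutil ls`, strip the `gs://<bucket>/` prefix to recover each object's relative path, and recreate the directory before copying. The `local_path` helper below is hypothetical (not part of gsutil), and the copy loop is shown commented out because it needs network access; the bucket name and file pattern are taken from the question above.

```shell
#!/bin/sh
# Derive the local relative path from a gs:// object URL by stripping
# the gs://<bucket>/ prefix (hypothetical helper, not a gsutil feature).
local_path() {
    printf '%s\n' "$1" | sed 's|^gs://[^/]*/||'
}

# Illustrative copy loop (requires gsutil and network access):
# gsutil ls "gs://my-bucket/**/name-????.gz" | while read -r url; do
#     rel=$(local_path "$url")
#     mkdir -p "$(dirname "$rel")"
#     gsutil cp "$url" "$rel"
# done

local_path "gs://my-bucket/2011/03/11/a/b/c/name-0001.gz"
# prints 2011/03/11/a/b/c/name-0001.gz
```

This keeps the bucket's folder hierarchy because each file is copied to its own reconstructed path rather than into the flat current directory.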

Issue Analytics

  • State: open
  • Created: 6 years ago
  • Reactions: 34
  • Comments: 12

Top GitHub Comments

18 reactions · leo-p commented, May 22, 2020

Last time I checked, neither feature had been implemented (correct me if I’m wrong), but you don’t actually need it.

Using the -x flag of the gsutil rsync command, you can exclude all files that don’t match (instead of just the ones that match). All you need to do is use a negative lookahead (?!...).

For instance, if you would like to match all JSON files, you could use a regular expression such as:

.*\.json$

Now you just need to add the negative lookahead:

^(?!.*\.json$).*

Combined with the -x flag, this will sync all JSON files from your bucket to a local directory:

gsutil rsync -r -x '^(?!.*\.json$).*' gs://mybucket mydir

Example

$ gs://mybucket
├── dir1
│   ├── 1.json
│   ├── 2.json
│   └── img.jpg
└── dir2
    ├── 3.json
    └── random.txt
$ mkdir mydir
$ gsutil rsync -r -x '^(?!.*\.json$).*' gs://mybucket mydir
$ mydir
├── dir1
│   ├── 1.json
│   └── 2.json
└── dir2
    └── 3.json
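You can sanity-check the exclusion pattern locally before running the sync. The sketch below (assuming GNU grep with PCRE support via -P, which is not available on every platform) feeds the example file names through the same negative-lookahead regex; the lines that match are exactly the ones the -x flag would exclude.

```shell
# Lines matching '^(?!.*\.json$).*' are the non-JSON files, i.e. the
# ones gsutil rsync -x would skip. File names mirror the example above.
printf '%s\n' dir1/1.json dir1/img.jpg dir2/3.json dir2/random.txt \
    | grep -P '^(?!.*\.json$).*'
# prints:
# dir1/img.jpg
# dir2/random.txt
```

If only JSON files should survive the sync, seeing exactly the non-JSON names here confirms the pattern is inverted correctly.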
14 reactions · jsonbrooks commented, Oct 29, 2019

Any update on this?


