push: failed to push data to the cloud (GCS)
Bug Report
Description
Pushing data to remote storage on GCS fails.
- Created a data registry, added data (~20 GB), and pushed it to the remote storage on GCS (success)
- Added more data (~100 GB), ran `dvc commit && dvc push`, and the push failed
❯ dvc push -v
2021-06-28 22:21:47,392 DEBUG: Check for update is enabled.
2021-06-28 22:21:48,987 DEBUG: Preparing to upload data to 'gs://siim-covid19-data'
2021-06-28 22:21:48,987 DEBUG: Preparing to collect status from gs://siim-covid19-data
2021-06-28 22:21:48,989 DEBUG: Collecting information from local cache...
2021-06-28 22:21:49,079 DEBUG: Collecting information from remote cache...
2021-06-28 22:21:49,079 DEBUG: Querying 2 hashes via object_exists
2021-06-28 22:21:50,656 DEBUG: Querying 2 hashes via object_exists
2021-06-28 22:21:51,128 DEBUG: Matched '1943' indexed hashes
2021-06-28 22:21:51,350 DEBUG: Estimated remote size: 4096 files
2021-06-28 22:21:51,351 DEBUG: Querying '5630' hashes via traverse
2021-06-28 22:21:52,032 ERROR: unexpected error - list indices must be integers or slices, not str
------------------------------------------------------------
Traceback (most recent call last):
...
File ".../venv/lib/python3.9/site-packages/fsspec/asyn.py", line 24, in _runner
result[0] = await coro
File ".../venv/lib/python3.9/site-packages/gcsfs/core.py", line 943, in _find
par = o["name"]
TypeError: list indices must be integers or slices, not str
------------------------------------------------------------
2021-06-28 22:21:52,200 DEBUG: Version info for developers:
DVC version: 2.4.1 (pip)
---------------------------------
Platform: Python 3.9.4 on macOS-11.2.3-x86_64-i386-64bit
Supports: azure, gs, http, https
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s1s1
Caches: local
Remotes: gs
Workspace directory: apfs on /dev/disk1s1s1
Repo: dvc, git
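The `TypeError` at the bottom of the traceback is informative on its own: gcsfs's `_find` does `o["name"]`, which only works if `o` is a dict describing a single object, but here it evidently receives a list. A minimal sketch of that failure mode (the `name_of` helper is illustrative, not actual gcsfs code):

```python
# Minimal sketch of the failure mode in the traceback above: subscripting
# a list with a string key raises exactly this TypeError. `name_of` is an
# illustrative stand-in, not gcsfs code.
def name_of(o):
    # works only when `o` is a dict like {"name": "train/img.png"}
    return o["name"]

print(name_of({"name": "train/img.png"}))   # train/img.png

try:
    # the failing call apparently passed a list of such dicts instead
    name_of([{"name": "train/img.png"}])
except TypeError as err:
    print(err)  # list indices must be integers or slices, not str
```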
Reproduce
- dvc init
- Copy images to train/ directory
- dvc add train/
- dvc remote add -d gcp-storage gs://STORAGE-NAME
- dvc push
- Copy more images to train/ directory
- dvc commit
- dvc push
Expected
The updated data syncs with my Google Cloud Storage bucket.
Environment information
Output of dvc doctor:
$ dvc doctor
DVC version: 2.4.1 (pip)
---------------------------------
Platform: Python 3.9.4 on macOS-11.2.3-x86_64-i386-64bit
Supports: azure, gs, http, https
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s1s1
Caches: local
Remotes: gs
Workspace directory: apfs on /dev/disk1s1s1
Repo: dvc, git

It took me a while, but I was able to trace the actual issue to `gcsfs` (the underlying Google Cloud Storage client library). I also found a reproducer: https://github.com/dask/gcsfs/issues/393#issuecomment-871345871. I guess the naive workaround is to disable prefix-based search again for Google Cloud, since it apparently doesn't work quite properly (or to introduce `invalidate_cache()` calls before `find()`, but I'd rather disable prefix-based search completely until we are sure it works). I pushed my changes to #6246, so you should be able to test them out.

I wasn't able to find an easy test case, so we need manual confirmation that the patch I created (#6246) works for the bug mentioned here.
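For context on the `invalidate_cache()` workaround mentioned above: fsspec-style filesystems keep a directory-listing cache, and a stale entry served from that cache is one way a later lookup can see data in an unexpected shape. A hedged, self-contained sketch of that pattern (all names here are illustrative, not gcsfs internals):

```python
# Hedged sketch (illustrative names, not gcsfs internals): a listing
# cache that is never invalidated keeps serving a stale entry, so the
# caller misses changes on the remote. Clearing the cache before the
# next find() is the workaround the comment describes.
class ListingCache:
    def __init__(self, backend):
        self.backend = backend   # maps prefix -> list of object names
        self._dircache = {}      # cached listing per prefix

    def find(self, prefix):
        # return the cached listing if present, else fetch and cache it
        if prefix not in self._dircache:
            self._dircache[prefix] = list(self.backend.get(prefix, []))
        return self._dircache[prefix]

    def invalidate_cache(self):
        self._dircache.clear()


backend = {"gs://bucket/train": ["a.bin"]}
fs = ListingCache(backend)
print(fs.find("gs://bucket/train"))           # ['a.bin']

backend["gs://bucket/train"].append("b.bin")  # the remote changed
print(fs.find("gs://bucket/train"))           # stale: still ['a.bin']

fs.invalidate_cache()                         # the workaround
print(fs.find("gs://bucket/train"))           # fresh: ['a.bin', 'b.bin']
```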