push: failed to push data to the cloud (GCS)

Bug Report

Description

Pushing data to remote storage on GCS fails after more data is added to an already-pushed dataset.

  • Created a data registry, added data (~20 GB), and pushed it to the remote storage on GCS (succeeded)
  • Added more data (~100 GB) and ran 'dvc commit && dvc push', which failed
❯ dvc push -v
2021-06-28 22:21:47,392 DEBUG: Check for update is enabled.
2021-06-28 22:21:48,987 DEBUG: Preparing to upload data to 'gs://siim-covid19-data'
2021-06-28 22:21:48,987 DEBUG: Preparing to collect status from gs://siim-covid19-data
2021-06-28 22:21:48,989 DEBUG: Collecting information from local cache...
2021-06-28 22:21:49,079 DEBUG: Collecting information from remote cache...
2021-06-28 22:21:49,079 DEBUG: Querying 2 hashes via object_exists
2021-06-28 22:21:50,656 DEBUG: Querying 2 hashes via object_exists
2021-06-28 22:21:51,128 DEBUG: Matched '1943' indexed hashes
2021-06-28 22:21:51,350 DEBUG: Estimated remote size: 4096 files
2021-06-28 22:21:51,351 DEBUG: Querying '5630' hashes via traverse
2021-06-28 22:21:52,032 ERROR: unexpected error - list indices must be integers or slices, not str
------------------------------------------------------------
Traceback (most recent call last):
...
  File ".../venv/lib/python3.9/site-packages/fsspec/asyn.py", line 24, in _runner
    result[0] = await coro
  File ".../venv/lib/python3.9/site-packages/gcsfs/core.py", line 943, in _find
    par = o["name"]
TypeError: list indices must be integers or slices, not str
------------------------------------------------------------
2021-06-28 22:21:52,200 DEBUG: Version info for developers:
DVC version: 2.4.1 (pip)
---------------------------------
Platform: Python 3.9.4 on macOS-11.2.3-x86_64-i386-64bit
Supports: azure, gs, http, https
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s1s1
Caches: local
Remotes: gs
Workspace directory: apfs on /dev/disk1s1s1
Repo: dvc, git
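
The traceback above ends at the expression o["name"], which suggests that o was a list at that point rather than a dict-like listing entry. The following is a minimal sketch of that failure mode only. The data and the helper name parent_name are hypothetical and are not taken from gcsfs; the issue does not show how the list ended up there.

```python
# Hypothetical listing data; the real structures inside gcsfs are not shown in the issue.
expected_entry = {"name": "bucket/ab/cdef", "size": 123}
unexpected_entry = [expected_entry]  # a list where a dict-like entry was expected


def parent_name(o):
    # Same expression as the failing line in the traceback: o["name"]
    return o["name"]


# Works when the entry is a dict.
name = parent_name(expected_entry)
print(name)

# Fails with the same TypeError text as in the log when the entry is a list.
message = None
try:
    parent_name(unexpected_entry)
except TypeError as exc:
    message = str(exc)
    print(f"TypeError: {message}")
```

A str index on a list always raises this TypeError, so the error message by itself only says that the object had the wrong type, not where that object came from.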

Reproduce

  1. dvc init
  2. Copy images to train/ directory
  3. dvc add train/
  4. dvc remote add -d gcp-storage gs://STORAGE-NAME
  5. dvc push
  6. Copy more images to train/ directory
  7. dvc commit
  8. dvc push

Expected

The updated data should sync to my Google Cloud Storage bucket.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.4.1 (pip)
---------------------------------
Platform: Python 3.9.4 on macOS-11.2.3-x86_64-i386-64bit
Supports: azure, gs, http, https
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk1s1s1
Caches: local
Remotes: gs
Workspace directory: apfs on /dev/disk1s1s1
Repo: dvc, git

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 3
  • Comments: 14 (7 by maintainers)

Top GitHub Comments

5 reactions
isidentical commented, Jun 30, 2021

It took me a while, but I was able to trace the actual issue to ‘gcsfs’ (the underlying Google Cloud Storage client library). I also found a reproducer: https://github.com/dask/gcsfs/issues/393#issuecomment-871345871. I guess the naive workaround is to disable prefix-based search again for Google Cloud, since it apparently doesn’t work quite properly. An alternative would be introducing invalidate_cache() calls before find(), but I’d rather disable prefix-based search completely until we are sure that it works. I pushed my changes to #6246, so you should be able to test them out.
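
The alternative mentioned in the comment above, calling invalidate_cache() before find(), can be illustrated with a toy client that caches listings. This is only a sketch of the general pattern: the class CachedLister and its data are hypothetical, it is not gcsfs code, and the exact cache interaction behind the bug is described in the linked gcsfs issue rather than here.

```python
class CachedLister:
    """Toy stand-in for a storage client that caches listing results.

    Hypothetical: illustrates the invalidate-then-find pattern only.
    """

    def __init__(self, backend):
        self._backend = backend  # set of object names held by the "remote"
        self._cache = {}         # prefix -> cached listing

    def find(self, prefix):
        # Serve from the cache when this prefix was listed before.
        if prefix not in self._cache:
            self._cache[prefix] = sorted(
                name for name in self._backend if name.startswith(prefix)
            )
        return list(self._cache[prefix])

    def invalidate_cache(self):
        # Drop every cached listing so the next find() queries the backend.
        self._cache.clear()


backend = {"00/aaa", "00/bbb"}
fs = CachedLister(backend)

first = fs.find("00/")   # populates the cache
backend.add("00/ccc")    # the remote changes behind the client's back
stale = fs.find("00/")   # still served from the cache

fs.invalidate_cache()    # the workaround: clear the cache before listing
fresh = fs.find("00/")

print(stale)
print(fresh)
```

Clearing the cache on every listing trades speed for correctness, which may be one reason the comment prefers disabling prefix-based search until the upstream behavior is confirmed.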

4 reactions
isidentical commented, Jun 29, 2021

> Excellent, any ETA when you might have a brew version that has a hotfix for this?

I wasn’t able to find an easy test case, so we need manual confirmation that the patch I created (#6246) works for the bug mentioned here.

> By the way, any insights as to what might have been the last working version in case I need to downgrade?

2.3.0
