Google Cloud Storage outputs always considered changed

DVC version: 0.21.3+b45e22.mod, running from source. Also happens on the official 0.21.3 release from Homebrew.

Going exactly by the tutorial: https://dvc.org/doc/user-guide/external-outputs and working with a GS bucket and cache.

Immediately after dvc run executes, running dvc status shows the output file on GS as changed. dvc repro executes the command again, and the output still counts as changed afterwards.

The problem seems to be in RemoteGS.exists(self, path_infos).

The function is called with path_infos=[{'scheme': 'gs', 'bucket': 'mybucket', 'key': 'data.txt'}]

However, since it is a cache remote, it lists the files in the cache bucket, whose names are etags and not the names of the files in the output. So exists returns False, and the output file is marked as changed due to not existing.
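To make the failure mode concrete, here is a minimal sketch in Python. It does not use DVC's code or the real GCS client: FAKE_GCS, list_keys, exists_via_cache_listing and exists_per_path are hypothetical names, the bucket is an in-memory dict, and the cache is assumed to live under a cache/ prefix with made-up hash-like object names. The second function is only one possible direction for a fix, not necessarily what the merged patch does.

```python
from typing import Dict, List, Set

# Hypothetical in-memory stand-in for GCS: bucket name -> object keys.
# The "cache/..." names are made-up hash-like keys, as in a cache remote.
FAKE_GCS: Dict[str, Set[str]] = {
    "mybucket": {"data.txt", "cache/3f2a9c", "cache/77b0de"},
}


def list_keys(bucket: str, prefix: str = "") -> Set[str]:
    """Return every key in the bucket that starts with the given prefix."""
    return {key for key in FAKE_GCS.get(bucket, set()) if key.startswith(prefix)}


def exists_via_cache_listing(
    path_infos: List[dict], cache_bucket: str, cache_prefix: str
) -> List[bool]:
    """Faulty check: compares output keys against the cache listing only."""
    cached = list_keys(cache_bucket, cache_prefix)
    return [info["key"] in cached for info in path_infos]


def exists_per_path(path_infos: List[dict]) -> List[bool]:
    """Alternative check: looks each path up in its own bucket by its own key."""
    return [
        info["key"] in list_keys(info["bucket"], info["key"])
        for info in path_infos
    ]


if __name__ == "__main__":
    infos = [{"scheme": "gs", "bucket": "mybucket", "key": "data.txt"}]
    # The output exists, but the cache listing never contains "data.txt".
    print(exists_via_cache_listing(infos, "mybucket", "cache/"))  # [False]
    print(exists_per_path(infos))  # [True]
```

With the first function, data.txt is reported as missing even though it is in the bucket, which matches the "always changed" status described above.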

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
efiop commented, Dec 4, 2018

Patch is merged. I have found another bug in GS remote, preparing a patch for it right now. Will release new dvc version with all the patches tomorrow. Thank you for the feedback! 🙂

1 reaction
efiop commented, Dec 4, 2018

Hi @guysmoilov !

Great catch! Your analysis is spot on! I will send a patch for this shortly.

Thank you!
