Google Cloud Storage outputs always considered changed
DVC version: 0.21.3+b45e22.mod, running from source. Also happens on the official 0.21.3 release from Homebrew.
Going exactly by the tutorial: https://dvc.org/doc/user-guide/external-outputs and working with a GS bucket and cache.
Immediately after the dvc run executes, running dvc status shows the output file on GS as changed. dvc repro will execute the command again, and it will still count as changed afterwards.
Seems like the problem is probably in RemoteGS.exists(self, path_infos)
The function is called with path_infos=[{'scheme': 'gs', 'bucket': 'mybucket', 'key': 'data.txt'}]
However, since it is a cache remote, it lists the files in the cache bucket, whose names are etags and not the names of the files in the output. So exists returns False, and the output file is marked as changed due to not existing.
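To make the suspected mismatch concrete, here is a minimal sketch using in-memory stand-ins for the two buckets. It is not DVC's actual code: the BUCKETS dict, the "mycache" bucket name, the etag-like object name, and both helper functions are hypothetical and exist only for illustration. The "fixed" variant shows one possible way to check existence against the bucket named in each path_info, not necessarily how the real patch works.

```python
# Hypothetical in-memory stand-in for two GS buckets (assumption, not DVC code).
BUCKETS = {
    "mybucket": ["data.txt"],                          # bucket holding the external output
    "mycache": ["d41d8cd98f00b204e9800998ecf8427e"],   # cache bucket; objects named by etag-like hashes
}


def exists_buggy(cache_bucket, path_infos):
    """Check keys against the cache bucket listing, ignoring each path_info's own bucket."""
    listed = set(BUCKETS[cache_bucket])
    return [info["key"] in listed for info in path_infos]


def exists_checked_per_bucket(path_infos):
    """Check each key against the listing of the bucket named in its path_info."""
    return [info["key"] in BUCKETS.get(info["bucket"], []) for info in path_infos]


path_infos = [{"scheme": "gs", "bucket": "mybucket", "key": "data.txt"}]

buggy = exists_buggy("mycache", path_infos)
checked = exists_checked_per_bucket(path_infos)

print(buggy)    # [False]: 'data.txt' is not among the etag-named cache objects
print(checked)  # [True]: 'data.txt' is present in 'mybucket'
```

With the first function the output looks missing even though it exists, which would match dvc status always reporting it as changed.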
Patch is merged. I have found another bug in GS remote, preparing a patch for it right now. Will release new dvc version with all the patches tomorrow. Thank you for the feedback! 🙂
Hi @guysmoilov !
Great catch! Your analysis is spot on! I will send a patch for this shortly.
Thank you!