repro: crash if 'output' file does not exist.
Bug Report
Description
If dvc repro is run and a stage's output file does not exist after the command finishes, DVC crashes with a Python exception.
In my test, I set up a dvc.yaml stage file with a ‘cmd’ that does nothing and left ‘persist’ false, so the existing output files are deleted before the run and no new output files are generated. I believe this should result in the dvc.lock file being updated by erasing the hash, to indicate that the file does not exist, but maybe there is some other convention. Perhaps the hashes remain the same and existence is then checked with dvc status; I am not sure.
In any case, I don’t think it is good for the program to crash, but I’d like to know if that is the convention.
Error dump:
2021-09-04 16:48:54,499 ERROR: failed to reproduce 'resources\WI_Ozaukee_20201103\dvc\precheck\dvc.yaml': output 's3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/cache/all_archives_info.json' does not exist
------------------------------------------------------------
Traceback (most recent call last):
File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\repo\reproduce.py", line 196, in _reproduce_stages
ret = _reproduce_stage(stage, **kwargs)
File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\repo\reproduce.py", line 39, in _reproduce_stage
stage = stage.reproduce(**kwargs)
File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\funcy\decorators.py", line 45, in wrapper
return deco(call, *dargs, **dkwargs)
File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\stage\decorators.py", line 36, in rwlocked
return call()
File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\funcy\decorators.py", line 66, in __call__
return self._func(*self._args, **self._kwargs)
File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\stage\__init__.py", line 427, in reproduce
self.run(**kwargs)
File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\funcy\decorators.py", line 45, in wrapper
return deco(call, *dargs, **dkwargs)
File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\stage\decorators.py", line 36, in rwlocked
return call()
File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\funcy\decorators.py", line 66, in __call__
return self._func(*self._args, **self._kwargs)
File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\stage\__init__.py", line 546, in run
self.save(allow_missing=allow_missing)
File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\stage\__init__.py", line 457, in save
self.save_outs(allow_missing=allow_missing)
File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\stage\__init__.py", line 477, in save_outs
out.save()
File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\output.py", line 520, in save
raise self.DoesNotExistError(self)
dvc.output.OutputDoesNotExistError: output 's3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/cache/all_archives_info.json' does not exist
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\main.py", line 55, in main
ret = cmd.do_run()
File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\command\base.py", line 45, in do_run
return self.run()
File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\command\repro.py", line 12, in run
stages = self.repo.reproduce(**self._repro_kwargs)
File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\repo\__init__.py", line 49, in wrapper
return f(repo, *args, **kwargs)
File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\repo\scm_context.py", line 14, in run
return method(repo, *args, **kw)
File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\repo\reproduce.py", line 135, in reproduce
return _reproduce_stages(self.index.graph, list(stages), **kwargs)
File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\repo\reproduce.py", line 213, in _reproduce_stages
raise ReproductionError(stage.relpath) from exc
dvc.exceptions.ReproductionError: failed to reproduce 'resources\WI_Ozaukee_20201103\dvc\precheck\dvc.yaml'
------------------------------------------------------------
2021-09-04 16:48:54,543 DEBUG: Analytics is disabled.
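For readers following the traceback, here is a minimal sketch of the path it shows. The names save_outs, allow_missing and OutputDoesNotExistError appear in the traceback; everything else (the function bodies, the use of local files instead of S3 URLs) is an assumption made for illustration and is not DVC's actual code.

```python
import os
import tempfile


class OutputDoesNotExistError(Exception):
    """Stand-in for the exception class named in the traceback."""


def save_out(path):
    # Models the last frame of the traceback: saving a missing output raises.
    if not os.path.exists(path):
        raise OutputDoesNotExistError(f"output '{path}' does not exist")
    return path


def save_outs(paths, allow_missing=False):
    """Save every output; skip missing ones only when allow_missing is set."""
    saved = []
    for path in paths:
        try:
            saved.append(save_out(path))
        except OutputDoesNotExistError:
            if not allow_missing:
                raise  # this is the branch the report ends up in
    return saved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        present = os.path.join(tmp, "present.txt")
        missing = os.path.join(tmp, "missing.txt")
        with open(present, "w") as fh:
            fh.write("data")
        # Tolerant mode: the missing file is skipped.
        print(save_outs([present, missing], allow_missing=True))
        # Strict mode: the missing file aborts the whole save.
        try:
            save_outs([present, missing])
        except OutputDoesNotExistError as exc:
            print("error:", exc)
```

Under this model, the crash in the report corresponds to the strict branch: the stage command succeeds, but the save step afterwards finds no file and raises.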
The dvc status command does not fail in the same way; it reports that the expected outputs are deleted.
dvc status -R -v resources\WI_Ozaukee_20201103\dvc
2021-09-04 16:51:48,827 DEBUG: Checking if stage 'resources\WI_Ozaukee_20201103\dvc' is in 'dvc.yaml'
resources\WI_Ozaukee_20201103\dvc\precheck\dvc.yaml:precheck:
changed outs:
deleted: s3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/cache/all_archives_info.json
deleted: s3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/cache/bia_bif.csv
deleted: s3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/reports/precheck_report.md
deleted: s3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/reports/precheck_report.html
changed command
Reproduce
I think it might be possible to reproduce the error simply by using a do-nothing command as the ‘cmd’ in the stage. In our test, we called our program but only got the --version response, which does not build the output files.
We found that the dvc.lock file was unchanged, so someone inspecting it might conclude that the build was successful. But there may be some other way this is indicated. I don’t know what ‘status’ does when
My dvc.yaml stage file:
# dvc.yaml for precheck stage.
# path of this file: C:\Users\raylu\Documents\Github\audit-engine\resources\(job name)\dvc\precheck\dvc.yaml
stages:
  precheck:
    cmd: python main.py -i JOB_WI_Ozaukee_20201103.csv --op version
    wdir: C:\Users\raylu\Documents\Github\audit-engine
    deps:
    - s3://us-east-1-audit-engine-election-data/US/WI/US_WI_Ozaukee_General_20201103/WI_Ozaukee_20201103_BIA_0.zip
    - s3://us-east-1-audit-engine-election-data/US/WI/US_WI_Ozaukee_General_20201103/WI_Ozaukee_20201103_BIA_1.zip
    - s3://us-east-1-audit-engine-election-data/US/WI/US_WI_Ozaukee_General_20201103/WI_Ozaukee_20201103_BIA_2.zip
    outs:
    - s3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/cache/all_archives_info.json:
        cache: false
        # persist: true
    - s3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/cache/bia_bif.csv:
        cache: false
        # persist: true
    - s3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/reports/precheck_report.md:
        cache: false
        # persist: true
    - s3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/reports/precheck_report.html:
        cache: false
        # persist: true
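The stage file above depends on private S3 buckets, so it cannot be run by anyone else. A smaller local analogue might look like the sketch below. It assumes dvc is installed and on PATH; the output name never_created.json and the do-nothing command python -c "pass" are hypothetical stand-ins, and whether this triggers exactly the same traceback as the S3 case has not been verified.

```python
import pathlib
import shutil
import subprocess
import tempfile

# Minimal stage: the command succeeds but never creates the declared output.
# "never_created.json" is a hypothetical local stand-in for the S3 outputs.
DVC_YAML = """\
stages:
  precheck:
    cmd: python -c "pass"
    outs:
    - never_created.json:
        cache: false
"""


def write_stage(workdir):
    """Write the minimal dvc.yaml into workdir and return its path."""
    path = pathlib.Path(workdir) / "dvc.yaml"
    path.write_text(DVC_YAML)
    return path


def run_repro(workdir):
    """Initialise a repo without SCM, then run the pipeline in workdir."""
    subprocess.run(["dvc", "init", "--no-scm"], cwd=workdir, check=True)
    return subprocess.run(
        ["dvc", "repro"], cwd=workdir, capture_output=True, text=True
    )


if __name__ == "__main__":
    if shutil.which("dvc") is None:
        print("dvc is not installed; only the stage file can be generated")
    else:
        with tempfile.TemporaryDirectory() as tmp:
            write_stage(tmp)
            result = run_repro(tmp)
            print("exit code:", result.returncode)
            print(result.stderr)
```

The --no-scm flag matches the "Repo: dvc (no_scm)" line in the dvc doctor output below.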
Expected
The program should not crash, and I believe dvc.lock should be appropriately updated.
Environment information
Output of dvc doctor:
$ dvc doctor
DVC version: 2.6.4 (pip)
---------------------------------
Platform: Python 3.7.6 on Windows-10-10.0.19041-SP0
Supports:
http (requests = 2.24.0),
https (requests = 2.24.0),
s3 (s3fs = 2021.8.0, boto3 = 1.17.106)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: None
Workspace directory: NTFS on C:\
Repo: dvc (no_scm)
Additional Information (if any):
Comments: 8 (5 by maintainers)
We have decided not to use DVC and have implemented our own similar functionality. Thanks for your time.
Yes, I believe that not stopping with an exception would allow output files to be treated as optional. If they are never used in a later stage, then failing to produce them is not really an error. If the file is not optional and is used in a later stage, then updating the dvc.lock file to show that it is missing should make the subsequent stage complain that it does not have a dependency, and thus the required/optional feature is already built in, as long as
So by fixing this, you will feed two birds with one scone.
Thanks!
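The proposal in the comment above can be sketched as a toy simulation. This is not how DVC behaves and not its API: the reproduce function, the lock dictionary and the stage tuples are all hypothetical, chosen only to show that recording a missing output and failing later on a missing dependency gives the required/optional distinction for free.

```python
def reproduce(stages, produced):
    """Simulate one pipeline run under the proposed semantics.

    stages:   list of (name, deps, outs) tuples in execution order.
    produced: set of output names that the commands actually create.
    Returns (lock, errors): lock maps each output to True (exists) or
    False (recorded as missing); errors lists (stage, missing_deps).
    """
    lock = {}
    errors = []
    for name, deps, outs in stages:
        # A dependency is a problem only if an earlier stage recorded it as
        # missing; inputs that no stage produces are assumed to exist.
        missing_deps = [d for d in deps if lock.get(d) is False]
        if missing_deps:
            errors.append((name, missing_deps))
            for out in outs:
                lock[out] = False  # a stage that cannot run produces nothing
            continue
        for out in outs:
            # Record a missing output instead of raising an exception.
            lock[out] = out in produced
    return lock, errors


if __name__ == "__main__":
    stages = [
        ("precheck", ["input.zip"], ["info.json", "report.md"]),
        ("audit", ["info.json"], ["result.csv"]),
    ]
    # report.md is never consumed, so its absence is harmless.
    print(reproduce(stages, {"info.json", "result.csv"}))
    # info.json feeds the audit stage, so its absence is reported there.
    print(reproduce(stages, {"report.md", "result.csv"}))
```

In the first run no stage fails even though report.md is missing; in the second, the audit stage is the one that reports the problem.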