repro: crash if 'output' file does not exist.

Bug Report

Description

If dvc repro is run and a stage's output file does not exist after the command finishes, DVC crashes with a Python exception.

In my test, I set up a dvc.yaml stage file with a ‘cmd’ that does nothing and with ‘persist’ set to false, so the output files are deleted and no output file is generated. I believe this should result in the dvc.lock file being updated by erasing the hash, to indicate that the file does not exist, but maybe there is some other convention. Perhaps the hashes remain the same and existence is then checked with status; I am not sure.

In any case, I don’t think the program should crash, but I’d like to know whether that is the intended convention.

Error dump:

2021-09-04 16:48:54,499 ERROR: failed to reproduce 'resources\WI_Ozaukee_20201103\dvc\precheck\dvc.yaml': output 's3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/cache/all_archives_info.json' does not exist
------------------------------------------------------------
Traceback (most recent call last):
  File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\repo\reproduce.py", line 196, in _reproduce_stages
    ret = _reproduce_stage(stage, **kwargs)
  File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\repo\reproduce.py", line 39, in _reproduce_stage
    stage = stage.reproduce(**kwargs)
  File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\funcy\decorators.py", line 45, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\stage\decorators.py", line 36, in rwlocked
    return call()
  File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\funcy\decorators.py", line 66, in __call__
    return self._func(*self._args, **self._kwargs)
  File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\stage\__init__.py", line 427, in reproduce
    self.run(**kwargs)
  File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\funcy\decorators.py", line 45, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\stage\decorators.py", line 36, in rwlocked
    return call()
  File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\funcy\decorators.py", line 66, in __call__
    return self._func(*self._args, **self._kwargs)
  File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\stage\__init__.py", line 546, in run
    self.save(allow_missing=allow_missing)
  File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\stage\__init__.py", line 457, in save
    self.save_outs(allow_missing=allow_missing)
  File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\stage\__init__.py", line 477, in save_outs
    out.save()
  File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\output.py", line 520, in save
    raise self.DoesNotExistError(self)
dvc.output.OutputDoesNotExistError: output 's3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/cache/all_archives_info.json' does not exist

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\main.py", line 55, in main
    ret = cmd.do_run()
  File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\command\base.py", line 45, in do_run
    return self.run()
  File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\command\repro.py", line 12, in run
    stages = self.repo.reproduce(**self._repro_kwargs)
  File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\repo\__init__.py", line 49, in wrapper
    return f(repo, *args, **kwargs)
  File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\repo\scm_context.py", line 14, in run
    return method(repo, *args, **kw)
  File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\repo\reproduce.py", line 135, in reproduce
    return _reproduce_stages(self.index.graph, list(stages), **kwargs)
  File "c:\users\raylu\appdata\local\programs\python\python37\lib\site-packages\dvc\repo\reproduce.py", line 213, in _reproduce_stages
    raise ReproductionError(stage.relpath) from exc
dvc.exceptions.ReproductionError: failed to reproduce 'resources\WI_Ozaukee_20201103\dvc\precheck\dvc.yaml'
------------------------------------------------------------
2021-09-04 16:48:54,543 DEBUG: Analytics is disabled.

The dvc status command does not fail in the same way; it reports that the expected outputs are deleted.

dvc status -R -v resources\WI_Ozaukee_20201103\dvc
2021-09-04 16:51:48,827 DEBUG: Checking if stage 'resources\WI_Ozaukee_20201103\dvc' is in 'dvc.yaml'
resources\WI_Ozaukee_20201103\dvc\precheck\dvc.yaml:precheck:
        changed outs:
                deleted:            s3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/cache/all_archives_info.json
                deleted:            s3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/cache/bia_bif.csv
                deleted:            s3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/reports/precheck_report.md
                deleted:            s3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/reports/precheck_report.html
        changed command
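
For anyone who wants to check for missing outputs before running repro, the text report above can be scanned for its "deleted:" lines. This is only a sketch: the function name deleted_outputs and the regular expression are illustrative assumptions, the sample text is copied from the report above, and the layout of dvc status output may differ between DVC versions.

```python
import re

# Matches lines such as "    deleted:    s3://bucket/path" (layout taken from this report).
DELETED_LINE = re.compile(r"^\s*deleted:\s+(.+?)\s*$")


def deleted_outputs(status_text):
    """Return the paths that a `dvc status` text report marks as deleted."""
    paths = []
    for line in status_text.splitlines():
        match = DELETED_LINE.match(line)
        if match:
            paths.append(match.group(1))
    return paths


# Excerpt of the status output shown in this report.
SAMPLE = r"""
resources\WI_Ozaukee_20201103\dvc\precheck\dvc.yaml:precheck:
        changed outs:
                deleted:            s3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/cache/all_archives_info.json
                deleted:            s3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/cache/bia_bif.csv
        changed command
"""

if __name__ == "__main__":
    for path in deleted_outputs(SAMPLE):
        print(path)
```

The sketch only reads text that dvc status has already printed; it does not call DVC itself.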

Reproduce

I think it should be possible to reproduce the error simply by using a do-nothing command as the ‘cmd’ in the stage. In our test, we called our program but only requested its --version response, which does not build the output file.
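
A smaller stage file along these lines might be enough to try this. It is an untested sketch under stated assumptions: the stage name noop, the output name never_created.txt, and the helper write_minimal_stage are all hypothetical, and the output is a local file rather than the S3 paths used in the original report.

```python
import tempfile
from pathlib import Path

# Hypothetical minimal stage: the command succeeds but never creates the declared output.
MINIMAL_DVC_YAML = """\
stages:
  noop:
    cmd: python -c "pass"
    outs:
      - never_created.txt:
          cache: false
"""


def write_minimal_stage(directory):
    """Write the do-nothing stage file into `directory` and return its path."""
    path = Path(directory) / "dvc.yaml"
    path.write_text(MINIMAL_DVC_YAML, encoding="utf-8")
    return path


if __name__ == "__main__":
    workdir = tempfile.mkdtemp(prefix="dvc-noop-")
    print(write_minimal_stage(workdir))
    # Next, inside that directory: run `dvc init --no-scm`, then `dvc repro`.
```

The script only writes the file; initializing the repository and running dvc repro are left as manual steps so the result can be observed directly.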

We found that the dvc.lock file was unchanged, so anyone inspecting it might conclude that the build was successful. But there may be some other way this is indicated. I don’t know what ‘status’ does when

My dvc.yaml stage file:

# dvc.yaml for precheck stage.
# path of this file: C:\Users\raylu\Documents\Github\audit-engine\resources\(job name)\dvc\precheck\dvc.yaml

stages:
    precheck:
        cmd:    python main.py -i JOB_WI_Ozaukee_20201103.csv --op version
        wdir:   C:\Users\raylu\Documents\Github\audit-engine
        deps:   
            - s3://us-east-1-audit-engine-election-data/US/WI/US_WI_Ozaukee_General_20201103/WI_Ozaukee_20201103_BIA_0.zip
            - s3://us-east-1-audit-engine-election-data/US/WI/US_WI_Ozaukee_General_20201103/WI_Ozaukee_20201103_BIA_1.zip
            - s3://us-east-1-audit-engine-election-data/US/WI/US_WI_Ozaukee_General_20201103/WI_Ozaukee_20201103_BIA_2.zip
        outs:
            - s3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/cache/all_archives_info.json:
                cache: false           
#                persist: true
            - s3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/cache/bia_bif.csv:
                cache: false          
#                persist: true
            - s3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/reports/precheck_report.md:
                cache: false          
#                persist: true
            - s3://us-east-1-audit-engine-jobs/US/WI/US_WI_Ozaukee_General_20201103/reports/precheck_report.html:
                cache: false          
#                persist: true


Expected

The program should not crash, and I believe dvc.lock should be appropriately updated.

Environment information

Output of dvc doctor:

$ dvc doctor

DVC version: 2.6.4 (pip)
---------------------------------
Platform: Python 3.7.6 on Windows-10-10.0.19041-SP0
Supports:
        http (requests = 2.24.0),
        https (requests = 2.24.0),
        s3 (s3fs = 2021.8.0, boto3 = 1.17.106)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: None
Workspace directory: NTFS on C:\
Repo: dvc (no_scm)

Additional Information (if any):

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

raylutz commented, Dec 4, 2021 (2 reactions)

We have decided not to use DVC and have implemented our own similar functionality. Thanks for your time.

raylutz commented, Sep 13, 2021 (2 reactions)

Yes, I believe that not stopping with an exception will allow the output files to be considered optional. If they are never used in a later stage, then failing to produce them is not really an error. If the file is not optional and is used in a later stage, and the dvc.lock file is updated to show that it is missing, then the subsequent stage should complain that it does not have a dependency. Thus the required/optional feature is already built in, as long as

  1. When a build of a stage does not produce an output, it still completes writing the dvc.lock file, and
  2. When a file is missing, there is no hashcode/etag for it. Then, when a subsequent stage looks for it, it simply will not find it. I think that case is probably already dealt with, but it should be checked.
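
The two conditions above can be sketched roughly as follows. This illustrates the proposal only; it is not DVC's implementation or its dvc.lock format, and the names lock_entries and missing_dependencies and the dictionary layout are hypothetical.

```python
import hashlib
from pathlib import Path


def lock_entries(out_paths):
    """Build lock-style entries; a missing output gets no hash instead of raising."""
    entries = []
    for out in out_paths:
        entry = {"path": str(out)}
        path = Path(out)
        if path.is_file():
            # Only outputs that exist are given a hash.
            entry["md5"] = hashlib.md5(path.read_bytes()).hexdigest()
        entries.append(entry)
    return entries


def missing_dependencies(dep_paths, upstream_entries):
    """Return deps that an upstream stage declared as outputs but did not produce."""
    declared = {e["path"] for e in upstream_entries}
    produced = {e["path"] for e in upstream_entries if "md5" in e}
    return [str(d) for d in dep_paths if str(d) in declared and str(d) not in produced]
```

Under this sketch, a stage with a missing output still gets a complete set of entries, and a later stage can detect the gap by checking for an entry without a hash.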

So by fixing this, you will feed two birds with one scone.

Thanks!
