`dvc exp run --run-all` results in `ERROR: unexpected error`


Bug Report

Description

After queuing up a number of experiments that I can see with dvc exp show:

Experiment Created State eval_loss …
workspace - - 0.051839 …
longdocs2 May 16, 2022 - 0.051839 …
├── a9a2a52 May 16, 2022 Queued - …
├── a9362c3 May 16, 2022 Queued - …
├── a412093 May 16, 2022 Queued - …
├── ceebf27 May 16, 2022 Queued - …
├── e09f285 May 16, 2022 Queued - …
├── f58aa90 May 16, 2022 Queued - …
├── 1be2ffe May 16, 2022 Queued - …
├── b62c559 May 16, 2022 Queued - …
├── 7aa60b9 May 16, 2022 Queued - …
├── 97fb27f May 16, 2022 Queued - …
├── c1f5135 May 16, 2022 Queued - …
├── 6fa4dda May 16, 2022 Queued - …
├── a74abe4 May 16, 2022 Queued - …
├── 949343f May 16, 2022 Queued - …
├── 0b49a7b May 16, 2022 Queued - …
├── cfe8b2c May 16, 2022 Queued - …
├── 2530894 May 16, 2022 Queued - …
├── fd04249 May 16, 2022 Queued - …
├── 4c5a546 May 16, 2022 Queued - …
├── 1aeb3f1 May 16, 2022 Queued - …
├── 294699c May 16, 2022 Queued - …
├── 831a18b May 16, 2022 Queued - …
├── ab811df May 16, 2022 Queued - …
├── 97fd1b5 May 16, 2022 Queued - …
├── b1a714a May 16, 2022 Queued - …
├── c7b2795 May 16, 2022 Queued - …
├── ee90f65 May 16, 2022 Queued - …
├── 9f9584b May 16, 2022 Queued - …
├── 951c4bb May 16, 2022 Queued - …
├── f545d49 May 16, 2022 Queued - …
└── 910dcc0 May 16, 2022 Queued - …

I get the following when I run dvc exp run --run-all:

$ dvc exp run --run-all
ERROR: unexpected error                                               

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

Reproduce

It’s difficult to know precisely how to reproduce this: sometimes it works and sometimes it doesn’t, and I could not reproduce it on a toy example. In principle:

  1. dvc init
  2. dvc exp run --queue -S <adjust parameter here>
  3. repeat multiple times
  4. dvc exp run --run-all
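
The steps above can be sketched as a small script. This is a minimal sketch, not the reporter's actual setup: the parameter name `train.lr` and its values are hypothetical placeholders, and it assumes `dvc` is on the PATH inside an initialised project that already has a pipeline. Only the `--queue`, `-S` and `--run-all` flags come from the report.

```python
import subprocess
from typing import Iterable, List


def build_queue_commands(param: str, values: Iterable[object]) -> List[List[str]]:
    """Return one `dvc exp run --queue -S param=value` command per value."""
    return [["dvc", "exp", "run", "--queue", "-S", f"{param}={v}"] for v in values]


def queue_and_run(param: str, values: Iterable[object], dry_run: bool = True) -> List[List[str]]:
    """Queue one experiment per value, then run the whole queue.

    With dry_run=True (the default) nothing is executed; the commands
    are only returned so they can be inspected first.
    """
    commands = build_queue_commands(param, values)
    commands.append(["dvc", "exp", "run", "--run-all"])
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first failing dvc invocation.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # `train.lr` is a hypothetical parameter name; substitute one from your params file.
    for cmd in queue_and_run("train.lr", [0.1, 0.01, 0.001]):
        print(" ".join(cmd))
```

Since the failure was intermittent for the reporter, running this with `dry_run=False` is not guaranteed to trigger the error.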

Expected

I expected dvc exp to run my experiments, or at least offer a useful error message.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.10.2 (pip)
---------------------------------
Platform: Python 3.8.10 on Linux-5.15.0-1005-aws-x86_64-with-glibc2.35
Supports:
        hdfs (fsspec = 2022.3.0, pyarrow = 8.0.0),
        webhdfs (fsspec = 2022.3.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        s3 (s3fs = 2022.3.0, boto3 = 1.21.21)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p1
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/nvme0n1p1
Repo: dvc, git
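
The doctor output lists only hardlink and symlink as cache types, which matches the ext4 filesystem reported for both cache and workspace (ext4 does not support reflinks). As a rough way to check which of those two link types a directory accepts, here is a minimal sketch. It is not DVC's own probing code, the function name `probe_link_types` is hypothetical, and it deliberately does not attempt a reflink.

```python
import os
import tempfile
from typing import Dict


def probe_link_types(directory: str) -> Dict[str, bool]:
    """Try to create a hardlink and a symlink inside `directory`."""
    results: Dict[str, bool] = {}
    # The scratch directory is created inside `directory` so the probe
    # exercises that filesystem, and it is removed again on exit.
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        source = os.path.join(tmp, "source")
        with open(source, "w") as fobj:
            fobj.write("probe")
        for name, link_func in (("hardlink", os.link), ("symlink", os.symlink)):
            target = os.path.join(tmp, name)
            try:
                link_func(source, target)
                results[name] = True
            except OSError:
                results[name] = False
    return results


if __name__ == "__main__":
    print(probe_link_types(tempfile.gettempdir()))
```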

Additional Information (if any): Output of dvc exp run --run-all --verbose

2022-05-17 04:19:46,090 DEBUG: Reproducing experiment revs 'a9a2a52, a9362c3, a412093, ceebf27, e09f285, f58aa90, 1be2ffe, b62c559, 7aa60b9, 97fb27f, c1f5135, 6fa4dda, a74abe4, 949343f, 0b49a7b, cfe8b2c, 2530894, fd04249, 4c5a546, 1aeb3f1, 294699c, 831a18b, ab811df, 97fd1b5, b1a714a, c7b2795, ee90f65, 9f9584b, 951c4bb, f545d49, 910dcc0'
2022-05-17 04:19:46,234 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpio0j1h9t/.dvc/config.local'
2022-05-17 04:19:46,234 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpio0j1h9t'
2022-05-17 04:19:46,347 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmp522qrkpe/.dvc/config.local'
2022-05-17 04:19:46,347 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmp522qrkpe'
2022-05-17 04:19:46,460 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpsqxqe4a1/.dvc/config.local'
2022-05-17 04:19:46,461 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpsqxqe4a1'
2022-05-17 04:19:46,570 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpfcqw2zd4/.dvc/config.local'
2022-05-17 04:19:46,570 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpfcqw2zd4'
2022-05-17 04:19:46,681 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmprg6sp9y9/.dvc/config.local'
2022-05-17 04:19:46,682 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmprg6sp9y9'
2022-05-17 04:19:46,795 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmptikvz81f/.dvc/config.local'
2022-05-17 04:19:46,795 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmptikvz81f'
2022-05-17 04:19:46,907 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpgwal6ofe/.dvc/config.local'
2022-05-17 04:19:46,907 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpgwal6ofe'
2022-05-17 04:19:47,021 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpq2mhffva/.dvc/config.local'
2022-05-17 04:19:47,021 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpq2mhffva'
2022-05-17 04:19:47,133 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpcaqst3h0/.dvc/config.local'
2022-05-17 04:19:47,133 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpcaqst3h0'
2022-05-17 04:19:47,245 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpyqu6eet0/.dvc/config.local'
2022-05-17 04:19:47,245 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpyqu6eet0'
2022-05-17 04:19:47,361 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpt_gclmlq/.dvc/config.local'
2022-05-17 04:19:47,362 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpt_gclmlq'
2022-05-17 04:19:47,477 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpv5idmb5e/.dvc/config.local'
2022-05-17 04:19:47,477 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpv5idmb5e'
2022-05-17 04:19:47,587 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpljs_i66o/.dvc/config.local'
2022-05-17 04:19:47,587 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpljs_i66o'
2022-05-17 04:19:47,699 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmprogu65u0/.dvc/config.local'
2022-05-17 04:19:47,699 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmprogu65u0'
2022-05-17 04:19:47,809 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpmpgofdf4/.dvc/config.local'
2022-05-17 04:19:47,809 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpmpgofdf4'
2022-05-17 04:19:47,920 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmppftuyscl/.dvc/config.local'
2022-05-17 04:19:47,921 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmppftuyscl'
2022-05-17 04:19:48,034 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpdipuv88o/.dvc/config.local'
2022-05-17 04:19:48,034 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpdipuv88o'
2022-05-17 04:19:48,144 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmp2tqp8tnt/.dvc/config.local'
2022-05-17 04:19:48,144 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmp2tqp8tnt'
2022-05-17 04:19:48,255 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpea2ejj86/.dvc/config.local'
2022-05-17 04:19:48,255 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpea2ejj86'
2022-05-17 04:19:48,559 DEBUG: [Errno 95] no more link types left to try out: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>: [Errno 95] Operation not supported
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/cli/__init__.py", line 90, in main
    ret = cmd.do_run()
  File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/cli/command.py", line 22, in do_run
    return self.run()
  File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/commands/experiments/run.py", line 32, in run
    results = self.repo.experiments.run(
  File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 825, in run
    return run(self.repo, *args, **kwargs)
  File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/repo/__init__.py", line 48, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/repo/experiments/run.py", line 28, in run
    return repo.experiments.reproduce_queued(jobs=jobs)
  File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 457, in reproduce_queued
    results = self._reproduce_revs(**kwargs)
  File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 53, in wrapper
    return f(exp, *args, **kwargs)
  File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 635, in _reproduce_revs
    manager = manager_cls.from_stash_entries(
  File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/repo/experiments/executor/manager/base.py", line 119, in from_stash_entries
    manager._enqueue_stash_entries(scm, repo, to_run, **kwargs)
  File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/repo/experiments/executor/manager/base.py", line 147, in _enqueue_stash_entries
    self.enqueue(stash_rev, executor)
  File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/repo/experiments/executor/manager/base.py", line 70, in enqueue
    assert rev not in self
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/fs/utils.py", line 28, in _link
    func(from_path, to_path)
  File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/fs/base.py", line 263, in reflink
    return self.fs.reflink(from_info, to_info)
  File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/fs/local.py", line 156, in reflink
    return System.reflink(path1, path2)
  File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/system.py", line 112, in reflink
    System._reflink_linux(source, link_name)
  File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/system.py", line 96, in _reflink_linux
    fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
OSError: [Errno 95] Operation not supported

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/fs/utils.py", line 69, in _try_links
    return _link(link, from_fs, from_path, to_fs, to_path)
  File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/fs/utils.py", line 32, in _link
    raise OSError(
OSError: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/fs/utils.py", line 124, in _test_link
    _try_links([link], from_fs, from_file, to_fs, to_file)
  File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/fs/utils.py", line 77, in _try_links
    raise OSError(
OSError: [Errno 95] no more link types left to try out
------------------------------------------------------------
2022-05-17 04:19:48,560 DEBUG: Removing '/home/matt/.V4NdZ3uXiSsszXYSj6WPvF.tmp'
2022-05-17 04:19:48,560 DEBUG: Removing '/home/matt/.V4NdZ3uXiSsszXYSj6WPvF.tmp'
2022-05-17 04:19:48,561 DEBUG: Removing '/home/matt/.V4NdZ3uXiSsszXYSj6WPvF.tmp'
2022-05-17 04:19:48,561 DEBUG: Removing '/home/matt/project/.dvc/cache/.KZ4Zu7TEA7FBRtSDWNpDgQ.tmp'
2022-05-17 04:19:48,564 DEBUG: Version info for developers:
DVC version: 2.10.2 (pip)
---------------------------------
Platform: Python 3.8.10 on Linux-5.15.0-1005-aws-x86_64-with-glibc2.35
Supports:
	hdfs (fsspec = 2022.3.0, pyarrow = 8.0.0),
	webhdfs (fsspec = 2022.3.0),
	http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	s3 (s3fs = 2022.3.0, boto3 = 1.21.21)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p1
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/nvme0n1p1
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2022-05-17 04:19:48,566 DEBUG: Analytics is enabled.
2022-05-17 04:19:48,622 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp2cpzut38']'
2022-05-17 04:19:48,624 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp2cpzut38']'
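
The first traceback in the log ends at `assert rev not in self` inside `enqueue`, which suggests the manager was asked to enqueue a revision it already held. The report does not establish whether that is what happened, and the 31 short revs printed in the first DEBUG line are all distinct. The sketch below uses a toy stand-in class (hypothetical, not DVC's implementation) to show what such a guard does, plus a hypothetical helper `duplicated_revs` for checking a list of revs for repeats.

```python
from collections import Counter
from typing import Dict, List


class ToyManager:
    """Toy stand-in for an executor manager; NOT DVC's implementation."""

    def __init__(self) -> None:
        self._queue: Dict[str, object] = {}

    def __contains__(self, rev: str) -> bool:
        return rev in self._queue

    def enqueue(self, rev: str, executor: object) -> None:
        # Same shape as the guard in the traceback: each rev may be enqueued once.
        assert rev not in self
        self._queue[rev] = executor


def duplicated_revs(revs: List[str]) -> List[str]:
    """Return the revs that occur more than once, in first-seen order."""
    counts = Counter(revs)
    return [rev for rev in dict.fromkeys(revs) if counts[rev] > 1]


if __name__ == "__main__":
    manager = ToyManager()
    manager.enqueue("a9a2a52", executor=None)
    try:
        manager.enqueue("a9a2a52", executor=None)
    except AssertionError:
        print("second enqueue of the same rev trips the assertion")
```

A bare `assert` carries no message, which is consistent with the CLI printing only `ERROR: unexpected error`.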

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
ivyleavedtoadflax commented, Oct 7, 2022

Hey sorry, I don’t have access to this pipeline anymore, so I cannot repeat!

1 reaction
ivyleavedtoadflax commented, May 17, 2022

For reference, it can be resolved by removing all experiments, then re-adding a smaller number and running them. It’s also worth noting that this project involves quite a lot of data:

$ du -h -d 1 .dvc
231G    .dvc/cache
17G     .dvc/tmp
247G    .dvc

$ du -h -d 1 data
16G     data/raw
28K     data/processed
16G     data
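
The workaround in this comment amounts to queuing fewer experiments at a time. One way to split a long list of parameter values into smaller groups is sketched below. The helper name `batches` and the batch size are hypothetical, and the comment does not say how many experiments count as "a smaller number", so the size would have to be found by trial.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


if __name__ == "__main__":
    # Hypothetical parameter values; each printed group would be queued
    # and run to completion before the next group is queued.
    for group in batches([0.1, 0.05, 0.01, 0.005, 0.001], size=2):
        print(group)
```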