`dvc exp run --run-all` results in `ERROR: unexpected error`
Bug Report
Description
After queuing up a number of experiments that I can see with `dvc exp show`:
Experiment | Created | State | eval_loss | … |
---|---|---|---|---|
workspace | - | - | 0.051839 | … |
longdocs2 | May 16, 2022 | - | 0.051839 | … |
├── a9a2a52 | May 16, 2022 | Queued | - | … |
├── a9362c3 | May 16, 2022 | Queued | - | … |
├── a412093 | May 16, 2022 | Queued | - | … |
├── ceebf27 | May 16, 2022 | Queued | - | … |
├── e09f285 | May 16, 2022 | Queued | - | … |
├── f58aa90 | May 16, 2022 | Queued | - | … |
├── 1be2ffe | May 16, 2022 | Queued | - | … |
├── b62c559 | May 16, 2022 | Queued | - | … |
├── 7aa60b9 | May 16, 2022 | Queued | - | … |
├── 97fb27f | May 16, 2022 | Queued | - | … |
├── c1f5135 | May 16, 2022 | Queued | - | … |
├── 6fa4dda | May 16, 2022 | Queued | - | … |
├── a74abe4 | May 16, 2022 | Queued | - | … |
├── 949343f | May 16, 2022 | Queued | - | … |
├── 0b49a7b | May 16, 2022 | Queued | - | … |
├── cfe8b2c | May 16, 2022 | Queued | - | … |
├── 2530894 | May 16, 2022 | Queued | - | … |
├── fd04249 | May 16, 2022 | Queued | - | … |
├── 4c5a546 | May 16, 2022 | Queued | - | … |
├── 1aeb3f1 | May 16, 2022 | Queued | - | … |
├── 294699c | May 16, 2022 | Queued | - | … |
├── 831a18b | May 16, 2022 | Queued | - | … |
├── ab811df | May 16, 2022 | Queued | - | … |
├── 97fd1b5 | May 16, 2022 | Queued | - | … |
├── b1a714a | May 16, 2022 | Queued | - | … |
├── c7b2795 | May 16, 2022 | Queued | - | … |
├── ee90f65 | May 16, 2022 | Queued | - | … |
├── 9f9584b | May 16, 2022 | Queued | - | … |
├── 951c4bb | May 16, 2022 | Queued | - | … |
├── f545d49 | May 16, 2022 | Queued | - | … |
└── 910dcc0 | May 16, 2022 | Queued | - | … |
I get the following when I run `dvc exp run --run-all`:
$ dvc exp run --run-all
ERROR: unexpected error
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
Reproduce
It's difficult to know precisely how to reproduce this: sometimes it works and sometimes it doesn't, and I could not reproduce it on a toy example. In principle:
- dvc init
- dvc exp run --queue -S <adjust parameter here>
- repeat multiple times
- dvc exp run --run-all
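The steps above can be sketched as a small Python helper that builds the command sequence. This is only an illustration: the parameter name `train.lr` and its values are hypothetical placeholders for whatever `-S` override was actually used, and the commands are printed rather than executed.

```python
import shlex
from typing import List, Sequence


def build_repro_commands(param: str, values: Sequence[object]) -> List[List[str]]:
    """Return the command sequence from the steps above as argv lists."""
    commands = [["dvc", "init"]]
    # Queue one experiment per parameter value.
    for value in values:
        commands.append(["dvc", "exp", "run", "--queue", "-S", f"{param}={value}"])
    # Finally, run everything that was queued.
    commands.append(["dvc", "exp", "run", "--run-all"])
    return commands


if __name__ == "__main__":
    # "train.lr" and the values below are hypothetical.
    for argv in build_repro_commands("train.lr", [0.1, 0.01, 0.001]):
        print(shlex.join(argv))
```

Since the failure was intermittent, running these commands is not expected to trigger the error reliably.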
Expected
I expected `dvc exp` to run my experiments, or at least offer a useful error message.
Environment information
Output of `dvc doctor`:
$ dvc doctor
DVC version: 2.10.2 (pip)
---------------------------------
Platform: Python 3.8.10 on Linux-5.15.0-1005-aws-x86_64-with-glibc2.35
Supports:
hdfs (fsspec = 2022.3.0, pyarrow = 8.0.0),
webhdfs (fsspec = 2022.3.0),
http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
s3 (s3fs = 2022.3.0, boto3 = 1.21.21)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p1
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/nvme0n1p1
Repo: dvc, git
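The `Cache types` line lists only hardlink and symlink, so reflinks are apparently not available on this ext4 volume. As a rough, Linux-oriented sketch (not DVC's implementation), the probe below attempts a `FICLONE` ioctl between two temporary files; `supports_reflink` is a hypothetical helper name.

```python
import os
import tempfile

FICLONE = 0x40049409  # Linux ioctl request number for cloning a file


def supports_reflink(directory):
    """Return True if a reflink (FICLONE) succeeds inside `directory`."""
    try:
        import fcntl  # only available on Unix-like systems
    except ImportError:
        return False
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        src = os.path.join(tmp, "src")
        dst = os.path.join(tmp, "dst")
        with open(src, "wb") as f:
            f.write(b"probe")
        try:
            with open(src, "rb") as s, open(dst, "wb") as d:
                fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
        except OSError:
            # e.g. Errno 95 (Operation not supported) on ext4
            return False
        return True
```

A `False` result on the cache directory would match the absence of `reflink` in the list above.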
Additional Information (if any):
Output of dvc exp run --run-all --verbose
2022-05-17 04:19:46,090 DEBUG: Reproducing experiment revs 'a9a2a52, a9362c3, a412093, ceebf27, e09f285, f58aa90, 1be2ffe, b62c559, 7aa60b9, 97fb27f, c1f5135, 6fa4dda, a74abe4, 949343f, 0b49a7b, cfe8b2c, 2530894, fd04249, 4c5a546, 1aeb3f1, 294699c, 831a18b, ab811df, 97fd1b5, b1a714a, c7b2795, ee90f65, 9f9584b, 951c4bb, f545d49, 910dcc0'
2022-05-17 04:19:46,234 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpio0j1h9t/.dvc/config.local'
2022-05-17 04:19:46,234 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpio0j1h9t'
2022-05-17 04:19:46,347 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmp522qrkpe/.dvc/config.local'
2022-05-17 04:19:46,347 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmp522qrkpe'
2022-05-17 04:19:46,460 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpsqxqe4a1/.dvc/config.local'
2022-05-17 04:19:46,461 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpsqxqe4a1'
2022-05-17 04:19:46,570 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpfcqw2zd4/.dvc/config.local'
2022-05-17 04:19:46,570 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpfcqw2zd4'
2022-05-17 04:19:46,681 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmprg6sp9y9/.dvc/config.local'
2022-05-17 04:19:46,682 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmprg6sp9y9'
2022-05-17 04:19:46,795 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmptikvz81f/.dvc/config.local'
2022-05-17 04:19:46,795 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmptikvz81f'
2022-05-17 04:19:46,907 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpgwal6ofe/.dvc/config.local'
2022-05-17 04:19:46,907 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpgwal6ofe'
2022-05-17 04:19:47,021 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpq2mhffva/.dvc/config.local'
2022-05-17 04:19:47,021 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpq2mhffva'
2022-05-17 04:19:47,133 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpcaqst3h0/.dvc/config.local'
2022-05-17 04:19:47,133 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpcaqst3h0'
2022-05-17 04:19:47,245 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpyqu6eet0/.dvc/config.local'
2022-05-17 04:19:47,245 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpyqu6eet0'
2022-05-17 04:19:47,361 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpt_gclmlq/.dvc/config.local'
2022-05-17 04:19:47,362 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpt_gclmlq'
2022-05-17 04:19:47,477 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpv5idmb5e/.dvc/config.local'
2022-05-17 04:19:47,477 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpv5idmb5e'
2022-05-17 04:19:47,587 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpljs_i66o/.dvc/config.local'
2022-05-17 04:19:47,587 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpljs_i66o'
2022-05-17 04:19:47,699 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmprogu65u0/.dvc/config.local'
2022-05-17 04:19:47,699 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmprogu65u0'
2022-05-17 04:19:47,809 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpmpgofdf4/.dvc/config.local'
2022-05-17 04:19:47,809 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpmpgofdf4'
2022-05-17 04:19:47,920 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmppftuyscl/.dvc/config.local'
2022-05-17 04:19:47,921 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmppftuyscl'
2022-05-17 04:19:48,034 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpdipuv88o/.dvc/config.local'
2022-05-17 04:19:48,034 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpdipuv88o'
2022-05-17 04:19:48,144 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmp2tqp8tnt/.dvc/config.local'
2022-05-17 04:19:48,144 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmp2tqp8tnt'
2022-05-17 04:19:48,255 DEBUG: Writing experiments local config '/home/matt/project/.dvc/tmp/exps/tmpea2ejj86/.dvc/config.local'
2022-05-17 04:19:48,255 DEBUG: Init temp dir executor in '/home/matt/project/.dvc/tmp/exps/tmpea2ejj86'
2022-05-17 04:19:48,559 DEBUG: [Errno 95] no more link types left to try out: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>: [Errno 95] Operation not supported
------------------------------------------------------------
Traceback (most recent call last):
File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/cli/__init__.py", line 90, in main
ret = cmd.do_run()
File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/cli/command.py", line 22, in do_run
return self.run()
File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/commands/experiments/run.py", line 32, in run
results = self.repo.experiments.run(
File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 825, in run
return run(self.repo, *args, **kwargs)
File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/repo/__init__.py", line 48, in wrapper
return f(repo, *args, **kwargs)
File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/repo/experiments/run.py", line 28, in run
return repo.experiments.reproduce_queued(jobs=jobs)
File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 457, in reproduce_queued
results = self._reproduce_revs(**kwargs)
File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 53, in wrapper
return f(exp, *args, **kwargs)
File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/repo/experiments/__init__.py", line 635, in _reproduce_revs
manager = manager_cls.from_stash_entries(
File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/repo/experiments/executor/manager/base.py", line 119, in from_stash_entries
manager._enqueue_stash_entries(scm, repo, to_run, **kwargs)
File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/repo/experiments/executor/manager/base.py", line 147, in _enqueue_stash_entries
self.enqueue(stash_rev, executor)
File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/repo/experiments/executor/manager/base.py", line 70, in enqueue
assert rev not in self
AssertionError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/fs/utils.py", line 28, in _link
func(from_path, to_path)
File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/fs/base.py", line 263, in reflink
return self.fs.reflink(from_info, to_info)
File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/fs/local.py", line 156, in reflink
return System.reflink(path1, path2)
File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/system.py", line 112, in reflink
System._reflink_linux(source, link_name)
File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/system.py", line 96, in _reflink_linux
fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
OSError: [Errno 95] Operation not supported
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/fs/utils.py", line 69, in _try_links
return _link(link, from_fs, from_path, to_fs, to_path)
File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/fs/utils.py", line 32, in _link
raise OSError(
OSError: [Errno 95] 'reflink' is not supported by <class 'dvc.fs.local.LocalFileSystem'>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/fs/utils.py", line 124, in _test_link
_try_links([link], from_fs, from_file, to_fs, to_file)
File "/home/matt/project/.env/lib/python3.8/site-packages/dvc/fs/utils.py", line 77, in _try_links
raise OSError(
OSError: [Errno 95] no more link types left to try out
------------------------------------------------------------
2022-05-17 04:19:48,560 DEBUG: Removing '/home/matt/.V4NdZ3uXiSsszXYSj6WPvF.tmp'
2022-05-17 04:19:48,560 DEBUG: Removing '/home/matt/.V4NdZ3uXiSsszXYSj6WPvF.tmp'
2022-05-17 04:19:48,561 DEBUG: Removing '/home/matt/.V4NdZ3uXiSsszXYSj6WPvF.tmp'
2022-05-17 04:19:48,561 DEBUG: Removing '/home/matt/project/.dvc/cache/.KZ4Zu7TEA7FBRtSDWNpDgQ.tmp'
2022-05-17 04:19:48,564 DEBUG: Version info for developers:
DVC version: 2.10.2 (pip)
---------------------------------
Platform: Python 3.8.10 on Linux-5.15.0-1005-aws-x86_64-with-glibc2.35
Supports:
hdfs (fsspec = 2022.3.0, pyarrow = 8.0.0),
webhdfs (fsspec = 2022.3.0),
http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
s3 (s3fs = 2022.3.0, boto3 = 1.21.21)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p1
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/nvme0n1p1
Repo: dvc, git
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2022-05-17 04:19:48,566 DEBUG: Analytics is enabled.
2022-05-17 04:19:48,622 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp2cpzut38']'
2022-05-17 04:19:48,624 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp2cpzut38']'
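Reading the traceback top-down, the first failure is the bare `assert rev not in self` in `enqueue`; the reflink errors only appear in the exceptions chained after it. The sketch below is a simplified stand-in, not DVC's actual code: `ToyManager`, `enqueue_checked` and `describe_failure` are hypothetical names. It shows how a bare assertion on a duplicate key produces an `AssertionError` with an empty message, which would be consistent with the uninformative `ERROR: unexpected error`.

```python
class ToyManager:
    """Hypothetical stand-in for an executor manager keyed by stash revision."""

    def __init__(self):
        self._queue = {}

    def __contains__(self, rev):
        return rev in self._queue

    def enqueue(self, rev, executor):
        # Same shape as the failing line in the traceback.
        assert rev not in self
        self._queue[rev] = executor

    def enqueue_checked(self, rev, executor):
        # Alternative: fail with a message that names the offending revision.
        if rev in self:
            raise ValueError(f"revision {rev!r} is already enqueued")
        self._queue[rev] = executor


def describe_failure(method_name):
    """Enqueue the same revision twice and return the resulting error text."""
    manager = ToyManager()
    method = getattr(manager, method_name)
    method("a9a2a52", "executor-1")
    try:
        method("a9a2a52", "executor-2")
    except (AssertionError, ValueError) as exc:
        return f"{type(exc).__name__}: {exc}"
    return "no error"
```

Calling `describe_failure("enqueue")` yields an error with no message text, while `describe_failure("enqueue_checked")` names the duplicated revision. Whether a duplicated revision is what actually happened here is an assumption; the log alone does not confirm it.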
Top GitHub Comments
Hey sorry, I don't have access to this pipeline anymore, so I cannot repeat!
For reference, it can be resolved by removing all experiments, and re-adding a smaller number and executing. It's also worth noting that this project involves quite a lot of data: