gc: fails when attempting to remove cache shared by multiple projects

Bug Report

Description

When attempting to garbage collect files shared by multiple projects, dvc throws an error saying it is attempting to write to a read-only database (see the traceback under Additional Information).

Reproduce

I don’t have multiple dvc repos to reproduce on

Expected

dvc performs gc as normal

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.9.5 (pip)
---------------------------------
Platform: Python 3.8.1 on Linux-5.17.5-76051705-generic-x86_64-with-glibc2.10
Supports:
	azure (adlfs = 2022.2.0, knack = 0.9.0, azure-identity = 1.8.0),
	webhdfs (fsspec = 2022.2.0),
	http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	s3 (s3fs = 2022.2.0, boto3 = 1.20.24)
Cache types: reflink, hardlink, symlink
Cache directory: xfs on /dev/mapper/fastdata-fastlv
Caches: local
Remotes: local, s3, local
Workspace directory: xfs on /dev/mapper/fastdata-fastlv
Repo: dvc, git

Additional Information (if any):

$ dvc gc -v -w -p . ../../dcdanko/bdx2 ../../papciak/Biotia-DX/ ../../tpaisie/bdx/ ../../ahmadazim/Biotia-DX/ ../../hwells/Biotia-DX/
2022-08-03 11:31:02,132 WARNING: This will remove all cache except items used in the workspace of the current and the following repos:
  - /mnt/fast/dev/dcdanko/bdx1
  - /mnt/fast/dev/dcdanko/bdx2
  - /mnt/fast/dev/papciak/Biotia-DX
  - /mnt/fast/dev/tpaisie/bdx
  - /mnt/fast/dev/ahmadazim/Biotia-DX
  - /mnt/fast/dev/hwells/Biotia-DX
Are you sure you want to proceed? [y/n]: y
2022-08-03 11:31:04,244 ERROR: unexpected error - attempt to write a readonly database
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/cli/__init__.py", line 78, in main
    ret = cmd.do_run()
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/cli/command.py", line 22, in do_run
    return self.run()
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/commands/gc.py", line 51, in run
    self.repo.gc(
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/repo/__init__.py", line 48, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/repo/gc.py", line 53, in gc
    all_repos = [Repo(path) for path in repos]
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/repo/gc.py", line 53, in <listcomp>
    all_repos = [Repo(path) for path in repos]
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/repo/__init__.py", line 202, in __init__
    self.state = State(self.root_dir, state_db_dir, self.dvcignore)
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/dvc/state.py", line 65, in __init__
    self.links = Cache(directory=os.path.join(tmp_dir, "links"), **config)
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/diskcache/core.py", line 478, in __init__
    self.reset(key, value, update=False)
  File "/home/dcdanko/miniconda/envs/bdx1/lib/python3.8/site-packages/diskcache/core.py", line 2433, in reset
    ((old_value,),) = sql(
sqlite3.OperationalError: attempt to write a readonly database
------------------------------------------------------------
2022-08-03 11:31:05,655 DEBUG: Removing '/mnt/fast/dev/dcdanko/.RKoWhSFAMKbZQvEyT5Twwi.tmp'
2022-08-03 11:31:05,656 DEBUG: Removing '/mnt/fast/dev/dcdanko/.RKoWhSFAMKbZQvEyT5Twwi.tmp'
2022-08-03 11:31:05,656 DEBUG: Removing '/mnt/fast/dev/dcdanko/.RKoWhSFAMKbZQvEyT5Twwi.tmp'
2022-08-03 11:31:05,656 DEBUG: Removing '/fast/bdx/.shared_dvc_cache/.6xypQvximg96enbwqfa4tN.tmp'
2022-08-03 11:31:05,674 DEBUG: Version info for developers:
DVC version: 2.9.5 (pip)
---------------------------------
Platform: Python 3.8.1 on Linux-5.17.5-76051705-generic-x86_64-with-glibc2.10
Supports:
	azure (adlfs = 2022.2.0, knack = 0.9.0, azure-identity = 1.8.0),
	webhdfs (fsspec = 2022.2.0),
	http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	s3 (s3fs = 2022.2.0, boto3 = 1.20.24)
Cache types: reflink, hardlink, symlink
Cache directory: xfs on /dev/mapper/fastdata-fastlv
Caches: local
Remotes: local, s3, local
Workspace directory: xfs on /dev/mapper/fastdata-fastlv
Repo: dvc, git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2022-08-03 11:31:05,676 DEBUG: Analytics is disabled.
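
The traceback above ends in SQLite, not in DVC's own garbage collection logic: the failure is raised while a `diskcache` database is being opened for one of the listed repos. As a minimal sketch of where that message comes from, the snippet below reproduces the same `sqlite3.OperationalError` text with the standard library alone. It does not involve DVC or `diskcache`; the `settings` table and the `cache.db` file name are hypothetical, and opening the file with `mode=ro` is only a stand-in for a database file the current user cannot modify.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def write_to_readonly_db(db_path: str) -> str:
    """Open an existing SQLite file read-only, attempt a write, return the error text."""
    # mode=ro opens the file without write access, which approximates a
    # database file owned by another user (an assumption about the issue).
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        conn.execute("INSERT INTO settings VALUES ('count', 0)")
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()
    return "write succeeded"


def demo() -> str:
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "cache.db")  # hypothetical file name
        # Create the database with normal write access first.
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE settings (key TEXT, value INTEGER)")
        conn.commit()
        conn.close()
        return write_to_readonly_db(db_path)


if __name__ == "__main__":
    print(demo())
```

If this matches the situation in the report, the likely place to look is the ownership and permissions of the other users' repo directories, not the shared cache itself.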

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 10 (3 by maintainers)

Top GitHub Comments

iddqdiddqd commented, Aug 19, 2022 (1 reaction)

@dcdanko thank you! I’ve followed the steps from the doc you shared and have set up a separate caching directory.

On top of that, I had to adjust the permissions for GID inheritance (chmod u=rwx,g=rwx,o=,g+s ~/dvc-cache/) and run dvc config cache.type copy so that the files are editable within my setup. My issue is resolved now.

UPD: I am sorry, meant to tag @daavoo
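
For readers who want to apply the same directory mode from a script, here is a minimal sketch of what the `chmod u=rwx,g=rwx,o=,g+s` call in the comment above amounts to. The function name `make_group_writable` is hypothetical, and the example runs against a temporary directory instead of a real cache path.

```python
import os
import stat
import tempfile

# u=rwx,g=rwx,o= plus the setgid bit (g+s), i.e. octal 2770.
SHARED_DIR_MODE = stat.S_IRWXU | stat.S_IRWXG | stat.S_ISGID


def make_group_writable(cache_dir: str) -> int:
    """Apply the shared-cache mode to cache_dir and return the resulting mode bits."""
    os.chmod(cache_dir, SHARED_DIR_MODE)
    # Read the mode back: the OS may drop the setgid bit in some cases,
    # for example when the caller is not a member of the directory's group.
    return stat.S_IMODE(os.stat(cache_dir).st_mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        print(oct(make_group_writable(cache_dir)))
```

With the setgid bit set on a directory, new files and subdirectories created inside it inherit the directory's group, which is the GID inheritance the comment refers to.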

dcdanko commented, Aug 17, 2022 (1 reaction)

Thanks, I just tried with dvc 2.18.1 and the error persists.
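
Since the error is raised while the other repos are being opened, one way to narrow it down is to check which of the paths passed to `-p` the current user can actually write to. The sketch below is a diagnostic under a stated assumption: that each repo keeps its state databases under `.dvc/tmp`, as the `tmp_dir` in the traceback suggests. That location is not confirmed by the issue, and `repos_with_unwritable_state` is a hypothetical helper, not part of DVC.

```python
import os
from typing import Iterable, List

# Assumption: each repo keeps its state databases under .dvc/tmp.
STATE_SUBDIR = os.path.join(".dvc", "tmp")


def is_fully_writable(directory: str) -> bool:
    """True if the directory and everything inside it is writable by this user."""
    if not os.path.isdir(directory):
        return False
    if not os.access(directory, os.W_OK | os.X_OK):
        return False
    # SQLite needs write access to the database file and to its directory
    # (for journal files), so check the entries inside as well.
    for root, dirs, files in os.walk(directory):
        for name in dirs + files:
            if not os.access(os.path.join(root, name), os.W_OK):
                return False
    return True


def repos_with_unwritable_state(repo_paths: Iterable[str]) -> List[str]:
    """Return the repos whose state directory is missing or not fully writable."""
    return [
        repo
        for repo in repo_paths
        if not is_fully_writable(os.path.join(repo, STATE_SUBDIR))
    ]
```

Calling `repos_with_unwritable_state` with the same paths given to `dvc gc -p` returns the subset that would need a permissions fix, or an empty list if this is not the cause.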
