
dvc pull: unexpected error - [Errno 22] Bad Request


Bug Report

dvc pull: unexpected error

Description

I have several DVC resources imported into a project… The tracking files (.dvc) are committed to the repository that uses these resources. When attempting to pull the associated tracked resources (dvc pull), I am getting an error:

An error occurred (400) when calling the HeadObject operation: Bad Request (relevant log from dvc pull -v below).

Reproduce

Example:

  1. dvc import a resource into the project
  2. later, or from a fresh checkout of the above git repo, attempt dvc pull (a rough sketch of these steps follows below)
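
A rough sketch of these reproduction steps, using placeholder repository URLs and file names rather than the reporter's actual repositories:

# In the consuming project: import an artifact tracked in a separate artifacts repo
$ dvc import https://example.com/artifacts-repo.git models/model.bin
$ git add model.bin.dvc .gitignore
$ git commit -m "import model.bin from artifacts repo"

# Later, or from a fresh checkout of the consuming project
$ git clone https://example.com/consuming-project.git
$ cd consuming-project
$ dvc pull        # fails with: [Errno 22] Bad Request (400 on HeadObject)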

Expected

Expecting the tracked resource to be retrieved by DVC. I am able to perform dvc update <resource>, and that will pull the DVC resource into the folder structure; of course, this causes the .dvc file to show as modified (even though it points to the same location/revision). A sketch of that workaround follows.
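
A minimal sketch of that workaround, assuming a hypothetical import file named model.bin.dvc:

$ dvc update model.bin.dvc     # succeeds and fetches the imported data
$ git status --short
 M model.bin.dvc               # shows as modified even though it points to the same revision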

How should I proceed here? Is something corrupted?

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 2.8.3 (pip)
---------------------------------
Platform: Python 3.8.0 on Linux-3.10.0-1160.45.1.el7.x86_64-x86_64-with-glibc2.27
Supports:
	webhdfs (fsspec = 2021.11.0),
	http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	s3 (s3fs = 2021.11.0, boto3 = 1.17.106)
Cache types: hardlink, symlink
Cache directory: nfs on LEB1MLNAS.company.com:/leb1mlnas_projects
Caches: local
Remotes: None
Workspace directory: nfs on LEB1MLNAS.company.com:/leb1mlnas_projects
Repo: dvc, git
$ dvc pull -vvv

CUT......

2021-11-23 09:03:02,390 TRACE: Assuming '/projects/shared_dvc_cache/d6/9a712149e586998fb73c9566bd7e9f' is unchanged since it is read-only                                                                              
2021-11-23 09:03:02,394 DEBUG: Preparing to transfer data from 's3://dvc-inspection-ai/bit_tool_artifacts/RepoA' to '../../../../../../../shared_dvc_cache'                                                                                       
2021-11-23 09:03:02,394 DEBUG: Preparing to collect status from '../../../../../../../shared_dvc_cache'
2021-11-23 09:03:02,394 DEBUG: Collecting status from '../../../../../../../shared_dvc_cache'
2021-11-23 09:03:02,395 DEBUG: Preparing to collect status from 's3://dvc-inspection-ai/bit_tool_artifacts/RepoA'                                                                                                                                 
2021-11-23 09:03:02,396 DEBUG: Collecting status from 's3://dvc-inspection-ai/bit_tool_artifacts/RepoA'
2021-11-23 09:03:02,396 DEBUG: Querying 1 hashes via object_exists
2021-11-23 09:03:02,510 ERROR: unexpected error - [Errno 22] Bad Request: An error occurred (400) when calling the HeadObject operation: Bad Request                                                                                             
------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/s3fs/core.py", line 250, in _call_s3
    out = await method(**additional_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/aiobotocore/client.py", line 155, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (400) when calling the HeadObject operation: Bad Request

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/dvc/main.py", line 55, in main
    ret = cmd.do_run()
  File "/usr/local/lib/python3.8/dist-packages/dvc/command/base.py", line 45, in do_run
    return self.run()
  File "/usr/local/lib/python3.8/dist-packages/dvc/command/data_sync.py", line 30, in run
    stats = self.repo.pull(
  File "/usr/local/lib/python3.8/dist-packages/dvc/repo/__init__.py", line 50, in wrapper
    return f(repo, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/dvc/repo/pull.py", line 29, in pull
    processed_files_count = self.fetch(
  File "/usr/local/lib/python3.8/dist-packages/dvc/repo/__init__.py", line 50, in wrapper
    return f(repo, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/dvc/repo/fetch.py", line 67, in fetch
    d, f = _fetch(
  File "/usr/local/lib/python3.8/dist-packages/dvc/repo/fetch.py", line 87, in _fetch
    downloaded += repo.cloud.pull(obj_ids, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/dvc/data_cloud.py", line 114, in pull
    return transfer(
  File "/usr/local/lib/python3.8/dist-packages/dvc/objects/transfer.py", line 153, in transfer
    status = compare_status(src, dest, obj_ids, check_deleted=False, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/dvc/objects/status.py", line 166, in compare_status
    src_exists, src_missing = status(
  File "/usr/local/lib/python3.8/dist-packages/dvc/objects/status.py", line 132, in status
    odb.hashes_exist(hashes, name=str(odb.path_info), **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/dvc/objects/db/base.py", line 468, in hashes_exist
    remote_hashes = self.list_hashes_exists(hashes, jobs, name)
  File "/usr/local/lib/python3.8/dist-packages/dvc/objects/db/base.py", line 419, in list_hashes_exists
    ret = list(itertools.compress(hashes, in_remote))
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 611, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.8/dist-packages/dvc/objects/db/base.py", line 410, in exists_with_progress
    ret = self.fs.exists(path_info)
  File "/usr/local/lib/python3.8/dist-packages/dvc/fs/fsspec_wrapper.py", line 136, in exists
    return self.fs.exists(self._with_bucket(path_info))
  File "/usr/local/lib/python3.8/dist-packages/fsspec/asyn.py", line 91, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/fsspec/asyn.py", line 71, in sync
    raise return_result
  File "/usr/local/lib/python3.8/dist-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/usr/local/lib/python3.8/dist-packages/s3fs/core.py", line 822, in _exists
    await self._info(path, bucket, key, version_id=version_id)
  File "/usr/local/lib/python3.8/dist-packages/s3fs/core.py", line 1016, in _info
    out = await self._call_s3(
  File "/usr/local/lib/python3.8/dist-packages/s3fs/core.py", line 270, in _call_s3
    raise err
OSError: [Errno 22] Bad Request
------------------------------------------------------------
2021-11-23 09:03:02,825 DEBUG: Version info for developers:
DVC version: 2.8.3 (pip)
---------------------------------
Platform: Python 3.8.0 on Linux-3.10.0-1160.45.1.el7.x86_64-x86_64-with-glibc2.27
Supports:
	webhdfs (fsspec = 2021.11.0),
	http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
	s3 (s3fs = 2021.11.0, boto3 = 1.17.106)
Cache types: hardlink, symlink
Cache directory: nfs on LEB1MLNAS.company.com:/leb1mlnas_projects
Caches: local
Remotes: None
Workspace directory: nfs on LEB1MLNAS.company.com:/leb1mlnas_projects
Repo: dvc, git

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 17 (8 by maintainers)

Top GitHub Comments

1 reaction
wdixon commented, Nov 24, 2021

@efiop Oh, I see. Yeah, that’s definitely a clue! So in the repo you are running dvc pull in, you only have dvc imported artifacts?

Yes - that is correct… In this particular case, all of the individually imported DVC artifacts originate in the same git artifacts repository.

@efiop Are you able to clone those artifact’s repo and run dvc pull there?

Yes - that works just fine… I just performed

  1. git clone <artifacts_repo>
  2. dvc pull from within the artifacts repo.
1 reaction
wdixon commented, Nov 24, 2021

Sorry, I had meant to say

dvc pull --jobs 1

produced the same error that I originally posted….

I am in a Linux VM with only 5 CPUs; the MinIO endpoint is reasonably sized, and I am the only one currently interacting with it…. (I doubt it's a capacity issue.)

I get the same error on a Windows client (a 12-core laptop)… A colleague of mine also received the same error attempting to access the same repo.

The ‘None’ - I am guessing it might be because the git project doesn't have its own DVC remote configured; rather, it only contains DVC imported resources…. Is that a clue?
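
One way to check that guess with standard DVC commands; the paths and the remote settings shown for the artifacts repo are assumptions, not taken from the issue:

# In the consuming project: no remote configured ("Remotes: None" in dvc doctor above)
$ dvc remote list        # prints nothing

# The artifacts repo, by contrast, would define its own remote in .dvc/config, e.g.:
$ cat .dvc/config
[core]
    remote = minio
['remote "minio"']
    url = s3://dvc-inspection-ai/bit_tool_artifacts/RepoA
    endpointurl = https://minio.example.com    # hypothetical MinIO endpoint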


