DVC is not able to pull files from a public Backblaze S3 remote
Bug Report
Description
Set up a public Backblaze remote and push committed files with DVC.
Clone the git repository, which carries only the public access information for Backblaze (no secret_access_key), then try a dvc pull: it fails.
Reproduce
- git init
- dvc init
- Configure Backblaze remote:
# .dvc/config
[core]
remote = b2
['remote "b2"']
url = s3://<BUCKET>/
endpointurl = https://s3.us-west-000.backblazeb2.com
# .dvc/config.local
['remote "b2"']
access_key_id = <ACCESS_KEY>
secret_access_key = <SECRET_KEY>
Set B2 bucket as public.
- copy file.txt into the repository
- dvc add file.txt
- git add .
- git commit -m "Initial"
- dvc push
- Now go to another directory (cd /tmp)
- git clone the original repository (only the public B2 information is copied, since .dvc/config.local is not committed)
- dvc pull
- Exception occurs:
ERROR: failed to pull data from the cloud - Unable to find AWS credentials. <https://error.dvc.org/no-credentials>: Unable to locate credentials
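The remote configuration above can also be produced with the equivalent dvc remote commands (bucket name and keys are placeholders for your own values):

```shell
# One-time setup on the pushing machine.
dvc remote add -d b2 s3://<BUCKET>/
dvc remote modify b2 endpointurl https://s3.us-west-000.backblazeb2.com
# --local writes the credentials to .dvc/config.local,
# which is git-ignored and therefore not cloned.
dvc remote modify --local b2 access_key_id <ACCESS_KEY>
dvc remote modify --local b2 secret_access_key <SECRET_KEY>
```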
Expected
dvc pull retrieves the committed file without problems
Environment information
Output of dvc doctor:
$ dvc doctor
DVC version: 2.1.0 (pip)
---------------------------------
Platform: Python 3.9.5 on Linux-5.12.7-200.fc33.x86_64-x86_64-with-glibc2.32
Supports: gdrive, http, https, s3, ssh
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: s3
Workspace directory: tmpfs on tmpfs
Repo: dvc, git
Additional Information (if any): As reported in the Backblaze docs, listing files always requires authorization:
Access controls are simple. Uploads into a bucket always require authorization. Listing files in a bucket always requires authorization, and deleting files always requires authorization. For downloading files, though, you have the option of requiring authorization, or making all of the files in a bucket visible to the public.
Maybe this is the cause of the error, since it differs from the AWS S3 default.
On the other hand, the md5 hashes of the outputs are written explicitly in the tracking files, so there should be no need to list the bucket at all.
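A quick way to check this hypothesis (a sketch, assuming the aws CLI is installed, the bucket is public, and some/key exists in it) is to compare an unsigned listing with an unsigned download against the same endpoint:

```shell
# Listing a B2 bucket requires auth even when the bucket is public,
# so an unsigned list request is expected to be rejected:
aws s3 ls s3://<BUCKET>/ \
    --endpoint-url https://s3.us-west-000.backblazeb2.com \
    --no-sign-request

# Downloading a known key from a public bucket should work unsigned:
aws s3 cp s3://<BUCKET>/some/key ./file \
    --endpoint-url https://s3.us-west-000.backblazeb2.com \
    --no-sign-request
```

If the first command fails while the second succeeds, the pull is breaking on the listing step, not on the downloads themselves.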
$ dvc pull --debug
2021-06-22 11:39:43,383 DEBUG: Preparing to download data from 's3://pl-experiments/'
2021-06-22 11:39:43,383 DEBUG: Preparing to collect status from s3://pl-experiments/
2021-06-22 11:39:43,384 DEBUG: Collecting information from local cache...
2021-06-22 11:39:43,387 DEBUG: Collecting information from remote cache...
2021-06-22 11:39:43,388 DEBUG: Matched '0' indexed hashes
Everything is up to date.
2021-06-22 11:39:45,639 ERROR: failed to pull data from the cloud - Unable to find AWS credentials. <https://error.dvc.org/no-credentials>: Unable to locate credentials
------------------------------------------------------------
Traceback (most recent call last):
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/fs/s3.py", line 155, in _get_s3
yield self.s3
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/fs/s3.py", line 172, in _get_bucket
yield s3.Bucket(bucket)
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/fs/s3.py", line 284, in _list_paths
for obj_summary in obj_summaries:
File "/home/trenta3/.local/lib/python3.9/site-packages/boto3/resources/collection.py", line 83, in __iter__
for page in self.pages():
File "/home/trenta3/.local/lib/python3.9/site-packages/boto3/resources/collection.py", line 166, in pages
for page in pages:
File "/home/trenta3/.local/lib/python3.9/site-packages/botocore/paginate.py", line 255, in __iter__
response = self._make_request(current_kwargs)
File "/home/trenta3/.local/lib/python3.9/site-packages/botocore/paginate.py", line 332, in _make_request
return self._method(**current_kwargs)
File "/home/trenta3/.local/lib/python3.9/site-packages/botocore/client.py", line 386, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/trenta3/.local/lib/python3.9/site-packages/botocore/client.py", line 691, in _make_api_call
http, parsed_response = self._make_request(
File "/home/trenta3/.local/lib/python3.9/site-packages/botocore/client.py", line 711, in _make_request
return self._endpoint.make_request(operation_model, request_dict)
File "/home/trenta3/.local/lib/python3.9/site-packages/botocore/endpoint.py", line 102, in make_request
return self._send_request(request_dict, operation_model)
File "/home/trenta3/.local/lib/python3.9/site-packages/botocore/endpoint.py", line 132, in _send_request
request = self.create_request(request_dict, operation_model)
File "/home/trenta3/.local/lib/python3.9/site-packages/botocore/endpoint.py", line 115, in create_request
self._event_emitter.emit(event_name, request=request,
File "/home/trenta3/.local/lib/python3.9/site-packages/botocore/hooks.py", line 356, in emit
return self._emitter.emit(aliased_event_name, **kwargs)
File "/home/trenta3/.local/lib/python3.9/site-packages/botocore/hooks.py", line 228, in emit
return self._emit(event_name, kwargs)
File "/home/trenta3/.local/lib/python3.9/site-packages/botocore/hooks.py", line 211, in _emit
response = handler(**kwargs)
File "/home/trenta3/.local/lib/python3.9/site-packages/botocore/signers.py", line 90, in handler
return self.sign(operation_name, request)
File "/home/trenta3/.local/lib/python3.9/site-packages/botocore/signers.py", line 162, in sign
auth.add_auth(request)
File "/home/trenta3/.local/lib/python3.9/site-packages/botocore/auth.py", line 373, in add_auth
raise NoCredentialsError()
botocore.exceptions.NoCredentialsError: Unable to locate credentials
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/command/data_sync.py", line 29, in run
stats = self.repo.pull(
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/repo/__init__.py", line 49, in wrapper
return f(repo, *args, **kwargs)
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/repo/pull.py", line 29, in pull
processed_files_count = self.fetch(
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/repo/__init__.py", line 49, in wrapper
return f(repo, *args, **kwargs)
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/repo/fetch.py", line 62, in fetch
downloaded += self.cloud.pull(
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/data_cloud.py", line 88, in pull
return remote.pull(
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/remote/base.py", line 56, in wrapper
return f(obj, *args, **kwargs)
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/remote/base.py", line 486, in pull
ret = self._process(
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/remote/base.py", line 323, in _process
dir_status, file_status, dir_contents = self._status(
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/remote/base.py", line 175, in _status
self.hashes_exist(
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/remote/base.py", line 132, in hashes_exist
return indexed_hashes + self.odb.hashes_exist(list(hashes), **kwargs)
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/objects/db/base.py", line 408, in hashes_exist
remote_size, remote_hashes = self._estimate_remote_size(hashes, name)
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/objects/db/base.py", line 230, in _estimate_remote_size
remote_hashes = set(hashes)
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/objects/db/base.py", line 184, in _hashes_with_limit
for hash_ in self.list_hashes(prefix, progress_callback):
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/objects/db/base.py", line 174, in list_hashes
for path in self._list_paths(prefix, progress_callback):
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/objects/db/base.py", line 154, in _list_paths
for file_info in self.fs.walk_files(path_info, prefix=prefix):
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/fs/s3.py", line 290, in walk_files
for fname in self._list_paths(path_info, **kwargs):
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/fs/s3.py", line 285, in _list_paths
yield obj_summary.key
File "/usr/lib64/python3.9/contextlib.py", line 135, in __exit__
self.gen.throw(type, value, traceback)
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/fs/s3.py", line 175, in _get_bucket
raise DvcException(
File "/usr/lib64/python3.9/contextlib.py", line 135, in __exit__
self.gen.throw(type, value, traceback)
File "/home/trenta3/.local/lib/python3.9/site-packages/dvc/fs/s3.py", line 158, in _get_s3
raise DvcException(
dvc.exceptions.DvcException: Unable to find AWS credentials. <https://error.dvc.org/no-credentials>
------------------------------------------------------------
2021-06-22 11:39:45,658 DEBUG: Analytics is enabled.
2021-06-22 11:39:45,794 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpb1sj_5dk']'
2021-06-22 11:39:45,797 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpb1sj_5dk']'
Issue Analytics: created 2 years ago; 11 comments (4 by maintainers)

Would it be possible to use an HTTP remote for reads? S3 can provide an HTTP endpoint for a bucket. Does Backblaze have something like this?
We use this setup in the example-get-started repo: when we push we use the S3 remote (e.g. with the -r flag), but the default remote is set to HTTP, pointing at the endpoint that S3 provides.
With HTTP we don't need any special permissions, I think, but it can be slower in certain scenarios.
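Such a split setup could look roughly like this (a sketch; the "b2-http" remote name and the Backblaze friendly download URL are assumptions — check the actual public URL for your bucket):

```
# .dvc/config
[core]
    remote = b2-http
['remote "b2-http"']
    url = https://f000.backblazeb2.com/file/<BUCKET>
['remote "b2"']
    url = s3://<BUCKET>/
    endpointurl = https://s3.us-west-000.backblazeb2.com
```

With this, dvc pull uses the default HTTP remote anonymously, while dvc push -r b2 uses the authenticated S3 remote.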
@jdonzallaz Ok, found our conversation: my comment was about https://github.com/iterative/dvc/issues/5797, which is still work in progress and might help in this use case.