obj/s3: directory checksum for s3 fails with NotImplementedError
Reported by multiple users: when adding or importing a directory on s3, the error occurs with both import-url and add --external. Importing an individual file and get-url work as expected.
(Reported against both 2.8.2 - s3 (s3fs = 2021.10.1, boto3 = 1.19.7) and 2.8.3 - s3 (s3fs = 2021.11.0, boto3 = 1.17.106).)
dvc import-url --file data/raw.dvc s3://test/sample data/raw -v
2021-11-21 22:10:16,748 DEBUG: Lockfile 'dvc.lock' needs to be updated.
2021-11-21 22:10:16,834 DEBUG: Removing output 'data/raw/sample' of stage: 'data/raw.dvc'.
2021-11-21 22:10:16,834 DEBUG: Removing 'data/raw/sample'
Importing 's3://test/sample' -> 'data/raw/sample'
2021-11-21 22:10:16,839 DEBUG: Computed stage: 'data/raw.dvc' md5: '782d7c58160f763093fa1761aaea4bc5'
2021-11-21 22:10:16,839 DEBUG: 'md5' of stage: 'data/raw.dvc' changed.
2021-11-21 22:10:19,085 ERROR: unexpected error
------------------------------------------------------------
Traceback (most recent call last):
...
_, self.meta, obj = ostage(
File "/root/.cache/pypoetry/virtualenvs/qt-expressions-classification-wj407T3L-py3.8/lib/python3.8/site-packages/dvc/objects/stage.py", line 296, in stage
meta, obj = _stage_tree(
File "/root/.cache/pypoetry/virtualenvs/qt-expressions-classification-wj407T3L-py3.8/lib/python3.8/site-packages/dvc/objects/stage.py", line 170, in _stage_tree
meta, tree = _build_tree(path_info, fs, name, odb=odb, **kwargs)
File "/root/.cache/pypoetry/virtualenvs/qt-expressions-classification-wj407T3L-py3.8/lib/python3.8/site-packages/dvc/objects/stage.py", line 138, in _build_tree
for file_info, meta, obj in _iter_objects(path_info, fs, name, **kwargs):
File "/root/.cache/pypoetry/virtualenvs/qt-expressions-classification-wj407T3L-py3.8/lib/python3.8/site-packages/dvc/objects/stage.py", line 130, in _iter_objects
yield from _build_objects(path_info, fs, name, **kwargs)
File "/root/.cache/pypoetry/virtualenvs/qt-expressions-classification-wj407T3L-py3.8/lib/python3.8/site-packages/dvc/objects/stage.py", line 126, in _build_objects
yield from executor.map(worker, walk_iterator)
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 611, in result_iterator
yield fs.pop().result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
raise self._exception
File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/root/.cache/pypoetry/virtualenvs/qt-expressions-classification-wj407T3L-py3.8/lib/python3.8/site-packages/dvc/progress.py", line 133, in wrapped
res = fn(*args, **kwargs)
File "/root/.cache/pypoetry/virtualenvs/qt-expressions-classification-wj407T3L-py3.8/lib/python3.8/site-packages/dvc/objects/stage.py", line 83, in _stage_file
meta, hash_info = get_file_hash(path_info, fs, name, state=state)
File "/root/.cache/pypoetry/virtualenvs/qt-expressions-classification-wj407T3L-py3.8/lib/python3.8/site-packages/dvc/objects/stage.py", line 72, in get_file_hash
meta, hash_info = _get_file_hash(path_info, fs, name)
File "/root/.cache/pypoetry/virtualenvs/qt-expressions-classification-wj407T3L-py3.8/lib/python3.8/site-packages/dvc/objects/stage.py", line 57, in _get_file_hash
raise NotImplementedError
NotImplementedError
add --external:
------------------------------------------------------------
Traceback (most recent call last):
File "/opt/conda/envs/dvc/lib/python3.9/site-packages/dvc/main.py", line 55, in main
ret = cmd.do_run()
File "/opt/conda/envs/dvc/lib/python3.9/site-packages/dvc/command/base.py", line 45, in do_run
return self.run()
File "/opt/conda/envs/dvc/lib/python3.9/site-packages/dvc/command/add.py", line 21, in run
self.repo.add(
File "/opt/conda/envs/dvc/lib/python3.9/site-packages/dvc/utils/collections.py", line 163, in inner
result = func(*ba.args, **ba.kwargs)
...
File "/opt/conda/envs/dvc/lib/python3.9/site-packages/dvc/objects/stage.py", line 102, in _stage_file
meta, hash_info = get_file_hash(path_info, fs, name, state=state)
File "/opt/conda/envs/dvc/lib/python3.9/site-packages/dvc/objects/stage.py", line 90, in get_file_hash
meta, hash_info = _get_file_hash(path_info, fs, name)
File "/opt/conda/envs/dvc/lib/python3.9/site-packages/dvc/objects/stage.py", line 71, in _get_file_hash
raise NotImplementedError
NotImplementedError
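Both tracebacks end at the same point: _get_file_hash in dvc/objects/stage.py raises NotImplementedError. The block below is a minimal sketch of how a hash lookup of that general shape can end in this error. It is not DVC's actual code, and it does not claim to identify the root cause of this issue. The function name get_file_hash_sketch, the fallbacks parameter and the sample metadata dicts are all hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict, Mapping, Optional


def get_file_hash_sketch(
    info: Mapping[str, str],
    name: str,
    fallbacks: Optional[Dict[str, Callable[[], str]]] = None,
) -> str:
    """Hypothetical hash lookup; NOT DVC's implementation.

    Returns the hash called `name`, or raises NotImplementedError
    when nothing can supply it.
    """
    # 1. Prefer a hash that the storage backend already reports.
    if name in info:
        return info[name]
    # 2. Otherwise use a locally computed fallback, if one is registered.
    if fallbacks and name in fallbacks:
        return fallbacks[name]()
    # 3. No source for this hash name: same terminal error as in the logs.
    raise NotImplementedError


# Illustrative metadata (assumed, not taken from a real bucket):
# one entry carries an "etag" key, the other does not.
with_etag = {"size": "12", "etag": "abc123"}
without_etag = {"size": "12"}

print(get_file_hash_sketch(with_etag, "etag"))

try:
    get_file_hash_sketch(without_etag, "etag")
except NotImplementedError:
    print("NotImplementedError raised for entry without 'etag'")
```

In a lookup structured like this, the error says only that no source existed for the requested hash name on that particular file entry. The full tracebacks would be needed to tell which hash name was requested and what the file info contained.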
discord context (with full tracebacks):

I think this can be p2 since the problem mostly occurs in the --external use case. But I think this is still a regression for import-url, so it should be looked into some more at some point.

@VOvchinnikov Support for external outputs in GCS was unfortunately dropped in DVC 2.0. See https://dvc.org/blog/dvc-2-0-release#breaking-changes. We would like to support it again in the future, but it is not being worked on at the moment, and would likely only happen after a refactor of external outputs.
cc @efiop