obj/s3: directory checksum for s3 fails with NotImplementedError

Reported by multiple users: adding or importing a directory on S3 fails. The issue occurs with both import-url and add --external. Importing an individual file and get-url work as expected.

Reported against both 2.8.2 (s3: s3fs = 2021.10.1, boto3 = 1.19.7) and 2.8.3 (s3: s3fs = 2021.11.0, boto3 = 1.17.106).
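Single files import fine but directories fail, so one way to narrow the problem down is to list the directory and flag file entries that lack hash metadata. This is only useful if the failure comes from entries whose listing metadata lacks the hash a per-file lookup expects, and the report does not confirm that. entries_missing_key is a hypothetical helper. The commented-out usage assumes fsspec's find(path, detail=True), which returns a path-to-info mapping, and assumes that s3fs reports the object hash under an "ETag" key:

```python
from typing import Any, Dict, List


def entries_missing_key(listing: Dict[str, Dict[str, Any]], key: str) -> List[str]:
    """Return the sorted paths of file entries whose metadata lacks `key`.

    `listing` is a path -> info mapping, the shape returned by fsspec's
    find(path, detail=True). Directory entries are ignored.
    """
    return sorted(
        path
        for path, info in listing.items()
        if info.get("type") == "file" and key not in info
    )


if __name__ == "__main__":
    # Offline example with a hand-written listing (hypothetical data).
    listing = {
        "test/sample/a.csv": {"type": "file", "size": 10, "ETag": '"0cc1"'},
        "test/sample/b.csv": {"type": "file", "size": 0},
        "test/sample/sub": {"type": "directory", "size": 0},
    }
    print(entries_missing_key(listing, "ETag"))

    # Against a real bucket (assumption: requires s3fs and credentials):
    # import s3fs
    # fs = s3fs.S3FileSystem()
    # print(entries_missing_key(fs.find("test/sample", detail=True), "ETag"))
```

An empty result for the real bucket would suggest that the listing metadata is not the cause.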

dvc import-url --file data/raw.dvc s3://test/sample data/raw -v
2021-11-21 22:10:16,748 DEBUG: Lockfile 'dvc.lock' needs to be updated.
2021-11-21 22:10:16,834 DEBUG: Removing output 'data/raw/sample' of stage: 'data/raw.dvc'.
2021-11-21 22:10:16,834 DEBUG: Removing 'data/raw/sample'
Importing 's3://test/sample' -> 'data/raw/sample'
2021-11-21 22:10:16,839 DEBUG: Computed stage: 'data/raw.dvc' md5: '782d7c58160f763093fa1761aaea4bc5'
2021-11-21 22:10:16,839 DEBUG: 'md5' of stage: 'data/raw.dvc' changed.
2021-11-21 22:10:19,085 ERROR: unexpected error
------------------------------------------------------------
Traceback (most recent call last):
...
    _, self.meta, obj = ostage(
  File "/root/.cache/pypoetry/virtualenvs/qt-expressions-classification-wj407T3L-py3.8/lib/python3.8/site-packages/dvc/objects/stage.py", line 296, in stage
    meta, obj = _stage_tree(
  File "/root/.cache/pypoetry/virtualenvs/qt-expressions-classification-wj407T3L-py3.8/lib/python3.8/site-packages/dvc/objects/stage.py", line 170, in _stage_tree
    meta, tree = _build_tree(path_info, fs, name, odb=odb, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/qt-expressions-classification-wj407T3L-py3.8/lib/python3.8/site-packages/dvc/objects/stage.py", line 138, in _build_tree
    for file_info, meta, obj in _iter_objects(path_info, fs, name, **kwargs):
  File "/root/.cache/pypoetry/virtualenvs/qt-expressions-classification-wj407T3L-py3.8/lib/python3.8/site-packages/dvc/objects/stage.py", line 130, in _iter_objects
    yield from _build_objects(path_info, fs, name, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/qt-expressions-classification-wj407T3L-py3.8/lib/python3.8/site-packages/dvc/objects/stage.py", line 126, in _build_objects
    yield from executor.map(worker, walk_iterator)
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 611, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/root/.cache/pypoetry/virtualenvs/qt-expressions-classification-wj407T3L-py3.8/lib/python3.8/site-packages/dvc/progress.py", line 133, in wrapped
    res = fn(*args, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/qt-expressions-classification-wj407T3L-py3.8/lib/python3.8/site-packages/dvc/objects/stage.py", line 83, in _stage_file
    meta, hash_info = get_file_hash(path_info, fs, name, state=state)
  File "/root/.cache/pypoetry/virtualenvs/qt-expressions-classification-wj407T3L-py3.8/lib/python3.8/site-packages/dvc/objects/stage.py", line 72, in get_file_hash
    meta, hash_info = _get_file_hash(path_info, fs, name)
  File "/root/.cache/pypoetry/virtualenvs/qt-expressions-classification-wj407T3L-py3.8/lib/python3.8/site-packages/dvc/objects/stage.py", line 57, in _get_file_hash
    raise NotImplementedError
NotImplementedError

add --external:

------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/conda/envs/dvc/lib/python3.9/site-packages/dvc/main.py", line 55, in main
    ret = cmd.do_run()
  File "/opt/conda/envs/dvc/lib/python3.9/site-packages/dvc/command/base.py", line 45, in do_run
    return self.run()
  File "/opt/conda/envs/dvc/lib/python3.9/site-packages/dvc/command/add.py", line 21, in run
    self.repo.add(
  File "/opt/conda/envs/dvc/lib/python3.9/site-packages/dvc/utils/collections.py", line 163, in inner
    result = func(*ba.args, **ba.kwargs)
...
  File "/opt/conda/envs/dvc/lib/python3.9/site-packages/dvc/objects/stage.py", line 102, in _stage_file
    meta, hash_info = get_file_hash(path_info, fs, name, state=state)
  File "/opt/conda/envs/dvc/lib/python3.9/site-packages/dvc/objects/stage.py", line 90, in get_file_hash
    meta, hash_info = _get_file_hash(path_info, fs, name)
  File "/opt/conda/envs/dvc/lib/python3.9/site-packages/dvc/objects/stage.py", line 71, in _get_file_hash
    raise NotImplementedError
NotImplementedError
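Both tracebacks end with _get_file_hash raising a bare NotImplementedError. The sketch below shows the kind of lookup-then-fallback logic that can produce such an error. It assumes, without confirmation from the report, that the function first checks the entry's metadata for the requested hash and then tries a fallback before giving up. HashValue, resolve_hash and the compute parameter are hypothetical stand-ins, not DVC's actual code:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class HashValue:
    """Hypothetical container for a hash name, its value, and the file size."""
    name: str
    value: str
    size: Optional[int] = None


def resolve_hash(
    info: Dict[str, Any],
    name: str,
    compute: Optional[Callable[[], str]] = None,
) -> HashValue:
    """Hypothetical lookup-then-fallback hash resolver (not DVC's code)."""
    size = info.get("size")
    # 1. Prefer a hash the filesystem already reported in the entry's metadata.
    if name in info:
        return HashValue(name, info[name], size=size)
    # 2. Otherwise fall back to computing it, if a way to do so was supplied.
    if compute is not None:
        return HashValue(name, compute(), size=size)
    # 3. No metadata and no fallback: this hash type cannot be produced.
    raise NotImplementedError(f"cannot produce {name!r} hash for this entry")


if __name__ == "__main__":
    print(resolve_hash({"etag": "abc123", "size": 3}, "etag"))
    try:
        resolve_hash({"size": 0}, "etag")
    except NotImplementedError as exc:
        print("NotImplementedError:", exc)
```

In this sketch the error appears only for an entry whose metadata lacks the requested hash and that has no fallback. The report does not establish whether that is what happens inside DVC.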

discord context (with full tracebacks):

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 1
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

2 reactions
pmrowla commented, Feb 16, 2022

I think this can be p2, since the problem mostly occurs in the --external use case. But this is still a regression for import-url, so it should be looked into further at some point.

2 reactions
dberenbaum commented, Feb 10, 2022

@VOvchinnikov Support for external outputs in GCS was unfortunately dropped in DVC 2.0. See https://dvc.org/blog/dvc-2-0-release#breaking-changes. We would like to support it again in the future, but it is not being worked on at the moment, and would likely only happen after a refactor of external outputs.

cc @efiop
