push: RAW file considered as text file (bad MD5)

Bug Report

Description

The RAW file has the following md5sum:

fd0de1350b92b00d60afd53b015f6aea 214089_JAI.raw

But DVC calculates it as:

md5: 0b4d86bc06ee3260e8172b2196805382 size: 63232000 path: 214089_JAI.raw

This happens because DVC identifies the file as a text file and runs the dos2unix replacement (converting CRLF line endings to LF) before computing the hash: https://github.com/iterative/dvc/blob/1.11/dvc/utils/__init__.py#L39 -> https://github.com/iterative/dvc/blob/1.11/dvc/istextfile.py#L34
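The mechanism can be sketched in Python. The functions below are a simplified model of the linked DVC code, not DVC's implementation. The names `looks_like_text`, `dvc_style_md5`, and `raw_md5` are hypothetical. The 512-byte sniff size and the 30% non-text threshold are assumed from the linked `istextfile.py`. The real code also normalizes the file chunk by chunk rather than as one buffer.

```python
import hashlib

# Bytes treated as "text": printable ASCII plus common whitespace and
# control characters (assumed from dvc/istextfile.py; simplified sketch).
TEXT_CHARS = bytes(range(32, 127)) + b"\n\r\t\f\b"


def looks_like_text(block: bytes) -> bool:
    """Heuristic text check on the leading block of a file."""
    if not block:
        return True  # an empty file counts as text
    if b"\x00" in block:
        return False  # any NUL byte marks the file as binary
    # Delete every "text" byte; whatever remains is non-text.
    nontext = block.translate(None, TEXT_CHARS)
    return len(nontext) / len(block) <= 0.30


def dvc_style_md5(data: bytes, sniff_size: int = 512) -> str:
    """MD5 after the optional CRLF -> LF (dos2unix) normalization."""
    if looks_like_text(data[:sniff_size]):
        data = data.replace(b"\r\n", b"\n")
    return hashlib.md5(data).hexdigest()


def raw_md5(data: bytes) -> str:
    """Plain MD5 of the bytes, as md5sum reports it."""
    return hashlib.md5(data).hexdigest()
```

A binary buffer can contain no NUL bytes in its first 512 bytes, consist mostly of bytes in the "text" range, and include `0x0D 0x0A` pairs. Such a buffer is classified as text, so `dvc_style_md5` and `raw_md5` disagree. A NUL byte near the start is enough to make both digests match.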

It still happens in version 2.4.3 https://github.com/iterative/dvc/blob/2.4.3/dvc/utils/__init__.py#L37 -> https://github.com/iterative/dvc/blob/2.4.3/dvc/istextfile.py#L22

When the file is uploaded through the gocloud.dev library, the upload fails the MD5 check, because the hash calculated by DVC does not match the actual hash of the file: https://github.com/google/go-cloud/blob/v0.23.0/blob/blob.go#L328
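gocloud.dev's blob writer accepts an expected digest through the `ContentMD5` field of its `WriterOptions`. It hashes the data as it is written and fails on close if the two digests differ (see the linked `blob.go`). The Python class below is a hypothetical model of that behavior for illustration only. It is not the gocloud.dev API, and `VerifyingWriter` and `ContentMD5Mismatch` are invented names.

```python
import hashlib


class ContentMD5Mismatch(Exception):
    """Raised when the written bytes do not match the declared MD5."""


class VerifyingWriter:
    """Hypothetical writer that mimics an MD5-checked upload.

    The expected digest is declared up front (as a hex string, the form
    DVC stores). Every written chunk is hashed, and close() refuses to
    commit the data if the digests differ.
    """

    def __init__(self, expected_md5_hex: str) -> None:
        self._expected = bytes.fromhex(expected_md5_hex)
        self._hash = hashlib.md5()
        self._chunks = []

    def write(self, data: bytes) -> int:
        self._hash.update(data)
        self._chunks.append(data)
        return len(data)

    def close(self) -> bytes:
        actual = self._hash.digest()
        if actual != self._expected:
            # Discard the buffered data instead of committing it.
            self._chunks.clear()
            raise ContentMD5Mismatch(
                f"declared {self._expected.hex()}, written {actual.hex()}"
            )
        return b"".join(self._chunks)
```

If the expected digest is computed after CRLF normalization but the original bytes are written, `close()` raises. The same thing happens when the DVC hash is forwarded as `ContentMD5`.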

Reproduce

  1. dvc init
  2. dvc remote modify --local our-proxy password 123123
  3. Copy 214089_JAI.raw to the directory
  4. dvc add 214089_JAI.raw
  5. dvc push
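To confirm the mismatch after step 4, compare the `md5:` value recorded in the `.dvc` file with a plain MD5 of the file on disk. The script below is a hypothetical helper, not part of DVC. It assumes the `.dvc` file contains an `md5: <32 hex chars>` entry, as in the output quoted above, and uses a regular expression instead of a YAML parser to stay dependency-free.

```python
import hashlib
import re
from pathlib import Path

# Matches the first "md5: <32 hex chars>" entry in a .dvc file.
MD5_LINE = re.compile(r"md5:\s*([0-9a-f]{32})")


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Plain MD5 of a file, read in chunks (same result as md5sum)."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_dvc_hash(data_path: Path, dvc_path: Path) -> bool:
    """Return True if the hash recorded in dvc_path matches data_path."""
    match = MD5_LINE.search(dvc_path.read_text())
    if match is None:
        raise ValueError(f"no md5 entry found in {dvc_path}")
    recorded = match.group(1)
    actual = file_md5(data_path)
    print(f"recorded={recorded} actual={actual}")
    return recorded == actual
```

Based on the hashes reported above, `check_dvc_hash(Path("214089_JAI.raw"), Path("214089_JAI.raw.dvc"))` would be expected to return `False` for the file in this report.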

Expected

The file is expected to upload correctly. Instead, the upload is canceled because the MD5 of the file and the one sent by DVC do not match.

Environment information

Output of dvc doctor:

$ dvc doctor
DVC version: 1.11.16 (pip)
---------------------------------
Platform: Python 3.8.5 on Linux-5.4.0-65-generic-x86_64-with-glibc2.29
Supports: http, https, ssh
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p1
Caches: local
Remotes: https
Workspace directory: ext4 on /dev/nvme0n1p1
Repo: dvc, git

Additional Information (if any): https://github.com/atekoa/dvc-rawfile

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

1 reaction
atekoa commented, Jul 1, 2021

I use the hash calculated by DVC so that our proxy does not have to rewrite the file, recalculate the hash, and then send the correct hash and the file to the gocloud.dev library, which is ultimately responsible for uploading the file and verifying the MD5. The gocloud.dev library requires me to send the MD5 of the file so that it can verify the data was written correctly, but I don't have the correct MD5 unless I generate it myself, right?
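One option, assuming the proxy can buffer the upload (for example to a temporary file) before handing it to gocloud.dev, is to hash the bytes it actually received in a single chunked pass instead of trusting the client-supplied DVC hash. The helper name `md5_of_stream` is hypothetical.

```python
import hashlib
from typing import BinaryIO, Tuple


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> Tuple[bytes, int]:
    """Return the raw 16-byte MD5 digest and byte count of a stream.

    Reads in fixed-size chunks so large files never sit fully in memory.
    """
    h = hashlib.md5()
    total = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
        total += len(chunk)
    return h.digest(), total
```

The raw digest, not a hex string, is returned because an MD5 checksum field such as gocloud.dev's `ContentMD5` holds the 16 raw bytes. After hashing, the proxy would rewind the buffered file and stream it to the upload. This costs a second read but no rewrite of the data.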

1 reaction
isidentical commented, Jun 30, 2021