Add MD5 digest in file processing response


When processing a PDF with the web service, we should include in the TEI result the MD5 digest of the original file, so that we can bind the TEI to the right version of the PDF.

This is done, for instance, here. Just one problem: I don’t know how to encode it in the TEI header (under <sourceDesc>? But how?).
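To make the request concrete, here is a minimal sketch of computing the MD5 digest of a PDF before or after sending it to the web service. It is written in Python purely for illustration and is not Grobid's own code; the function name `md5_of_file` and the chunk size are assumptions. The file is read in chunks so that a large PDF is never loaded into memory in one go.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    Hypothetical helper for illustration; not part of Grobid.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string is 32 hexadecimal characters and can be stored next to the TEI or compared with whatever digest the service reports.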

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

kermitt2 commented, Apr 16, 2021 (1 reaction)

We can calculate an MD5 digest from any data; it’s a way to get a signature. The @subtype="pdf" would indicate that the MD5 was calculated from the PDF used to create the TEI, and not from a docx or another XML format of the document, for instance. One motivation behind this is to be sure that we can use the coordinate information present in the TEI with a given PDF (which might be downloaded online, so with no guarantee that it is the PDF originally used with Grobid).
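The check described in this comment could look roughly like the sketch below: read the digest recorded in the TEI and compare it with the digest of the PDF at hand before trusting the coordinates. The excerpt does not show how the digest ends up being encoded, so the `<idno type="MD5">` element under `<sourceDesc>` is an assumption, as are the function names `tei_md5` and `pdf_matches_tei`.

```python
import hashlib
import xml.etree.ElementTree as ET

# The standard TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_md5(tei_xml):
    """Return the lower-cased MD5 recorded in the TEI header, or None.

    Assumption: the digest is stored as <idno type="MD5"> somewhere
    under <sourceDesc>; adjust the path to the actual encoding.
    """
    root = ET.fromstring(tei_xml)
    node = root.find(".//tei:sourceDesc//tei:idno[@type='MD5']", TEI_NS)
    if node is None or not node.text:
        return None
    return node.text.strip().lower()


def pdf_matches_tei(pdf_bytes, tei_xml):
    """Return True only if the PDF's MD5 equals the one recorded in the TEI."""
    recorded = tei_md5(tei_xml)
    if recorded is None:
        # No digest in the TEI: the match cannot be confirmed.
        return False
    return hashlib.md5(pdf_bytes).hexdigest() == recorded
```

A caller would skip the coordinate information, or fetch the PDF again, whenever `pdf_matches_tei` returns False.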

kermitt2 commented, Apr 16, 2021 (0 reactions)

Yes, you’re right about the subtype. On second thought, it looks really unnecessary, so let’s drop it! Thanks!!


