Add MD5 digest in file processing response
When processing a PDF with the web service, we should include the MD5 digest of the original file in the TEI result, so that we can bind the TEI to the right version of the PDF.
This is done for instance here. Just one problem: I don’t know how to encode it in the TEI header (under <sourceDesc>? But how?).
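For reference, computing the digest itself is the easy part. Here is a minimal sketch in Python (not the service's actual code; the function name `md5_of_file` and the chunk size are just illustrative choices) that reads the file in chunks so a large PDF is never loaded into memory all at once:

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`.

    The file is read in `chunk_size`-byte pieces (1 MiB by default,
    an arbitrary choice) so memory use stays flat for large PDFs.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The digest has to be computed over the raw bytes of the uploaded file, before any processing, otherwise it will not match what a user gets by hashing their own copy of the PDF.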
Issue Analytics
- State:
- Created 2 years ago
- Comments:7 (4 by maintainers)
Top GitHub Comments
We can calculate an MD5 digest from any data; it’s a way to get a signature. The @subtype="pdf" would indicate that the MD5 was calculated from the PDF used to create the TEI, and not from a docx or another XML format of the document, for instance. One motivation behind this is to be sure that we can use the coordinate information present in the TEI with a given PDF (which might be downloaded online, so with no guarantee that it is the PDF originally used with Grobid).
Yes, you’re right about the subtype. Thinking twice, it looks really unnecessary, so let’s drop it! Thanks!!
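To make the motivation concrete, here is a rough sketch of the check a client could run before trusting the coordinates in a TEI file against a PDF it has on hand. It assumes the digest ends up in an `<idno type="MD5">` element somewhere under `<sourceDesc>`; that encoding is only an assumption for illustration, since this thread does not show what was finally adopted. The helper names `digest_from_tei` and `pdf_matches_tei` are hypothetical as well.

```python
import hashlib
import xml.etree.ElementTree as ET

# Standard TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def digest_from_tei(tei_xml):
    """Return the MD5 hex string stored in the TEI header, or None.

    Assumption: the digest sits in <idno type="MD5"> under <sourceDesc>.
    """
    root = ET.fromstring(tei_xml)
    node = root.find(".//tei:sourceDesc//tei:idno[@type='MD5']", TEI_NS)
    if node is None or not node.text:
        return None
    return node.text.strip().lower()


def pdf_matches_tei(pdf_bytes, tei_xml):
    """True only if the TEI carries a digest and it matches these PDF bytes."""
    expected = digest_from_tei(tei_xml)
    if expected is None:
        return False
    return hashlib.md5(pdf_bytes).hexdigest() == expected
```

If the check fails, the PDF is not byte-identical to the one that was processed, so the coordinates in the TEI should not be assumed to line up with it.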