Mimetype detected by libmagic is inaccurate

If I create a file in Microsoft Word (Mac, version 16.17) and then check its MIME type using libmagic on the command line, I get the following output on Mac:

$ file --mime file_created_in_word.docx
application/vnd.openxmlformats-officedocument.wordprocessingml.document

However, files created by python-docx get a different MIME type:

$ file --mime file_created_by_python-docx.docx
application/octet-stream

This affected me in some test code that checks whether the output document has the correct type. Can anyone give insight into this?
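For a test that only needs to confirm the output is a Word document, one option that does not depend on libmagic is to inspect the package itself. The following is a minimal sketch, not something from python-docx: the helper name `looks_like_docx` is hypothetical, and it assumes the file is a regular .docx (templates and macro-enabled documents declare different content types for their main part).

```python
import io
import zipfile

# Content type that a regular .docx declares for its main document part.
DOCX_MAIN_TYPE = (
    "application/vnd.openxmlformats-officedocument."
    "wordprocessingml.document.main+xml"
)


def looks_like_docx(data: bytes) -> bool:
    """Hypothetical helper: True if data is a ZIP package whose
    [Content_Types].xml declares a Word main document part."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return False
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        if "[Content_Types].xml" not in zf.namelist():
            return False
        content_types = zf.read("[Content_Types].xml").decode(
            "utf-8", errors="replace"
        )
    # A plain substring check is enough for a test; a stricter check
    # could parse the XML and look at the Override elements.
    return DOCX_MAIN_TYPE in content_types
```

This checks what the package declares about itself rather than what libmagic infers from byte offsets, so it is not affected by the order in which the archive members were written.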

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
xsduan commented, Sep 15, 2018

Here’s the relevant source file: https://github.com/threatstack/libmagic/blob/1249b5cd02c3b6fb9b917d16c76bc76c862932b6/magic/Magdir/msooxml

Apparently the order of the files inside the archive is important. You can test this by overwriting the first 5 characters of the third file's name (the first ASCII text after the 3rd PK\x03\x04 in the file) with word/, and it'll then register correctly as Word 2007. python-docx seems to put docProps/core.xml as the 3rd file (because it writes all the files in alphabetical order).
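To see the member order for yourself, and to experiment with a different one, the archive can be inspected and rewritten with the standard library. This is only a sketch under assumptions: `member_order`, `reorder_docx`, and the `PREFERRED_FIRST` list are hypothetical names chosen for illustration, and whether a given libmagic version then reports the Word MIME type depends on its msooxml rules, so check the result with `file --mime`.

```python
import io
import zipfile

# Assumed order for the experiment: the two package-level parts first, then
# the main Word part, so that a word/ entry becomes the third member.
PREFERRED_FIRST = ["[Content_Types].xml", "_rels/.rels", "word/document.xml"]


def member_order(docx_bytes: bytes) -> list:
    """Return member names in the order the archive lists them."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        return [info.filename for info in zf.infolist()]


def reorder_docx(docx_bytes: bytes) -> bytes:
    """Rewrite the archive with PREFERRED_FIRST members first.

    Member contents are copied unchanged; only their order differs.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        names = src.namelist()
        head = [n for n in PREFERRED_FIRST if n in names]
        tail = [n for n in names if n not in head]
        for name in head + tail:
            dst.writestr(name, src.read(name))
    return out.getvalue()
```

A typical use would be to save the document from python-docx into an `io.BytesIO`, pass its bytes through `reorder_docx`, write the result to disk, and run `file --mime` on it again.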

