Mimetype detected by libmagic is inaccurate
If I create a file in Microsoft Word (Mac, version 16.17) and then check its MIME type using libmagic on the command line, I get the following output on Mac:
$ file --mime file_created_in_word.docx
application/vnd.openxmlformats-officedocument.wordprocessingml.document
However, files created by python-docx get a different MIME type:
$ file --mime file_created_by_python-docx.docx
application/octet-stream
This affected some test code of mine that checks that the output document is of the correct type. Can anyone give insight into this?
Issue Analytics
- State:
- Created 5 years ago
- Comments:5 (2 by maintainers)
Here’s the relevant source file: https://github.com/threatstack/libmagic/blob/1249b5cd02c3b6fb9b917d16c76bc76c862932b6/magic/Magdir/msooxml
Apparently the file order is important. You can test this by overwriting the start of the third file’s name (the first ASCII text after the 3rd PK\x03\x04 in the file) with word/, and it’ll register correctly as Word 2007. python-docx seems to put docProps/core.xml as the 3rd file (because it puts all the files in alphabetical order).
I’m having this same issue: https://github.com/ahupp/python-magic/issues/208