UUencoded attachment parsing
When dealing with attachments encoded via uuencoding (Content-transfer-encoding is uuencode or x-uuencode), mail-parser treats them as text, as can be seen in parse() (mailparser.py:378):
    if transfer_encoding == "base64" or (
            transfer_encoding == "quoted-printable"
            and "application" in mail_content_type):
        ...
    else:
        payload = ported_string(p.get_payload(decode=True), encoding=charset)
        log.debug("Filename {!r} part {!r} is not binary".format(filename, i))
Within the else block, the payload is correctly decoded with p.get_payload(decode=True), but it is then passed to ported_string(), which attempts to decode the returned bytes as text, falling back to UTF-8 (utils.py:85):
    def ported_string(raw_data, encoding='utf-8', errors='ignore'):
        ...
        try:
            return six.text_type(raw_data, encoding).strip()
        except (LookupError, UnicodeDecodeError):
            return six.text_type(raw_data, "utf-8", errors).strip()
Since errors are ignored in the fallback, decoding doesn't fail, but it returns an attachment stripped of all bytes that are not valid UTF-8 (this can easily be verified by writing the attachment to disk with write_attachments).
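To illustrate the data loss in isolation, here is a minimal sketch. It is not mail-parser's code: lossy_ported_string is a hypothetical Python 3-only stand-in for the ported_string() excerpt above (using str instead of six.text_type), and the payload is made-up binary data.

```python
def lossy_ported_string(raw_data, encoding="utf-8", errors="ignore"):
    """Hypothetical Python 3 stand-in for the ported_string() excerpt."""
    try:
        return str(raw_data, encoding).strip()
    except (LookupError, UnicodeDecodeError):
        # Fallback: invalid byte sequences are silently dropped.
        return str(raw_data, "utf-8", errors).strip()


# Made-up binary payload containing bytes that are not valid UTF-8.
payload = b"\x89PNG\r\n\x1a\n\xff\xd8\x00\x01binary\xfe\xff"

text = lossy_ported_string(payload)
damaged = text.encode("utf-8")  # what would end up on disk

print("original length:", len(payload))
print("after text round trip:", len(damaged))
print("identical:", damaged == payload)
```

Every byte that cannot be decoded is dropped without raising an error, so the result is shorter than the input and can no longer be restored.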
I encountered this issue while porting SpamScope to Python 3. SpamScope has a test, test_store_samples_unicode_error, that parses and saves a uuencoded attachment. According to the test, the resulting file should have an MD5 checksum of 2ea90c996ca28f751d4841e6c67892b8. That test passes with Python 2, because the incorrectly parsed payload does indeed have that hash. With Python 3, the hash changes due to differences in Unicode handling. The correct checksum, however, is actually 4f2cf891e7cfb349fca812091f184ecc.
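The checksum mismatch can be reproduced without mail-parser or SpamScope. The following sketch uses only the standard library; build_uuencoded_message is a hypothetical helper and the sample data is arbitrary, so the hashes it prints are unrelated to the ones quoted above. It assumes Python 3.7 or later (for the backtick argument of binascii.b2a_uu).

```python
import binascii
import hashlib
from email import message_from_bytes


def build_uuencoded_message(data, filename="sample.bin"):
    """Hypothetical helper: minimal single-part message with a uuencoded body."""
    lines = [b"begin 644 " + filename.encode("ascii") + b"\n"]
    for start in range(0, len(data), 45):  # uuencode packs up to 45 bytes per line
        lines.append(binascii.b2a_uu(data[start:start + 45], backtick=True))
    lines.append(b"`\nend\n")  # zero-length line followed by the end marker
    headers = (
        b"Content-Type: application/octet-stream\n"
        b"Content-Transfer-Encoding: x-uuencode\n"
        b"\n"
    )
    return headers + b"".join(lines)


original = bytes(range(256))  # arbitrary binary sample
msg = message_from_bytes(build_uuencoded_message(original))

# Keeping the decoded payload as bytes preserves the attachment.
decoded = msg.get_payload(decode=True)

# Treating it as text with errors="ignore" drops every invalid byte.
damaged = decoded.decode("utf-8", errors="ignore").strip().encode("utf-8")

print("bytes path md5:", hashlib.md5(decoded).hexdigest())
print("text path md5: ", hashlib.md5(damaged).hexdigest())
```

Only the bytes path reproduces the original data; the text path yields a different file and therefore a different checksum.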
Top GitHub Comments
Please have a look at my SpamScope fork and the Storm Dockerfile which the new SpamScope image depends on. So far, the included tests run fine, as does the default debug topology. The project we’re using SpamScope in also seems to run on py3 without further issues. However, due to the mediocre test coverage we can’t be all too confident that this update doesn’t break anything. Moreover, I didn’t update the Ansible playbooks due to a lack of time.
Yeah, we have some py3-only dependencies, which is why I'm currently in the process of porting it over. If you're interested in the results, I'll happily send a pull request as soon as I'm done. However, I'm not keeping backwards compatibility: it won't run with py2 anymore. In addition to that, I'm only using the Docker-based version, so I won't touch the Ansible stuff for now. Docker-wise, since the SpamScope image depends on fmantuano/spamscope-deps (which I didn't find a repository for) and this again depends on fmantuano/apache-storm (which I DID find a repository for), I did the following:
- Forked fmantuano/apache-storm and updated Apache Storm to version 2.2.0.
- Merged spamscope-deps into the Dockerfile within the SpamScope repository and updated all dependencies. The drawback of that merge is that building the v8 engine for thug takes forever. However, users now have the chance to manage their dependencies, which I find better than depending on the outdated spamscope-deps from Docker Hub.
Let me know if I should send you pull requests for all that stuff. A separate branch might be appropriate.