HTTP 1 request headers decoded using default encoding instead of ISO-8859-1

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

Headers are decoded here without specifying an encoding:

https://github.com/sanic-org/sanic/blob/ad4e526c775fc3ce950503d6476d9d344492b0dd/sanic/http/http1.py#L205

On my system (macOS, Python 3.10.8 installed via Homebrew), this causes bytes that are valid characters in ISO-8859-1 but not valid UTF-8 to be decoded as surrogate escape characters, e.g. b"\x80" becomes "\udc80" instead of "\x80".
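
The difference can be reproduced in plain Python, without Sanic. This is a minimal sketch, assuming the linked line decodes the header block as UTF-8 with the `surrogateescape` error handler (inferred from the surrogate output reported here, not copied from Sanic's source). The header name and variable names are hypothetical.

```python
raw = b"X-Example: \x80"  # hypothetical header line with one non-ASCII byte

# UTF-8 with surrogateescape: the invalid byte becomes a lone surrogate.
as_utf8 = raw.decode("utf-8", errors="surrogateescape")

# ISO-8859-1 maps every byte to the code point with the same value.
as_latin1 = raw.decode("iso-8859-1")

print(ascii(as_utf8[-1]))    # '\udc80'
print(ascii(as_latin1[-1]))  # '\x80'
```

The `surrogateescape` handler maps each undecodable byte 0x80 to 0xFF onto the code points U+DC80 to U+DCFF, which is why the last character differs between the two decodings.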

Code snippet

No response

Expected Behavior

Headers encoded as ISO-8859-1 with no MIME type should be decoded correctly, without UTF-8 surrogate escape characters.

How do you run Sanic?

As a script (app.run or Sanic.serve)

Operating System

linux

Sanic Version

22.9.1

Additional context

This used to work as expected in Sanic<=20.12.7.

Issue Analytics

  • State: open
  • Created 10 months ago
  • Reactions:1
  • Comments:13 (11 by maintainers)

Top GitHub Comments

1 reaction
Tronic commented, Nov 24, 2022

@relud After a discussion we are now planning to use ISO-8859-1 for request headers within the Sanic built-in server as well, to make it match ASGI and other frameworks that behave this way, also matching the behavior of older Sanic releases. It is noted that ISO-8859-1 can also be encoded back to original bytes, from which one can obtain UTF-8 or other decoding if needed.
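
The round trip mentioned above can be sketched as follows. This is an illustration of the general property of ISO-8859-1, not Sanic code; the sample value and variable names are assumptions.

```python
original = "caf\u00e9".encode("utf-8")  # bytes a client might send: b'caf\xc3\xa9'

# What a server that decodes headers as ISO-8859-1 would hand to the app.
header_value = original.decode("iso-8859-1")

# ISO-8859-1 is a 1:1 byte mapping, so encoding restores the exact bytes.
restored = header_value.encode("iso-8859-1")
assert restored == original

# The application can then apply whichever decoding it expects.
text = restored.decode("utf-8")
```

Because every byte value has a corresponding ISO-8859-1 character, the decode step never fails and never loses information, which is what makes the later re-decoding possible.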

As your bug report was apparently the first we’ve received on this, the issue is probably not affecting many at all, but at least this should make your implementation a bit easier. This is a breaking change, so no promises yet on when it will be released, even if everyone else is using ASCII headers and thus isn’t affected by it.

1 reaction
Tronic commented, Nov 18, 2022

Earlier Sanic versions handled headers as ISO-8859-1, which was causing trouble when they actually were in UTF-8 (more common nowadays). I had to put a lot of thought into this while reimplementing the HTTP parser code as leaving them as bytes wouldn’t be practical either. The surrogate escape coding is WTF-8 which indeed is meant for preserving garbage, being able to restore original bytes of what might be ill-formed UTF-8. I’m glad you found use for this detail of Sanic’s implementation, being able to restore those bytes instead of simply showing “replacement character” as a naive implementation might.
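
The contrast between restoring bytes and showing a replacement character can be seen with Python's built-in error handlers. A minimal sketch, assuming the headers were decoded as UTF-8 with `errors="surrogateescape"`; the sample bytes and variable names are hypothetical.

```python
raw = b"ok \xff\xfe"  # hypothetical value containing ill-formed UTF-8

escaped = raw.decode("utf-8", errors="surrogateescape")
replaced = raw.decode("utf-8", errors="replace")

# surrogateescape is reversible: encoding with the same handler restores the bytes.
recovered = escaped.encode("utf-8", errors="surrogateescape")
assert recovered == raw

# "replace" substitutes U+FFFD for each bad byte, so the original bytes are lost.
assert replaced == "ok \ufffd\ufffd"
assert replaced.encode("utf-8") != raw

# Once the bytes are recovered, they can be reinterpreted as ISO-8859-1.
as_latin1 = recovered.decode("iso-8859-1")
```

An application that needs the ISO-8859-1 interpretation on an affected version could apply the same encode-then-decode step to a header value it receives, provided the value was produced with the `surrogateescape` handler.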
