Text encoding is changed when passing a utf-8 string to content

I’d expect UTF-8 characters (such as German umlauts) to be returned exactly as I passed them to content, but that is not the case:

import httpx
import respx

with respx.mock() as respx_mock:
    text_in = "Gehäusegröße"
    respx_mock.get("https://example.com", content=text_in)
    text_out = httpx.get("https://example.com").text
    print(text_in)
    print(text_out)
    print(text_in == text_out)

Output:

Gehäusegröße
Gehäusegröße
False

Versions: respx 0.12.1 and httpx 0.14.3
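
A minimal sketch of one plausible mechanism behind the mismatch, using only the standard library. It assumes the mocked str body is encoded to UTF-8 bytes and that the client then decodes those bytes with a different charset. Latin-1 is used purely as an illustrative assumption; the issue does not say which encoding HTTPX actually picked.

```python
text_in = "Gehäusegröße"

# A str body has to become bytes before it is sent as a response.
body = text_in.encode("utf-8")

# Assumption for illustration: the receiving side decodes with a
# different charset (Latin-1 here) instead of UTF-8.
garbled = body.decode("latin-1")

# Decoding with the charset that was used for encoding round-trips cleanly.
restored = body.decode("utf-8")

print(garbled == text_in)   # False
print(restored == text_in)  # True
```

Each of the three non-ASCII characters occupies two bytes in UTF-8, so the wrongly decoded string ends up three characters longer than the input.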

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (6 by maintainers)

Top GitHub Comments

2 reactions
lundberg commented, Sep 11, 2020

Awesome if this could be solved in HTTPX!

If and when that HTTPX PR is merged, we’ll need to decide whether it’s enough to simply remove the RESPX TODOs mentioned, along with the encoding code that goes with them.

0 reactions
lundberg commented, Sep 23, 2020

As for the content encoding, I think we should introduce an encoding option to ResponseTemplate, to match the flexibility of HTTPX responses.

From the HTTPX chardet PR:

"Users can override this behaviour if required with an explicit response.encoding = ..."

It could default to utf-8 and be overridable when adding patterns to RESPX.
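
A rough sketch of what such an option might look like. MockResponseTemplate, its fields, and build() are hypothetical names made up for illustration; this is not the RESPX ResponseTemplate API, only the idea of a utf-8 default that can be overridden per pattern.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass
class MockResponseTemplate:
    """Hypothetical stand-in, not the real RESPX ResponseTemplate."""

    content: Union[str, bytes]
    encoding: str = "utf-8"  # proposed default, overridable per pattern
    content_type: str = "text/plain"

    def build(self) -> Tuple[Dict[str, str], bytes]:
        if isinstance(self.content, bytes):
            # Raw bytes are passed through untouched; no charset is claimed.
            return {"Content-Type": self.content_type}, self.content
        # Encode str content and declare the same charset in the header,
        # so a client does not have to guess the encoding.
        body = self.content.encode(self.encoding)
        headers = {"Content-Type": f"{self.content_type}; charset={self.encoding}"}
        return headers, body


# Default: utf-8
headers, body = MockResponseTemplate("Gehäusegröße").build()

# Explicit override when registering a pattern
latin_headers, latin_body = MockResponseTemplate(
    "Gehäusegröße", encoding="iso-8859-1"
).build()
```

Declaring the charset alongside the encoded body is one design choice among several; the thread itself only settles on a utf-8 default with an override.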

