Rendering Bulgarian text ends up UTF-16 escaped

Hello,

I’m using Thymeleaf 3.0.9.RELEASE to render the following Bulgarian text: “Безплатна доставка на следващия ден”. It ends up rendered as \u0411\u0435\u0437\u043F\u043B\u0430\u0442\u043D\u0430 \u0434\u043E\u0441\u0442\u0430\u0432\u043A\u0430 \u043D\u0430 \u0441\u043B\u0435\u0434\u0432\u0430\u0449\u0438\u044F \u0434\u0435\u043D

Here’s my method:

    @Override
    public String parseContent(@Nonnull final String content, final Map<String, Object> params) {
        Assert.notNull(content, "Parameter content must be non-null");
        return templateEngine.process("<span th:inline=\"javascript\">" + content + "</span>", createSpringWebContext(params));
    }

Where content is

“Купете стоки на стойност повече от [[${currencyFormatter.print(promotion.activeSubtotalThreshold, locale)}]] за да получите [[${promotion.quantity}]] of [[${promotion.product.getName(locale)}]] БЕЗПЛАТНО!”

and promotion is a variable in the context whose name in the Bulgarian locale is “Безплатна доставка на следващия ден”.

I found that the problem comes from the way JacksonStandardJavaScriptSerializer is created, where this feature is enabled:

    this.mapper.enable(JsonGenerator.Feature.ESCAPE_NON_ASCII);

Changing it to this:

    this.mapper.disable(JsonGenerator.Feature.ESCAPE_NON_ASCII);

works perfectly fine. I will submit a PR.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

ptahchiev commented, Apr 6, 2018 (1 reaction)

Hi @danielfernandez

Thank you for the response. It works perfectly fine without JavaScript inlining. I guess I was confused because this used to work with Thymeleaf 2.x.

Once again, thank you for your fast response.

danielfernandez commented, Apr 6, 2018 (0 reactions)

OK, but what I mean is that you are apparently rendering that text inside an HTML element (<span>) while using th:inline="javascript", which treats everything processed inside it as JavaScript code and therefore applies Jackson and, in your case, JavaScript escaping. Hence the Unicode escapes, which are perfectly valid JavaScript-escaped text.
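To illustrate why the escaped output is equivalent to the original text, here is a minimal sketch in plain Java. The class UnicodeEscapeSketch and its two methods are hypothetical helpers written for this illustration. They are not Thymeleaf's or Jackson's actual code. They only mimic the observable effect of escaping non-ASCII characters, on the assumption that every character is a single UTF-16 code unit, which holds for the Cyrillic text in this report.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the Thymeleaf or Jackson implementation.
final class UnicodeEscapeSketch {

    // Matches a backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Replaces every char above 7-bit ASCII with a four-digit hex escape.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7F) {
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Decodes the escapes back into characters, roughly as a JavaScript
    // engine does when it reads a string literal.
    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String original = "Безплатна доставка";
        String escaped = escapeNonAscii(original);
        System.out.println(escaped);                            // ASCII-only escaped form
        System.out.println(unescape(escaped).equals(original)); // round trip check
    }
}
```

Assuming the source file is compiled as UTF-8, the escaped form printed by main should match the first two words of the output quoted in the report, and the round trip check shows that no information is lost. The escapes are correct inside a JavaScript string, but they are shown literally when they end up in HTML body text.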

So I’m confused about your use of th:inline="javascript" in a <span> element. If you are doing that only because you want your [[${...}]] expressions resolved, and seeing you are using Thymeleaf 3.0, you don’t really have to use inlining at all.

Assuming you are using the StringTemplateResolver, this:

    return templateEngine.process("<span>" + content + "</span>", createSpringWebContext(params));

or even simply this:

    return templateEngine.process(content, createSpringWebContext(params));

…should work for you. There will be escaping anyway for your Bulgarian characters, but it will be HTML escaping and not JavaScript, which I presume is what you are looking for.
