Rasa X Decoding error with German umlauts

Rasa version: 1.1.7

Rasa X version (if used & relevant): 0.19.5

Python version: 3.7.0

Operating system (windows, osx, …): windows

Issue:

I get a decoding error when I start rasa x with German umlauts in the domain.yml. If I remove the special characters, rasa x starts without problems. The same issue has already been reported in the rasa-x-demo repository: https://github.com/RasaHQ/rasa-x-demo/issues/16

After testing, this error also occurs when running rasa train.

Error (including full traceback):

(base) C:\Users\Documents\workspace_python\FuBo\bot>rasa x
Starting Rasa X in local mode... 🚀
Traceback (most recent call last):
  File "c:\users\appdata\local\continuum\anaconda3\lib\site-packages\rasa\cli\x.py", line 322, in run_locally
    local.main(args, project_path, args.data, token=rasa_x_token)
  File "c:\users\appdata\local\continuum\anaconda3\lib\site-packages\rasax\community\local.py", line 190, in main
    project_path, data_path, session, args.port
  File "c:\users\appdata\local\continuum\anaconda3\lib\site-packages\rasax\community\local.py", line 139, in _initialize_with_local_data
    domain_path, domain_service, COMMUNITY_PROJECT_NAME, COMMUNITY_USERNAME
  File "c:\users\appdata\local\continuum\anaconda3\lib\site-packages\rasax\community\initialise.py", line 136, in inject_domain
    domain_yaml=read_file(domain_path),
  File "c:\users\appdata\local\continuum\anaconda3\lib\site-packages\rasa\utils\io.py", line 130, in read_file
    return f.read()
  File "c:\users\appdata\local\continuum\anaconda3\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 127: invalid start byte

Command or request that led to error:

rasa x

Content of configuration file (config.yml) (if relevant):

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: de
pipeline: pretrained_embeddings_spacy

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: MemoizationPolicy
  - name: KerasPolicy
  - name: MappingPolicy

Content of domain file (domain.yml) (if relevant):

intents:
- affirm
- deny
- goodbye
- greet
templates:
  utter_greet:
  - text: Hallo ich bin dein persönlicher Assistent. Wie kann ich Dir helfen?
  utter_did_that_help:
  - text: Konnte ich Dir damit weiterhelfen?
  utter_goodbye:
  - text: Ich wünsche Dir noch einen schönen Tag!
actions:
- utter_did_that_help
- utter_goodbye
- utter_greet

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 10 (7 by maintainers)

Top GitHub Comments

2 reactions
erohmensing commented, Sep 11, 2019

Thanks for the very descriptive intro @taotsetung. I’ve tracked down the part where the domain gets written in Rasa X and you’re right, the encoding isn’t specified:

def dump_yaml_to_file(filename: Text, content: Any) -> Optional[str]:
    """Dump content to yaml."""
    with open(filename, "w") as f:
        f.write(dump_yaml(content))

I assume that with open(filename, "w", encoding="utf-8") as f: should do the job, but we’ll check it out.
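
A minimal sketch of that proposed change, assuming plain text stands in for the output of dump_yaml. The helper names write_text_utf8, read_text_utf8 and roundtrip are hypothetical and are not part of Rasa or Rasa X; the only point illustrated is that passing encoding="utf-8" on both the write and the read makes the result independent of the system code page.

```python
import os
import tempfile
from typing import Text


def write_text_utf8(filename: Text, content: Text) -> None:
    """Write text with an explicit encoding instead of the locale default."""
    with open(filename, "w", encoding="utf-8") as f:
        f.write(content)


def read_text_utf8(filename: Text) -> Text:
    """Read text back with the same explicit encoding."""
    with open(filename, encoding="utf-8") as f:
        return f.read()


def roundtrip(text: Text) -> Text:
    """Write text to a temporary file (hypothetical name) and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "domain.yml")
        write_text_utf8(path, text)
        return read_text_utf8(path)


if __name__ == "__main__":
    # Umlauts survive regardless of the platform's default encoding.
    print(roundtrip("text: Ich wünsche Dir noch einen schönen Tag!"))
```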

2 reactions
daniel-eder commented, Sep 11, 2019

@erohmensing This error still reproduces for me on the latest version, using Windows in German locale. The stdin and stdout streams show UTF-8, but they are not the root cause here.

The underlying issue is that Python by default writes files using the system code page unless an encoding is given when opening the file, and rasa does not specify UTF-8. Additionally, when loading the domain.yml file, rasa first reformats and saves it before actually loading and parsing it. The encoding is lost during that first step, so the file is no longer UTF-8 when it is read back, which causes the error.
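
The mismatch can be reproduced without Rasa. The sketch below assumes the system code page is cp1252, which is typical for a German Windows installation but is not stated in the issue. The function name decode_error_for is hypothetical. It encodes text the way such a code page would write it and then tries to decode the bytes as UTF-8.

```python
from typing import Optional


def decode_error_for(text: str, written_as: str = "cp1252") -> Optional[UnicodeDecodeError]:
    """Encode text with one codec, decode as UTF-8, and return the error if any."""
    raw = text.encode(written_as)  # bytes as the assumed code page would store them
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err
    return None


if __name__ == "__main__":
    err = decode_error_for("persönlicher")
    # The failing byte is the cp1252 encoding of "ö".
    print(hex(err.object[err.start]), err.reason)
```

In cp1252 the letter ö is the single byte 0xf6, which is not a valid first byte of a UTF-8 sequence. That matches the byte reported in the traceback.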

Workaround (Python 3.7+ only): set the environment variable PYTHONUTF8 to 1 before running rasa. This forces Python to use UTF-8 as the default encoding. On Windows: set PYTHONUTF8=1
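
To check whether the workaround took effect, the interpreter can be asked directly. This is a small sketch using only the standard library. The function name describe_default_encoding is hypothetical, and sys.flags.utf8_mode is available from Python 3.7 onward.

```python
import locale
import sys
from typing import Dict, Union


def describe_default_encoding() -> Dict[str, Union[bool, str]]:
    """Report whether UTF-8 mode is on and which encoding open() defaults to."""
    return {
        # True when Python was started with PYTHONUTF8=1 or -X utf8
        "utf8_mode": bool(sys.flags.utf8_mode),
        # The encoding used by open() when no encoding argument is given
        "preferred_encoding": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    print(describe_default_encoding())
```

Run this in the same shell session in which PYTHONUTF8 was set. If utf8_mode is still False, the variable did not reach the Python process.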

Solution for rasa/rasa x: When saving the domain file (and other files as well … ) specify utf8 as override. Python 3.7+ only: Enable utf8 mode in code.
