Rasa X Decoding error with German umlauts
Rasa version: 1.1.7
Rasa X version (if used & relevant): 0.19.5
Python version: 3.7.0
Operating system (windows, osx, …): windows
Issue:
I get a decoding error when I start rasa x with German umlauts in the domain.yml. If I remove the special characters, I can start rasa x without problems. The same issue has already been reported on the rasa-x-demo repository: https://github.com/RasaHQ/rasa-x-demo/issues/16
After testing, this error also occurs when running rasa train.
Error (including full traceback):
(base) C:\Users\Documents\workspace_python\FuBo\bot>rasa x
Starting Rasa X in local mode... 🚀
Traceback (most recent call last):
  File "c:\users\appdata\local\continuum\anaconda3\lib\site-packages\rasa\cli\x.py", line 322, in run_locally
    local.main(args, project_path, args.data, token=rasa_x_token)
  File "c:\users\appdata\local\continuum\anaconda3\lib\site-packages\rasax\community\local.py", line 190, in main
    project_path, data_path, session, args.port
  File "c:\users\appdata\local\continuum\anaconda3\lib\site-packages\rasax\community\local.py", line 139, in _initialize_with_local_data
    domain_path, domain_service, COMMUNITY_PROJECT_NAME, COMMUNITY_USERNAME
  File "c:\users\appdata\local\continuum\anaconda3\lib\site-packages\rasax\community\initialise.py", line 136, in inject_domain
    domain_yaml=read_file(domain_path),
  File "c:\users\appdata\local\continuum\anaconda3\lib\site-packages\rasa\utils\io.py", line 130, in read_file
    return f.read()
  File "c:\users\appdata\local\continuum\anaconda3\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 127: invalid start byte
Command or request that led to error:
rasa x
Content of configuration file (config.yml) (if relevant):
# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: de
pipeline: pretrained_embeddings_spacy
# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
- name: MemoizationPolicy
- name: KerasPolicy
- name: MappingPolicy
Content of domain file (domain.yml) (if relevant):
intents:
- affirm
- deny
- goodbye
- greet
templates:
  utter_greet:
  - text: Hallo ich bin dein persönlicher Assistent. Wie kann ich Dir helfen?
  utter_did_that_help:
  - text: Konnte ich Dir damit weiterhelfen?
  utter_goodbye:
  - text: Ich wünsche Dir noch einen schönen Tag!
actions:
- utter_did_that_help
- utter_goodbye
- utter_greet
Issue Analytics
- State:
- Created 4 years ago
- Comments:10 (7 by maintainers)
Top GitHub Comments
Thanks for the very descriptive intro @taotsetung. I’ve tracked down the part where the domain gets written in Rasa X and you’re right, the encoding isn’t specified:
I assume that
with open(filename, "w", encoding="utf-8") as f:
should do the job, but we’ll check it out.
@erohmensing This error still reproduces for me on the latest version, using Windows with a German locale. The stdin and stdout streams show UTF-8, but they are not the root cause here.
The underlying issue is that Python by default writes files using the system code page unless an encoding is given when opening the file, and rasa does not specify UTF-8. Additionally, when loading the domain.yml file, rasa first reformats and saves it before actually loading and parsing it. The save step writes the file in the system code page, so the UTF-8 encoding is lost, and the subsequent load, which expects UTF-8, fails with the error.
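The failure mode described above can be reproduced outside rasa. The following is only a minimal sketch: it assumes the system code page is cp1252 (common for Western European Windows installs; the code page is not stated in the issue) and passes it explicitly so the example behaves the same on any platform. The file name and variable names are illustrative.

```python
import tempfile
from pathlib import Path

text = "Hallo ich bin dein persönlicher Assistent."
decode_error = None

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "domain.yml"

    # Simulate a write that falls back to the system code page.
    # cp1252 is an assumption; it is passed explicitly for reproducibility.
    path.write_text(text, encoding="cp1252")
    raw = path.read_bytes()

    # Reading the same file back as UTF-8 fails on the umlaut byte.
    try:
        path.read_text(encoding="utf-8")
    except UnicodeDecodeError as err:
        decode_error = err

# cp1252 is a single-byte encoding, so character and byte offsets match.
print("byte written for 'ö':", hex(raw[text.index("ö")]))
print("error when reading as UTF-8:", decode_error)
```

In cp1252 the character "ö" is stored as the single byte 0xf6, which is the byte named in the traceback.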
Workaround (Python 3.7+ only): set the environment variable PYTHONUTF8 to 1 before running rasa. This forces Python to use UTF-8 as the default encoding. On Windows: set PYTHONUTF8=1
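To check whether the workaround took effect, one option is to ask the interpreter which default it will use for open(). This is a small sketch using only the standard library; the function name describe_default_encoding is made up for this example. The variable has to be set before the interpreter starts, so run this in the same shell session that will launch rasa.

```python
import locale
import sys


def describe_default_encoding():
    """Report whether UTF-8 mode is on and what open() uses by default."""
    return {
        # 1 when Python was started with PYTHONUTF8=1 (or -X utf8), else 0.
        "utf8_mode": bool(sys.flags.utf8_mode),
        # The encoding open() falls back to when none is passed.
        "default_file_encoding": locale.getpreferredencoding(False),
    }


info = describe_default_encoding()
print("UTF-8 mode enabled:", info["utf8_mode"])
print("Default encoding for open():", info["default_file_encoding"])
```

If UTF-8 mode is reported as disabled and the default encoding is a code page such as cp1252, files written without an explicit encoding will not be valid UTF-8 once they contain umlauts.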
Solution for rasa/rasa x: when saving the domain file (and other files as well …), specify UTF-8 explicitly as the encoding. Python 3.7+ only: enable UTF-8 mode in code.
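A rough sketch of what the proposed fix looks like in general: pass encoding="utf-8" on both the write and the read so the result no longer depends on the system code page. The helper names write_text_file and read_text_file are hypothetical and are not rasa's actual functions.

```python
import tempfile
from pathlib import Path


def write_text_file(content: str, path: Path) -> None:
    """Write text as UTF-8 regardless of the system code page."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)


def read_text_file(path: Path) -> str:
    """Read text as UTF-8 regardless of the system code page."""
    with open(path, encoding="utf-8") as f:
        return f.read()


domain_snippet = "utter_goodbye:\n- text: Ich wünsche Dir noch einen schönen Tag!\n"

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "domain.yml"
    write_text_file(domain_snippet, target)
    restored = read_text_file(target)

print("round trip preserved umlauts:", restored == domain_snippet)
```

With the encoding fixed on both sides, the reformat-and-save step described above would keep the file readable as UTF-8.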