Failed parsing Markdown training section header with colon (:)

Rasa version: 1.4.2

Rasa X version (if used & relevant):

Python version: 3.7

Operating system (windows, osx, …): OS X

Issue: A recently introduced regex parses Markdown section headers in the wrong way in training files. This seems related to recently introduced code in

    def _find_section_header(self, line: Text) -> Optional[Tuple[Text, Text]]:
        """Checks if the current line contains a section header
        and returns the section and the title."""
        match = re.search(r"##\s*(.+):(.+)", line)
        if match is not None:
            return match.group(1), match.group(2)

        return None

The first capture group, (.+), is greedy, so it matches everything up to the last colon on the line. For a section header such as ## synonym:10:00 am, the section and value are reported as ('synonym:10', '00 am') instead of the expected ('synonym', '10:00 am'). Because 'synonym:10' is not an allowed section name, training fails.
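
The greedy match can be reproduced outside Rasa with only the standard `re` module. This is a minimal sketch; the variable names are illustrative and are not taken from the Rasa code base:

```python
import re

# Pattern copied from _find_section_header above.
GREEDY_PATTERN = r"##\s*(.+):(.+)"

header = "## synonym:10:00 am"
match = re.search(GREEDY_PATTERN, header)

# (.+) is greedy, so it consumes everything up to the last colon.
greedy_groups = match.groups()
print(greedy_groups)  # ('synonym:10', '00 am')
```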

Proposed Solution: Change the regex to ##\s*(.+?):(.+)
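
A sketch of the proposed change as a standalone function, so it can be tried without a Rasa installation. The name `find_section_header` and the sample headers are assumptions for illustration; the real method lives on the Markdown reader class, and the fix that ends up in Rasa may differ:

```python
import re
from typing import Optional, Text, Tuple


def find_section_header(line: Text) -> Optional[Tuple[Text, Text]]:
    """Return (section, title) if the line is a section header, else None."""
    # (.+?) is non-greedy: it stops at the first colon, so any later
    # colons stay in the title.
    match = re.search(r"##\s*(.+?):(.+)", line)
    if match is not None:
        return match.group(1), match.group(2)
    return None


print(find_section_header("## synonym:10:00 am"))  # ('synonym', '10:00 am')
print(find_section_header("## intent:greet"))      # ('intent', 'greet')
print(find_section_header("- good morning"))       # None
```

With the non-greedy group, the section name always ends at the first colon, which matches the allowed section names listed in the error below ('intent', 'synonym', 'regex', 'lookup'), none of which contain a colon.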

Error (including full traceback):

Traceback (most recent call last):
  File "/Users/ethan/src/doodle/svc-doodlebot/env/bin/rasa", line 8, in <module>
    sys.exit(main())
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/__main__.py", line 76, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/cli/train.py", line 76, in train
    kwargs=extract_additional_arguments(args),
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/train.py", line 45, in train
    kwargs=kwargs,
  File "uvloop/loop.pyx", line 1417, in uvloop.loop.Loop.run_until_complete
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/train.py", line 96, in train_async
    kwargs,
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/train.py", line 137, in _train_async_internal
    new_fingerprint = await model.model_fingerprint(file_importer)
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/model.py", line 204, in model_fingerprint
    nlu_data = await file_importer.get_nlu_data()
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/importers/importer.py", line 269, in get_nlu_data
    nlu_data = await asyncio.gather(*nlu_data)
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/importers/rasa.py", line 60, in get_nlu_data
    return utils.training_data_from_paths(self._nlu_files, language)
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/importers/utils.py", line 9, in training_data_from_paths
    training_datas = [loading.load_data(nlu_file, language) for nlu_file in paths]
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/importers/utils.py", line 9, in <listcomp>
    training_datas = [loading.load_data(nlu_file, language) for nlu_file in paths]
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/nlu/training_data/loading.py", line 67, in load_data
    data_sets = [_load(f, language) for f in files]
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/nlu/training_data/loading.py", line 67, in <listcomp>
    data_sets = [_load(f, language) for f in files]
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/nlu/training_data/loading.py", line 138, in _load
    return reader.read(filename, language=language, fformat=fformat)
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/nlu/training_data/formats/readerwriter.py", line 10, in read
    return self.reads(rasa.utils.io.read_file(filename), **kwargs)
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/nlu/training_data/formats/markdown.py", line 73, in reads
    self._set_current_section(header[0], header[1])
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/nlu/training_data/formats/markdown.py", line 192, in _set_current_section
    "".format(section, "', '".join(available_sections))
ValueError: Found markdown section 'synonym:10' which is not in the allowed sections 'intent', 'synonym', 'regex', 'lookup'.

Command or request that led to error:

rasa train

Content of configuration file (config.yml) (if relevant):

language: en
pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: CRFEntityExtractor
- name: EntitySynonymMapper
- name: SklearnIntentClassifier
- name: CountVectorsFeaturizer
- name: EmbeddingIntentClassifier
- name: DucklingHTTPExtractor
  url: http://localhost:8000
  locale: en_US
  dimensions:
  - time
  - duration
  timezone: UTC
policies:
- name: KerasPolicy
- name: MappingPolicy

Content of domain file (domain.yml) (if relevant):

Content of training NLU Markdown