
Failed parsing Markdown training section header with colon (:)


Rasa version: 1.4.2

Rasa X version (if used & relevant):

Python version: 3.7

Operating system (windows, osx, …): OS X

Issue: A recently introduced regex splits Markdown section headers in training files incorrectly. The problem appears to come from code added in

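    import re
    from typing import Optional, Text, Tuple

    def _find_section_header(self, line: Text) -> Optional[Tuple[Text, Text]]:
        """Checks if the current line contains a section header
        and returns the section and the title."""
        # The first (.+) is greedy: it extends to the LAST colon that still
        # leaves at least one character for the title group.
        match = re.search(r"##\s*(.+):(.+)", line)
        if match is not None:
            return match.group(1), match.group(2)

        return None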

Because the first group (.+) is greedy, the header is split at the last colon instead of the first. For a section header such as ## synonym:10:00 am, the section and title are reported as ('synonym:10', '00 am') instead of the expected ('synonym', '10:00 am'). Since 'synonym:10' is not a valid section name, training fails.
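The mis-split can be reproduced outside Rasa by running the quoted pattern with Python's standard re module on the header from the training file. This is only a reproduction of the regex behaviour, not Rasa code; the variable names are illustrative.

```python
import re

# Header taken from the NLU training file in this issue.
HEADER = "## synonym:10:00 am"

# Pattern as quoted in the issue: the first group is greedy.
GREEDY = r"##\s*(.+):(.+)"

match = re.search(GREEDY, HEADER)
greedy_result = match.groups() if match else None
print(greedy_result)  # ('synonym:10', '00 am')
```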

Proposed solution: make the first group non-greedy by changing the regex to ##\s*(.+?):(.+), so that the header is split at the first colon.
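A minimal sketch of the proposed fix as a standalone function, assuming the only change needed is the non-greedy (.+?) in the first group. The names find_section_header and SECTION_HEADER_PATTERN are hypothetical; this is not the patch that was merged into Rasa.

```python
import re
from typing import Optional, Tuple

# Non-greedy first group: stops at the FIRST colon after the "##" marker,
# so any further colons stay in the title.
SECTION_HEADER_PATTERN = re.compile(r"##\s*(.+?):(.+)")


def find_section_header(line: str) -> Optional[Tuple[str, str]]:
    """Return (section, title) for a Markdown section header, else None."""
    match = SECTION_HEADER_PATTERN.search(line)
    if match is not None:
        return match.group(1), match.group(2)
    return None


print(find_section_header("## synonym:10:00 am"))  # ('synonym', '10:00 am')
print(find_section_header("## intent:greet"))      # ('intent', 'greet')
print(find_section_header("- @10:00 am"))          # None (not a header)
```

Precompiling the pattern is optional. Stripping the "##" prefix and calling str.partition(":") would also split at the first colon, but the regex version keeps the same shape as the original method.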

Error (including full traceback):

Traceback (most recent call last):
  File "/Users/ethan/src/doodle/svc-doodlebot/env/bin/rasa", line 8, in <module>
    sys.exit(main())
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/__main__.py", line 76, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/cli/train.py", line 76, in train
    kwargs=extract_additional_arguments(args),
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/train.py", line 45, in train
    kwargs=kwargs,
  File "uvloop/loop.pyx", line 1417, in uvloop.loop.Loop.run_until_complete
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/train.py", line 96, in train_async
    kwargs,
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/train.py", line 137, in _train_async_internal
    new_fingerprint = await model.model_fingerprint(file_importer)
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/model.py", line 204, in model_fingerprint
    nlu_data = await file_importer.get_nlu_data()
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/importers/importer.py", line 269, in get_nlu_data
    nlu_data = await asyncio.gather(*nlu_data)
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/importers/rasa.py", line 60, in get_nlu_data
    return utils.training_data_from_paths(self._nlu_files, language)
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/importers/utils.py", line 9, in training_data_from_paths
    training_datas = [loading.load_data(nlu_file, language) for nlu_file in paths]
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/importers/utils.py", line 9, in <listcomp>
    training_datas = [loading.load_data(nlu_file, language) for nlu_file in paths]
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/nlu/training_data/loading.py", line 67, in load_data
    data_sets = [_load(f, language) for f in files]
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/nlu/training_data/loading.py", line 67, in <listcomp>
    data_sets = [_load(f, language) for f in files]
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/nlu/training_data/loading.py", line 138, in _load
    return reader.read(filename, language=language, fformat=fformat)
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/nlu/training_data/formats/readerwriter.py", line 10, in read
    return self.reads(rasa.utils.io.read_file(filename), **kwargs)
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/nlu/training_data/formats/markdown.py", line 73, in reads
    self._set_current_section(header[0], header[1])
  File "/Users/ethan/src/doodle/svc-doodlebot/env/lib/python3.7/site-packages/rasa/nlu/training_data/formats/markdown.py", line 192, in _set_current_section
    "".format(section, "', '".join(available_sections))
ValueError: Found markdown section 'synonym:10' which is not in the allowed sections 'intent', 'synonym', 'regex', 'lookup'.
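The ValueError is raised when the parsed section name is checked against the allowed sections in _set_current_section. The sketch below is a hypothetical reconstruction, modelled only on the error message shown above; check_section and AVAILABLE_SECTIONS are illustrative names, not Rasa's API. It shows why the mis-split name 'synonym:10' is rejected while 'synonym' would be accepted.

```python
from typing import Iterable

# Allowed section names, as listed in the error message above (assumption:
# this mirrors the list the Markdown reader checks against).
AVAILABLE_SECTIONS = ("intent", "synonym", "regex", "lookup")


def check_section(section: str, available_sections: Iterable[str] = AVAILABLE_SECTIONS) -> None:
    """Hypothetical check that raises if the section name is not allowed."""
    available = list(available_sections)
    if section not in available:
        raise ValueError(
            "Found markdown section '{}' which is not in "
            "the allowed sections '{}'.".format(section, "', '".join(available))
        )


check_section("synonym")  # accepted: no exception
try:
    check_section("synonym:10")  # the name produced by the greedy regex
except ValueError as err:
    print(err)
```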

Command or request that led to error:

rasa train

Content of configuration file (config.yml) (if relevant):

language: en
pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: CRFEntityExtractor
- name: EntitySynonymMapper
- name: SklearnIntentClassifier
- name: CountVectorsFeaturizer
- name: EmbeddingIntentClassifier
- name: DucklingHTTPExtractor
  url: http://localhost:8000
  locale: en_US
  dimensions:
  - time
  - duration
  timezone: UTC
policies:
- name: KerasPolicy
- name: MappingPolicy

Content of domain file (domain.yml) (if relevant):


Content of NLU training data (Markdown):

## synonym:10:00 am
- @10:00 am

(with some additional content before and after)

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

erohmensing commented, Nov 11, 2019 (1 reaction)
erohmensing commented, Nov 1, 2019 (1 reaction)

Nope, just fork the repo, create a branch for your fix, and open a PR! 😃 I’ll assign you the issue.


