
Cannot configure token_pattern for CountVectorsFeaturizer in config.yml?


Rasa version: 1.1.2
Rasa X version (if used & relevant):

Python version: 3.7.3
Operating system (windows, osx, …):

Issue: In config.yml I want to configure token_pattern: r'(?u)\b\w+\b' for Chinese under the CountVectorsFeaturizer component, but it doesn't work.
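For context, a minimal sketch of why `\w+` matters for Chinese, assuming CountVectorsFeaturizer otherwise falls back to scikit-learn's default CountVectorizer token_pattern `(?u)\b\w\w+\b`, which only keeps tokens of two or more word characters. The sample sentence is made up and stands in for JiebaTokenizer output (words separated by spaces).

```python
import re

# scikit-learn's default CountVectorizer token_pattern: tokens of 2+ word characters.
DEFAULT_PATTERN = r"(?u)\b\w\w+\b"
# The pattern wanted in this issue: tokens of 1+ word characters.
WANTED_PATTERN = r"(?u)\b\w+\b"

# Hypothetical JiebaTokenizer-style output: Chinese words separated by spaces.
sample = "我 想 订 一张 机票"

default_tokens = re.findall(DEFAULT_PATTERN, sample)
wanted_tokens = re.findall(WANTED_PATTERN, sample)

print(default_tokens)  # ['一张', '机票']  (single-character words are dropped)
print(wanted_tokens)   # ['我', '想', '订', '一张', '机票']
```

With the default pattern, every single-character word silently disappears from the bag of words, which is a real loss for Chinese.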

CountVectorsFeaturizer reads my configuration r'(?u)\b\w+\b' as a normal string, not a regex, so the train method fails and goes into the exception branch:

try:
    # noinspection PyPep8Naming
    X = self.vectorizer.fit_transform(lem_exs).toarray()
except ValueError:
    self.vectorizer = None  # execution ends up here
    return
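A likely explanation, sketched under assumptions: YAML has no Python-style raw-string prefix, so the unquoted value r'(?u)\b\w+\b' is loaded as that literal text, with the leading r and both quote characters kept. scikit-learn's CountVectorizer (which CountVectorsFeaturizer wraps) compiles token_pattern and runs findall on each document; a pattern that requires a literal r' matches nothing, the vocabulary ends up empty, and CountVectorizer raises the ValueError caught above. The `tokenize` helper is illustrative and only mimics the compile-and-findall step. On Python 3.7 to 3.10 the misplaced `(?u)` flag only triggers a DeprecationWarning; Python 3.11+ rejects it with re.error.

```python
import re
import warnings

# Value the YAML loader produces for:  token_pattern: r'(?u)\b\w+\b'
# The "r" and both quote characters stay part of the string.
loaded_value = "r'(?u)\\b\\w+\\b'"
intended = r"(?u)\b\w+\b"

# Hypothetical JiebaTokenizer-style output.
sample = "我 想 订 一张 机票"


def tokenize(pattern: str, text: str):
    """Mimic CountVectorizer's token_pattern step: compile, then findall."""
    with warnings.catch_warnings():
        # Python 3.7-3.10 only warn about the misplaced (?u) flag.
        warnings.simplefilter("ignore", DeprecationWarning)
        try:
            return re.compile(pattern).findall(text)
        except re.error as exc:
            # Python 3.11+ rejects global flags that are not at the start.
            return f"re.error: {exc}"


print(loaded_value == intended)        # False
print(tokenize(loaded_value, sample))  # [] on 3.7-3.10, an re.error message on 3.11+
print(tokenize(intended, sample))      # ['我', '想', '订', '一张', '机票']
```

Either way no tokens come out of the loaded value, which matches the ValueError branch being taken.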

Error (including full traceback):


Command or request that led to error:


Content of configuration file (config.yml) (if relevant):

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: "zh"
pipeline:
- name: "JiebaTokenizer"
  dictionary_path: "jieba_dict"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "CountVectorsFeaturizer"
  token_pattern: r'(?u)\b\w+\b'
- name: "EmbeddingIntentClassifier"

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: MemoizationPolicy
  - name: KerasPolicy
  - name: MappingPolicy
  - name: "FallbackPolicy"
    nlu_threshold: 0.3
    core_threshold: 0.3
    fallback_action_name: 'action_default_fallback'

Content of domain file (domain.yml) (if relevant):


Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

1 reaction

erohmensing commented, Aug 14, 2019

Thanks for the help on this @psds01!

0 reactions

journey1986 commented, Jul 24, 2019

@psds01 After changing it to token_pattern: “(?u)\b\w+\b” in config.yml, it seems OK now.

Thank you very much :)
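Because the ValueError is swallowed and the vectorizer is silently set to None, a quick check of the loaded pattern can surface this kind of mistake before training. A minimal sketch: `check_token_pattern` is a hypothetical helper, not part of Rasa or scikit-learn, and it assumes you can get at the token_pattern string as loaded from config.yml. Quoting style matters in YAML: a single-quoted scalar such as '(?u)\b\w+\b' keeps backslashes literally, while in a double-quoted scalar backslashes start YAML escape sequences.

```python
import re


def check_token_pattern(pattern: str, sample: str, expected: list) -> None:
    """Hypothetical pre-training check: fail loudly if token_pattern mis-tokenizes."""
    # Catch a Python raw-string literal pasted into YAML, e.g. r'...'.
    if pattern[:2] in ("r'", 'r"'):
        raise ValueError(f"token_pattern looks like a Python raw-string literal: {pattern!r}")
    tokens = re.findall(pattern, sample)
    if tokens != expected:
        raise ValueError(f"token_pattern {pattern!r} produced {tokens}, expected {expected}")


# The corrected value, as it should look after loading config.yml: passes silently.
check_token_pattern(r"(?u)\b\w+\b", "我 想 订 一张 机票", ["我", "想", "订", "一张", "机票"])

# The value from the original config is rejected up front instead of failing silently.
try:
    check_token_pattern("r'(?u)\\b\\w+\\b'", "我 想 订 一张 机票", ["我", "想", "订", "一张", "机票"])
except ValueError as exc:
    print(exc)
```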
