Add case_sensitive option to WhitespaceTokenizer

Description of Problem: WhitespaceTokenizer does not have a case_sensitive option. This follows a discussion with @Ghostvv on the Rasa Community Forum: https://forum.rasa.com/t/case-sensitivity/2541/8

Overview of the Solution: Add the case_sensitive option to WhitespaceTokenizer to allow models using this tokenizer to be case insensitive.

Examples (if relevant): If a user types “Burger” when the WhitespaceTokenizer is configured with case_sensitive: false, the slot should still be filled even if the training data contains only “burger”.
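
The requested behaviour can be illustrated with a minimal sketch. This is not Rasa's implementation: the function whitespace_tokenize and the sample sentences are hypothetical, and the sketch only assumes that case_sensitive: false means the text is lowercased before it is split on whitespace.

```python
from typing import List


def whitespace_tokenize(text: str, case_sensitive: bool = True) -> List[str]:
    """Split text on whitespace (hypothetical helper, not part of Rasa).

    When case_sensitive is False, the text is lowercased first so that
    "Burger" and "burger" produce the same token.
    """
    if not case_sensitive:
        text = text.lower()
    return text.split()


# Assumed training example that contains only the lowercase form.
training_tokens = set(whitespace_tokenize("i want a burger", case_sensitive=False))

# A user message with different casing still maps onto the training tokens.
user_tokens = whitespace_tokenize("I want a Burger", case_sensitive=False)
matched = [token for token in user_tokens if token in training_tokens]
```

With case_sensitive left at True, the token "Burger" would not equal "burger", which is the mismatch this issue describes.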

Blockers (if relevant): Not sure.

Definition of Done:

Add the case_sensitive option to the WhitespaceTokenizer
Test a model and ensure that, when this option is set to false in the pipeline configuration, a slot is filled regardless of case (that is, matching is case insensitive).

Issue Analytics

State:
Created 4 years ago
Comments: 12 (9 by maintainers)

Top GitHub Comments

5 reactions

ccelotto commented, Jul 21, 2019

Thank you everyone! I have tested this functionality and it is working as intended.

1 reaction

sibbsnb commented, Jul 17, 2019

Closed the PR comments
