`rasa init` config retrains when comments are removed
I just ran `rasa init` from the main branch and trained a model using this pipeline:
recipe: default.v1
# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en
pipeline:
# # No configuration for the NLU pipeline was provided. The following default pipeline was used to train your model.
# # If you'd like to customize it, uncomment and adjust the pipeline.
# # See https://rasa.com/docs/rasa/tuning-your-model for more information.
#   - name: WhitespaceTokenizer
#   - name: RegexFeaturizer
#   - name: LexicalSyntacticFeaturizer
#   - name: CountVectorsFeaturizer
#   - name: CountVectorsFeaturizer
#     analyzer: char_wb
#     min_ngram: 1
#     max_ngram: 4
#   - name: DIETClassifier
#     epochs: 100
#     constrain_similarities: true
#   - name: EntitySynonymMapper
#   - name: ResponseSelector
#     epochs: 100
#     constrain_similarities: true
#   - name: FallbackClassifier
#     threshold: 0.3
#     ambiguity_threshold: 0.1
# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
# # No configuration for policies was provided. The following default policies were used to train your model.
# # If you'd like to customize them, uncomment and adjust the policies.
# # See https://rasa.com/docs/rasa/policies for more information.
#   - name: MemoizationPolicy
#   - name: RulePolicy
#   - name: UnexpecTEDIntentPolicy
#     max_history: 5
#     epochs: 100
#   - name: TEDPolicy
#     max_history: 5
#     epochs: 100
#     constrain_similarities: true
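For reference, with every pipeline and policy entry commented out, this file parses to empty `pipeline` and `policies` sections, which is why Rasa falls back to the documented defaults. A quick check with PyYAML (assumed installed for this illustration) makes that visible:

```python
import yaml  # PyYAML, assumed installed for this quick check

with open("config.yml", encoding="utf-8") as f:
    config = yaml.safe_load(f)

print(config)
# {'recipe': 'default.v1', 'language': 'en', 'pipeline': None, 'policies': None}
```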
The model trains, fingerprints get created, and everything seems fine. But now I uncomment the lines:
recipe: default.v1
language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1
policies:
  - name: MemoizationPolicy
  - name: RulePolicy
  - name: UnexpecTEDIntentPolicy
    max_history: 5
    epochs: 100
  - name: TEDPolicy
    max_history: 5
    epochs: 100
    constrain_similarities: true
In this case, I’d expect the fingerprints to kick in and prevent the components from getting retrained. That is not what happens: the whole pipeline retrains. This suggests there may be something strange happening in our fingerprinting mechanism that’s worth looking into.
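For context on why a retrain could be triggered even by a content-based fingerprint: the commented file parses to empty `pipeline`/`policies` (see the snippet above), which Rasa fills in with its defaults, while the uncommented file declares the same components explicitly. If the fingerprint is computed before those defaults are resolved, the two files hash differently even though the effective pipeline is identical. The sketch below is not Rasa's actual implementation; it only illustrates, with hypothetical `DEFAULT_PIPELINE`/`DEFAULT_POLICIES` placeholders, what fingerprinting the resolved configuration would look like:

```python
import hashlib
import yaml  # PyYAML, assumed available for this illustration

# Hypothetical stand-ins for the defaults that the generated config documents
# in its commented-out section; the real defaults live inside Rasa itself.
DEFAULT_PIPELINE = [
    {"name": "WhitespaceTokenizer"},
    {"name": "RegexFeaturizer"},
    # ... remaining default NLU components ...
]
DEFAULT_POLICIES = [
    {"name": "MemoizationPolicy"},
    {"name": "RulePolicy"},
    # ... remaining default policies ...
]


def resolved_fingerprint(path: str) -> str:
    """Hash the configuration *after* defaults are substituted, so a config
    that omits the pipeline and one that spells out the same defaults yield
    the same fingerprint."""
    with open(path, encoding="utf-8") as f:
        config = yaml.safe_load(f) or {}
    config["pipeline"] = config.get("pipeline") or DEFAULT_PIPELINE
    config["policies"] = config.get("policies") or DEFAULT_POLICIES
    canonical = yaml.safe_dump(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With a fingerprint keyed like this, commenting the default components in or out would not invalidate the cache; whether Rasa's fingerprinting keys its cache on the raw or the resolved configuration is exactly what seems worth checking here.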
A question worth asking: why do we even need these comments in the first place? It feels strange to ask a user to uncomment configuration settings when they’re getting started.
Definition of done
- Make the UX of changing the `rasa init` config nice, with proper caching
- Fix the bug
Issue Analytics
- Created 2 years ago
- Reactions: 1
- Comments: 9 (9 by maintainers)
You’re completely right. We still prioritized it because it’s right at the start of the Rasa journey (with the new model architecture) and could make a bad impression (especially since we’re advertising that our new architecture fixes exactly that behavior 😆).
Now that I’ve pondered it further: in the grand scheme of things, this is an inconvenience at worst. It’ll mainly happen when a user is running `rasa init`, and even then it’s a one-time cost of retraining. As long as it’s not related to the comments, I’d argue it’s fine to deprioritise.