
`rasa init` config retrains when comments are removed


I just ran `rasa init` from the main branch and trained a model using this pipeline:

recipe: default.v1

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en

pipeline:
# # No configuration for the NLU pipeline was provided. The following default pipeline was used to train your model.
# # If you'd like to customize it, uncomment and adjust the pipeline.
# # See https://rasa.com/docs/rasa/tuning-your-model for more information.
#   - name: WhitespaceTokenizer
#   - name: RegexFeaturizer
#   - name: LexicalSyntacticFeaturizer
#   - name: CountVectorsFeaturizer
#   - name: CountVectorsFeaturizer
#     analyzer: char_wb
#     min_ngram: 1
#     max_ngram: 4
#   - name: DIETClassifier
#     epochs: 100
#     constrain_similarities: true
#   - name: EntitySynonymMapper
#   - name: ResponseSelector
#     epochs: 100
#     constrain_similarities: true
#   - name: FallbackClassifier
#     threshold: 0.3
#     ambiguity_threshold: 0.1

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
# # No configuration for policies was provided. The following default policies were used to train your model.
# # If you'd like to customize them, uncomment and adjust the policies.
# # See https://rasa.com/docs/rasa/policies for more information.
#   - name: MemoizationPolicy
#   - name: RulePolicy
#   - name: UnexpecTEDIntentPolicy
#     max_history: 5
#     epochs: 100
#   - name: TEDPolicy
#     max_history: 5
#     epochs: 100
#     constrain_similarities: true

The model trains, and fingerprints get created. Everything seems fine. But now I uncomment the lines:

recipe: default.v1

language: en

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1

policies:
  - name: MemoizationPolicy
  - name: RulePolicy
  - name: UnexpecTEDIntentPolicy
    max_history: 5
    epochs: 100
  - name: TEDPolicy
    max_history: 5
    epochs: 100
    constrain_similarities: true

In this case, I’d expect the fingerprints to kick in and prevent the components from getting retrained. That is not what happens: the whole pipeline retrains. This suggests there may be something strange happening in our fingerprinting mechanism that’s worth looking into.
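For context, one way a fingerprint can behave as expected here is to hash the *resolved* configuration (after defaults are filled in) rather than the raw file text, so that comments and formatting cannot invalidate the cache. A minimal, hypothetical sketch — the defaults, function names, and dict-based config are placeholders standing in for the parsed YAML, not Rasa’s actual implementation:

```python
import hashlib
import json

# Hypothetical stand-in for the pipeline `rasa init` injects when none is
# configured (component names taken from the config above, abbreviated).
DEFAULT_PIPELINE = [
    {"name": "WhitespaceTokenizer"},
    {"name": "RegexFeaturizer"},
    {"name": "DIETClassifier", "epochs": 100},
]


def resolve(config):
    """Fill in defaults so an empty pipeline and an explicit copy of the
    defaults resolve to the same effective config."""
    resolved = dict(config)
    if not resolved.get("pipeline"):
        resolved["pipeline"] = DEFAULT_PIPELINE
    return resolved


def fingerprint(config):
    """Hash a canonical serialization of the resolved config, so only
    semantic changes (not comments or whitespace) change the hash."""
    canonical = json.dumps(resolve(config), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# A fully commented-out pipeline parses to an empty `pipeline` key...
commented = {"language": "en", "pipeline": None}
# ...while the uncommented file spells the same defaults out explicitly.
uncommented = {"language": "en", "pipeline": DEFAULT_PIPELINE}

# Both resolve to the same effective config, so the fingerprints match
# and no retraining would be triggered.
assert fingerprint(commented) == fingerprint(uncommented)
```

By contrast, a fingerprint computed over the raw file bytes would differ between the two configs above, which is consistent with the retraining observed here.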

A question worth asking: why do we even need these comments in the first place? It feels strange to ask a user to uncomment configuration settings when they’re getting started.

Definition of done

  • Make the UX of changing the `rasa init` config pleasant, with proper caching
  • Fix the bug

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
wochinge commented, Oct 22, 2021

You’re completely right. We still prioritized it as it’s right at the start of the Rasa journey (with the new model architecture) and could make a bad impression (especially if we’re advertising that our new architecture fixes exactly that behavior 😆 )

0 reactions
koaning commented, Oct 22, 2021

Now that I’ve pondered it further:

In the larger scheme of things, this is an inconvenience at worst. It’ll mainly happen when a user is running `rasa init`, and even then it’s a one-time cost of retraining. As long as it’s not related to the comments, I’d argue it’s fine to deprioritise.


