Error Training NLU model with SklearnIntentClassifier

I am getting this error when I try to train an NLU model with SklearnIntentClassifier:

2020-07-20 14:55:36 INFO     rasa.nlu.model  - Finished training component.
2020-07-20 14:55:36 INFO     rasa.nlu.model  - Starting to train component SklearnIntentClassifier
Fitting 2 folds for each of 6 candidates, totalling 12 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  12 out of  12 | elapsed:    0.0s finished
Traceback (most recent call last):
  File "anaconda3\envs\rasa-venv\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "anaconda3\envs\rasa-venv\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "Anaconda3\envs\rasa-venv\Scripts\rasa.exe\__main__.py", line 7, in <module>
  File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\__main__.py", line 92, in main
    cmdline_arguments.func(cmdline_arguments)
  File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\cli\train.py", line 140, in train_nlu
    persist_nlu_training_data=args.persist_nlu_data,
  File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\train.py", line 414, in train_nlu
    persist_nlu_training_data,
  File "anaconda3\envs\rasa-venv\lib\asyncio\base_events.py", line 587, in run_until_complete
    return future.result()
  File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\train.py", line 453, in _train_nlu_async
    persist_nlu_training_data=persist_nlu_training_data,
  File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\train.py", line 482, in _train_nlu_with_validated_data
    persist_nlu_training_data=persist_nlu_training_data,
  File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\nlu\train.py", line 90, in train
    interpreter = trainer.train(training_data, **kwargs)
  File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\nlu\model.py", line 191, in train
    updates = component.train(working_data, self.config, **context)
  File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\nlu\classifiers\sklearn_intent_classifier.py", line 125, in train
    self.clf.fit(X, y)
  File "anaconda3\envs\rasa-venv\lib\site-packages\sklearn\model_selection\_search.py", line 739, in fit
    self.best_estimator_.fit(X, y, **fit_params)
  File "anaconda3\envs\rasa-venv\lib\site-packages\sklearn\svm\_base.py", line 148, in fit
    accept_large_sparse=False)
  File "anaconda3\envs\rasa-venv\lib\site-packages\sklearn\utils\validation.py", line 755, in check_X_y
    estimator=estimator)
  File "anaconda3\envs\rasa-venv\lib\site-packages\sklearn\utils\validation.py", line 578, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "anaconda3\envs\rasa-venv\lib\site-packages\sklearn\utils\validation.py", line 60, in _assert_all_finite
    msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I’m using the latest version of rasa (1.10.8) and scikit-learn 0.22.2.post1 in a virtual environment with Python 3.7.
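The ValueError at the end of the traceback comes from scikit-learn's input validation, which rejects a feature matrix containing NaN or infinite values. As a rough sketch of how one might locate such values before training, the plain-Python check below scans a list of feature rows. The helper name `find_non_finite_rows` and the sample data are hypothetical and are not part of Rasa or scikit-learn.

```python
import math


def find_non_finite_rows(features):
    """Return (row_index, column_index) pairs for NaN or infinite values.

    Hypothetical helper for illustration only; `features` is assumed to be
    a list of rows, each row a list of floats.
    """
    bad = []
    for i, row in enumerate(features):
        for j, value in enumerate(row):
            # math.isfinite is False for NaN, +inf and -inf
            if not math.isfinite(value):
                bad.append((i, j))
    return bad


# Made-up feature rows: one contains NaN, another contains infinity.
features = [
    [0.1, 0.2, 0.3],
    [float("nan"), 0.0, 1.0],
    [0.5, float("inf"), 0.2],
]

bad_cells = find_non_finite_rows(features)
print(bad_cells)
```

Mapping the reported row indices back to the training examples can help narrow down which inputs produced unusable features.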

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 17 (10 by maintainers)

Top GitHub Comments

1 reaction
koaning commented, Aug 26, 2020

I also have another project called Rasa NLU examples. If you’re really looking to try out all the options out there, you can also try FastText and BytePair embeddings. We also feature gensim as a method of training your own word embeddings. In my experience so far, those backends don’t seem to be much better than spaCy, but if you’re interested, you’re free to try.

One thing, again: getting the assistant in front of actual users is MoreImportant[tm] than what I’ve just suggested.

The CountVectorizer doesn’t just do bag of words. It also does bag of character n-grams. This is super important in the real world, where spelling errors arise (which happens a lot in chatbot-land).
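To illustrate why character n-grams help with spelling errors, here is a minimal plain-Python sketch. It is not Rasa's or scikit-learn's implementation; the helper names, the n-gram range of 2 to 4, and the example words are assumptions chosen for illustration. It compares a correctly spelled word with a misspelling, first as whole tokens and then as sets of character n-grams.

```python
def char_ngrams(text, n_min=2, n_max=4):
    """Collect the set of character n-grams of `text` for n in [n_min, n_max]."""
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams.add(text[i:i + n])
    return grams


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


correct = "restaurant"
typo = "restaraunt"  # a common misspelling

# As whole words, the two tokens have nothing in common.
word_overlap = jaccard({correct}, {typo})

# As character n-grams, they still share many features ("re", "est", "rest", ...).
char_overlap = jaccard(char_ngrams(correct), char_ngrams(typo))

print(f"word-level overlap: {word_overlap:.2f}")
print(f"char n-gram overlap: {char_overlap:.2f}")
```

A word-level bag of words treats the misspelling as an unrelated token, while the character n-gram view keeps a substantial share of features in common, which is what lets a classifier tolerate typos.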

1 reaction
bofenghuang commented, Aug 26, 2020

@koaning really appreciate your help.

In fact, I’m building the system in French. As for the featurizer, I’m currently trying:

I’m also using a CountVectorsFeaturizer after the pre-trained models to capture words specific to my corpus, because the pre-trained models are frozen while training in Rasa (according to https://blog.rasa.com/how-to-benchmark-bert/). Still, I’m curious why adding a bag-of-words model after word embeddings works. Could you please explain this a little?

Thanks for the other advice, it is really inspiring 😃


