Error Training NLU model with SklearnIntentClassifier
I am getting this error when I try to train an NLU model with SklearnIntentClassifier:
2020-07-20 14:55:36 INFO rasa.nlu.model - Finished training component.
2020-07-20 14:55:36 INFO rasa.nlu.model - Starting to train component SklearnIntentClassifier
Fitting 2 folds for each of 6 candidates, totalling 12 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 12 out of 12 | elapsed: 0.0s finished
Traceback (most recent call last):
File "anaconda3\envs\rasa-venv\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "anaconda3\envs\rasa-venv\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "Anaconda3\envs\rasa-venv\Scripts\rasa.exe\__main__.py", line 7, in <module>
File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\__main__.py", line 92, in main
cmdline_arguments.func(cmdline_arguments)
File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\cli\train.py", line 140, in train_nlu
persist_nlu_training_data=args.persist_nlu_data,
File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\train.py", line 414, in train_nlu
persist_nlu_training_data,
File "anaconda3\envs\rasa-venv\lib\asyncio\base_events.py", line 587, in run_until_complete
return future.result()
File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\train.py", line 453, in _train_nlu_async
persist_nlu_training_data=persist_nlu_training_data,
File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\train.py", line 482, in _train_nlu_with_validated_data
persist_nlu_training_data=persist_nlu_training_data,
File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\nlu\train.py", line 90, in train
interpreter = trainer.train(training_data, **kwargs)
File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\nlu\model.py", line 191, in train
updates = component.train(working_data, self.config, **context)
File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\nlu\classifiers\sklearn_intent_classifier.py", line 125, in train
self.clf.fit(X, y)
File "anaconda3\envs\rasa-venv\lib\site-packages\sklearn\model_selection\_search.py", line 739, in fit
self.best_estimator_.fit(X, y, **fit_params)
File "anaconda3\envs\rasa-venv\lib\site-packages\sklearn\svm\_base.py", line 148, in fit
accept_large_sparse=False)
File "anaconda3\envs\rasa-venv\lib\site-packages\sklearn\utils\validation.py", line 755, in check_X_y
estimator=estimator)
File "anaconda3\envs\rasa-venv\lib\site-packages\sklearn\utils\validation.py", line 578, in check_array
allow_nan=force_all_finite == 'allow-nan')
File "anaconda3\envs\rasa-venv\lib\site-packages\sklearn\utils\validation.py", line 60, in _assert_all_finite
msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
I’m using the latest version of rasa (1.10.8) and scikit-learn 0.22.2.post1 in a virtual environment with Python 3.7.
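For anyone hitting the same ValueError: the traceback shows scikit-learn rejecting the feature matrix X because it contains a non-finite value. The issue text does not say where that value comes from, so the following is only a minimal sketch of how one might locate the offending rows before calling fit. The helper name `find_non_finite_rows` and the toy matrix are hypothetical and not part of Rasa or scikit-learn.

```python
import math


def find_non_finite_rows(rows):
    """Return the indices of rows that contain NaN or +/-infinity."""
    bad = []
    for i, row in enumerate(rows):
        if any(not math.isfinite(value) for value in row):
            bad.append(i)
    return bad


# Toy feature matrix (made up for illustration):
# row 1 contains a NaN, row 2 contains an infinity.
features = [
    [0.1, 0.2, 0.3],
    [0.4, float("nan"), 0.6],
    [float("inf"), 0.8, 0.9],
]

bad_rows = find_non_finite_rows(features)
print(bad_rows)
```

Mapping the reported row indices back to the training examples would show which utterances produce the bad features.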
Issue Analytics
- State:
- Created 3 years ago
- Comments:17 (10 by maintainers)
I also have another project called Rasa NLU examples. If you’re really looking to try out all the options out there you can also try out FastText and BytePair embeddings. We also feature gensim as a method of training your own word embeddings. In my experience so far it doesn’t seem that those backends are much better than spaCy, but if you’re interested you’re free to try.
One thing, again: getting the assistant in front of actual users is MoreImportant[tm] than what I’ve just suggested.
The CountVectorizer doesn’t just do bag of words. It also does bag of character n-grams. This is super important in the real world, where spelling errors arise (which happens a lot in chatbot-land).
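To make the point about spelling errors concrete, here is a small plain-Python illustration of character n-gram counting. It is a sketch of the idea only, not the scikit-learn or Rasa implementation; the function names `char_ngrams` and `overlap`, the n-gram range 2 to 4, and the example words are all assumptions chosen for the demo.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Count character n-grams inside each word, padding words with a space."""
    counts = Counter()
    for word in text.lower().split():
        padded = f" {word} "
        for n in range(n_min, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return counts


def overlap(a, b):
    """Jaccard overlap between the sets of n-grams seen in a and b."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0


correct = char_ngrams("restaurant")
typo = char_ngrams("restarant")   # misspelling: the "u" is missing
other = char_ngrams("weather")    # unrelated word

# As whole-word tokens, "restaurant" and "restarant" share nothing,
# but their character n-grams still overlap substantially.
print(overlap(correct, typo), overlap(correct, other))
```

A word-level bag of words treats the misspelling as an unknown token, while the character n-gram view keeps most of its features in common with the correctly spelled word.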
@koaning really appreciate your help.
In fact, I’m building the system in French. As for the featurizer, I’m currently trying:
I’m also using a CountVectorsFeaturizer after the pre-trained models to capture words specific to my corpus, because the pre-trained models are frozen while training in Rasa (according to https://blog.rasa.com/how-to-benchmark-bert/). Still, I’m curious why adding a bag-of-words model after word embeddings works. Could you please explain a little about this?
Thanks for the other advice, it is really inspiring 😃