Error Training NLU model with SklearnIntentClassifier

I am getting this error when I try to train an NLU model with SklearnIntentClassifier:

2020-07-20 14:55:36 INFO     rasa.nlu.model  - Finished training component.
2020-07-20 14:55:36 INFO     rasa.nlu.model  - Starting to train component SklearnIntentClassifier
Fitting 2 folds for each of 6 candidates, totalling 12 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done  12 out of  12 | elapsed:    0.0s finished
Traceback (most recent call last):
  File "anaconda3\envs\rasa-venv\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "anaconda3\envs\rasa-venv\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "Anaconda3\envs\rasa-venv\Scripts\rasa.exe\__main__.py", line 7, in <module>
  File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\__main__.py", line 92, in main
    cmdline_arguments.func(cmdline_arguments)
  File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\cli\train.py", line 140, in train_nlu
    persist_nlu_training_data=args.persist_nlu_data,
  File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\train.py", line 414, in train_nlu
    persist_nlu_training_data,
  File "anaconda3\envs\rasa-venv\lib\asyncio\base_events.py", line 587, in run_until_complete
    return future.result()
  File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\train.py", line 453, in _train_nlu_async
    persist_nlu_training_data=persist_nlu_training_data,
  File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\train.py", line 482, in _train_nlu_with_validated_data
    persist_nlu_training_data=persist_nlu_training_data,
  File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\nlu\train.py", line 90, in train
    interpreter = trainer.train(training_data, **kwargs)
  File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\nlu\model.py", line 191, in train
    updates = component.train(working_data, self.config, **context)
  File "anaconda3\envs\rasa-venv\lib\site-packages\rasa\nlu\classifiers\sklearn_intent_classifier.py", line 125, in train
    self.clf.fit(X, y)
  File "anaconda3\envs\rasa-venv\lib\site-packages\sklearn\model_selection\_search.py", line 739, in fit
    self.best_estimator_.fit(X, y, **fit_params)
  File "anaconda3\envs\rasa-venv\lib\site-packages\sklearn\svm\_base.py", line 148, in fit
    accept_large_sparse=False)
  File "anaconda3\envs\rasa-venv\lib\site-packages\sklearn\utils\validation.py", line 755, in check_X_y
    estimator=estimator)
  File "anaconda3\envs\rasa-venv\lib\site-packages\sklearn\utils\validation.py", line 578, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "anaconda3\envs\rasa-venv\lib\site-packages\sklearn\utils\validation.py", line 60, in _assert_all_finite
    msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

I’m using the latest version of Rasa (1.10.8) and scikit-learn 0.22.2.post1 in a virtual environment with Python 3.7.
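The traceback ends in scikit-learn's input validation, which rejects any feature matrix containing NaN or infinite values. As a rough sketch of that same check (not a diagnosis of where the bad values come from), the snippet below scans a toy feature matrix and reports which rows would trip it. The function name `find_non_finite_rows` and the sample values are made up for illustration. It uses only the standard library; with NumPy, `np.isfinite(X).all(axis=1)` gives an equivalent row-wise check.

```python
import math


def find_non_finite_rows(X):
    """Return indices of rows in X that contain NaN or +/-infinity."""
    bad_rows = []
    for i, row in enumerate(X):
        # math.isfinite is False for NaN, +inf and -inf
        if any(not math.isfinite(v) for v in row):
            bad_rows.append(i)
    return bad_rows


# Toy feature matrix (made-up values): row 1 holds a NaN, row 2 an infinity.
X = [
    [0.1, 0.2, 0.3],
    [0.4, float("nan"), 0.6],
    [float("inf"), 0.8, 0.9],
]

bad = find_non_finite_rows(X)
print(bad)  # [1, 2]
```

If a check like this flags rows, the corresponding training examples are a reasonable place to start looking, since the classifier only sees whatever the upstream featurizer produced for them.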

Issue Analytics

Created 3 years ago
Comments: 17 (10 by maintainers)

Top GitHub Comments

1 reaction

koaning commented, Aug 26, 2020

I also have another project called Rasa NLU examples. If you’re really looking to try out all the options out there, you can also try out FastText and BytePair embeddings. We also feature gensim as a method of training your own word embeddings. In my experience so far, those backends don’t seem to be much better than spaCy, but if you’re interested you’re free to try.

One thing, again, getting the assistant in front of actual users is MoreImportant[tm] than what I’ve just suggested.

The CountVectorizer doesn’t just do bag of words. It also does bag of character n-grams. This is super important in the real world, where spelling errors might arise (which happens a lot in chatbot-land).
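To make the point about spelling errors concrete, here is a small sketch in plain Python. It is an illustration of the general idea only, not how Rasa or scikit-learn implement it; the helper names `char_ngrams` and `jaccard`, the n-gram range, and the example words are all assumptions chosen for the demo. A misspelled word shares nothing with the correct one at the word level, but still shares many character n-grams.

```python
def char_ngrams(text, n_min=2, n_max=4):
    """Return the set of character n-grams of `text` for n in [n_min, n_max]."""
    # Pad with spaces so that word boundaries also produce n-grams.
    padded = f" {text.lower()} "
    return {
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    }


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


correct = "restaurant"
typo = "restuarant"  # two letters swapped

# Word level: the two tokens are simply different, so nothing overlaps.
word_overlap = jaccard({correct}, {typo})

# Character level: most n-grams survive the typo.
char_overlap = jaccard(char_ngrams(correct), char_ngrams(typo))

print(word_overlap)  # 0.0
print(char_overlap)  # some value strictly between 0 and 1
```

A classifier fed character n-gram counts therefore still sees familiar features for the misspelled input, whereas a pure bag-of-words model treats it as an unknown token.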

1 reaction

bofenghuang commented, Aug 26, 2020

@koaning really appreciate your help.

In fact, I’m building the system in French. As for the featurizer, I’m currently trying:

spaCy (sm / md / lg)
multilingual BERT
CamemBERT (according to https://forum.rasa.com/t/rasa-and-camembert/31882)

I’m also using a CountVectorsFeaturizer after the pre-trained models to capture words specific to my corpus, because the pre-trained models are frozen during Rasa training (according to https://blog.rasa.com/how-to-benchmark-bert/). That said, I’m curious why adding a bag-of-words model after word embeddings works. Could you please explain a little about this?

Thanks for the other advice, it’s really inspiring 😃