Spacy Featurizer does not set word embeddings as features
Rasa Open Source version
2.7.2
Rasa SDK version
No response
Rasa X version
No response
Python version
3.6
What operating system are you using?
Linux
What happened?
I am using the following pipeline to train an Italian NLU model.
    language: "it"
    pipeline:
      - name: SpacyNLP
        model: "it_core_news_sm"
      - name: SpacyTokenizer
      - name: SpacyFeaturizer
        pooling: "mean"
      - name: DIETClassifier
        epochs: 1
      - name: EntitySynonymMapper
      - name: ResponseSelector
        epochs: 100
      - name: FallbackClassifier
        threshold: 0.7
When I run rasa train nlu, I obtain the following error in the DIET training step:
TFLayerConfigException: The attribute signature must contain some sequence-level feature signatures but none were found.
Going into the code of the spacy_featurizer:
    def _set_spacy_features(self, message: Message, attribute: Text = TEXT) -> None:
        """Adds the spacy word vectors to the messages features."""
        doc = self.get_doc(message, attribute)

        if doc is None:
            return

        # in case an empty spaCy model was used, no vectors are present
        if doc.vocab.vectors_length == 0:
            logger.debug("No features present. You are using an empty spaCy model.")
            return

        sequence_features = self._features_for_doc(doc)
        sentence_features = self._calculate_sentence_features(
            sequence_features, self.pooling_operation
        )

        final_sequence_features = Features(
            sequence_features,
            FEATURE_TYPE_SEQUENCE,
            attribute,
            self.component_config[FEATURIZER_CLASS_ALIAS],
        )
        message.add_features(final_sequence_features)
        final_sentence_features = Features(
            sentence_features,
            FEATURE_TYPE_SENTENCE,
            attribute,
            self.component_config[FEATURIZER_CLASS_ALIAS],
        )
        message.add_features(final_sentence_features)
I saw that doc.vocab.vectors_length is 0 even though the spaCy model has computed the word embeddings. As a result, the featurizer returns early and no features are set on the message for the downstream components.
It sounds like a bug, doesn’t it?
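To make the check above easier to reproduce, here is a minimal diagnostic sketch. The helper name vector_report and the stand-in fake_doc are hypothetical and not part of Rasa or spaCy; the only attributes it reads are doc.vocab.vectors_length and doc.tensor, the two mentioned in this issue. It assumes that a doc can carry a per-token tensor even when the vocabulary reports a vector length of 0.

```python
from types import SimpleNamespace


def vector_report(doc):
    """Summarise where a spaCy-like doc carries vectors (hypothetical helper)."""
    # Width of the vocabulary's vector table; this is what the featurizer's guard checks.
    static_dim = doc.vocab.vectors_length
    # Per-token tensor set by the pipeline; may be missing or empty.
    tensor = getattr(doc, "tensor", None)
    # len() gives the number of rows for both numpy arrays and plain lists.
    tensor_rows = len(tensor) if tensor is not None else 0
    return {
        "static_vector_dim": static_dim,
        "has_static_vectors": static_dim > 0,
        "tensor_rows": tensor_rows,
        "has_tensor": tensor_rows > 0,
    }


# Stand-in object (an assumption for illustration) that mimics the situation
# described above: vectors_length is 0, yet a tensor with one row per token exists.
fake_doc = SimpleNamespace(
    vocab=SimpleNamespace(vectors_length=0),
    tensor=[[0.1, 0.2], [0.3, 0.4]],
)

report = vector_report(fake_doc)
print(report)
```

With spaCy and the model installed, the same helper could presumably be called on a real doc, for example vector_report(spacy.load("it_core_news_sm")("ciao mondo")). If has_static_vectors is False while has_tensor is True, the early return in _set_spacy_features would be skipping a doc that still has usable per-token values.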
Command / Request
No response
Relevant log output
No response
Definition of done:
- Investigate where the bug is coming from (involves looking into spaCy doc objects, e.g. doc.vocab and doc.tensor)
- Write tests that catch the bug
- Implement a fix, likely by adopting or extending the proposed one
- Get the fix merged into 2.8.x
Issue Analytics
- State:
- Created 2 years ago
- Comments: 12 (7 by maintainers)
Exalate commented:
samsucik commented:
Alright, I’ve been able to reproduce this very easily – by creating a default project (rasa init) and replacing the config with the one provided above.
➤ Maxime Verger commented:
💡 Heads up! We’re moving issues to Jira: https://rasa-open-source.atlassian.net/browse/OSS.
From now on, this Jira board is the place where you can browse (without an account) and create issues (you’ll need a free Jira account for that). This GitHub issue has already been migrated to Jira and will be closed on January 9th, 2023. Do not forget to subscribe to the corresponding Jira issue!
➡️ More information in the forum: https://forum.rasa.com/t/migration-of-rasa-oss-issues-to-jira/56569.