Spacy Featurizer does not set word embeddings as features

Rasa Open Source version

2.7.2

Rasa SDK version

No response

Rasa X version

No response

Python version

3.6

What operating system are you using?

Linux

What happened?

I am using the following pipeline to train an Italian NLU model.

language: "it"

pipeline:
  - name: SpacyNLP
    model: "it_core_news_sm"
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
    pooling: "mean"
  - name: DIETClassifier
    epochs: 1
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.7

When I run rasa train nlu, I obtain the following error in the DIET training step: TFLayerConfigException: The attribute signature must contain some sequence-level feature signatures but none were found.

Going into the code of the spacy_featurizer:

def _set_spacy_features(self, message: Message, attribute: Text = TEXT) -> None:
    """Adds the spacy word vectors to the messages features."""
    doc = self.get_doc(message, attribute)

    if doc is None:
        return

    # in case an empty spaCy model was used, no vectors are present
    if doc.vocab.vectors_length == 0:
        logger.debug("No features present. You are using an empty spaCy model.")
        return

    sequence_features = self._features_for_doc(doc)
    sentence_features = self._calculate_sentence_features(
        sequence_features, self.pooling_operation
    )

    final_sequence_features = Features(
        sequence_features,
        FEATURE_TYPE_SEQUENCE,
        attribute,
        self.component_config[FEATURIZER_CLASS_ALIAS],
    )
    message.add_features(final_sequence_features)
    final_sentence_features = Features(
        sentence_features,
        FEATURE_TYPE_SENTENCE,
        attribute,
        self.component_config[FEATURIZER_CLASS_ALIAS],
    )
    message.add_features(final_sentence_features)

I saw that doc.vocab.vectors_length is 0 even though the spaCy model has computed word embeddings. As a result, the function returns early and no features are set on the message for the downstream components.

It sounds like a bug, doesn’t it?
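To illustrate the check in question, here is a minimal sketch of a guard that looks at both doc.vocab.vectors_length and doc.tensor before deciding that no features are available. It is not Rasa's code or the merged fix. The helper name has_usable_vectors is hypothetical, and the two stand-in objects only imitate the attributes of a spaCy Doc so the snippet runs without spaCy installed. The assumption behind it is that small spaCy models ship without static vectors, so vectors_length is 0, while doc.tensor may still be populated.

```python
from types import SimpleNamespace


def has_usable_vectors(doc) -> bool:
    """Hypothetical helper: True if the doc has static vectors or a non-empty tensor."""
    has_static_vectors = doc.vocab.vectors_length > 0
    # len() works for both numpy arrays (real spaCy docs) and the lists used below
    has_tensor = len(doc.tensor) > 0
    return has_static_vectors or has_tensor


# Stand-ins, NOT real spaCy objects: they only mimic the two attributes used above.
sm_like_doc = SimpleNamespace(
    vocab=SimpleNamespace(vectors_length=0),  # no static vector table
    tensor=[[0.1, 0.2], [0.3, 0.4]],          # one row per token
)
blank_doc = SimpleNamespace(
    vocab=SimpleNamespace(vectors_length=0),
    tensor=[],                                # nothing computed at all
)

print(has_usable_vectors(sm_like_doc))
print(has_usable_vectors(blank_doc))
```

With a guard like this, only the second case would take the early "No features present" return; whether doc.tensor is an acceptable feature source for DIET is a separate question for the maintainers.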

Command / Request

No response

Relevant log output

No response

Definition of done:

Investigate where the bug is coming from (involves looking into the spaCy docs, e.g. doc.vocab and doc.tensor)
Write tests that catch the bug
Implement a fix, likely by adopting or extending the proposed one
Get the fix merged into 2.8.x

Issue Analytics

State:
Created 2 years ago
Comments: 12 (7 by maintainers)

Top GitHub Comments

samsucik commented, Mar 17, 2022

Exalate commented:

samsucik commented:

Alright, I’ve been able to reproduce this very easily – by creating a default project (rasa init) and replacing the config with the one provided above.

sync-by-unito[bot] commented, Dec 19, 2022

➤ Maxime Verger commented:

💡 Heads up! We’re moving issues to Jira: https://rasa-open-source.atlassian.net/browse/OSS.

From now on, this Jira board is the place where you can browse (without an account) and create issues (you’ll need a free Jira account for that). This GitHub issue has already been migrated to Jira and will be closed on January 9th, 2023. Do not forget to subscribe to the corresponding Jira issue!

➡️ More information in the forum: https://forum.rasa.com/t/migration-of-rasa-oss-issues-to-jira/56569.