
Spacy Featurizer does not set word embeddings as features


Rasa Open Source version

2.7.2

Rasa SDK version

No response

Rasa X version

No response

Python version

3.6

What operating system are you using?

Linux

What happened?

I am using the following pipeline to train an Italian NLU model.

    language: "it"

    pipeline:
      - name: SpacyNLP
        model: "it_core_news_sm"
      - name: SpacyTokenizer
      - name: SpacyFeaturizer
        pooling: "mean"
      - name: DIETClassifier
        epochs: 1
      - name: EntitySynonymMapper
      - name: ResponseSelector
        epochs: 100
      - name: FallbackClassifier
        threshold: 0.7

When I run rasa train nlu, I get the following error during the DIETClassifier training step:

TFLayerConfigException: The attribute signature must contain some sequence-level feature signatures but none were found.
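The error follows from the pipeline itself. SpacyFeaturizer is the only featurizer in this config, so if it adds nothing, DIETClassifier receives no sequence-level features for the text attribute. The sketch below is a hypothetical, stdlib-only illustration of the condition that the error message describes. Feature, SignatureError and check_sequence_features are stand-ins and do not reproduce Rasa's actual classes or validation code.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-in constants; Rasa defines its own feature-type constants
FEATURE_TYPE_SEQUENCE = "sequence"
FEATURE_TYPE_SENTENCE = "sentence"


@dataclass
class Feature:
    """Hypothetical stand-in for a featurizer output attached to a message."""

    feature_type: str
    attribute: str
    origin: str


class SignatureError(Exception):
    """Hypothetical stand-in for the TFLayerConfigException raised by DIET."""


def check_sequence_features(features: List[Feature], attribute: str = "text") -> None:
    """Raise if no sequence-level feature exists for the given attribute."""
    if not any(
        f.feature_type == FEATURE_TYPE_SEQUENCE and f.attribute == attribute
        for f in features
    ):
        raise SignatureError(
            "The attribute signature must contain some sequence-level "
            "feature signatures but none were found."
        )


# SpacyFeaturizer is the only featurizer here, so when it returns early
# the message carries no features at all and the check fails.
try:
    check_sequence_features([])
except SignatureError as err:
    print(err)

# With sequence and sentence features present, the check passes.
check_sequence_features(
    [
        Feature(FEATURE_TYPE_SEQUENCE, "text", "SpacyFeaturizer"),
        Feature(FEATURE_TYPE_SENTENCE, "text", "SpacyFeaturizer"),
    ]
)
```

Sentence-level features alone would not satisfy this check. The sketch reflects the error text, which asks specifically for sequence-level signatures.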

Looking into the SpacyFeaturizer code:

    def _set_spacy_features(self, message: Message, attribute: Text = TEXT) -> None:
        """Adds the spacy word vectors to the messages features."""
        doc = self.get_doc(message, attribute)

        if doc is None:
            return

        # in case an empty spaCy model was used, no vectors are present
        if doc.vocab.vectors_length == 0:
            logger.debug("No features present. You are using an empty spaCy model.")
            return

        sequence_features = self._features_for_doc(doc)
        sentence_features = self._calculate_sentence_features(
            sequence_features, self.pooling_operation
        )

        final_sequence_features = Features(
            sequence_features,
            FEATURE_TYPE_SEQUENCE,
            attribute,
            self.component_config[FEATURIZER_CLASS_ALIAS],
        )
        message.add_features(final_sequence_features)
        final_sentence_features = Features(
            sentence_features,
            FEATURE_TYPE_SENTENCE,
            attribute,
            self.component_config[FEATURIZER_CLASS_ALIAS],
        )
        message.add_features(final_sentence_features)

I noticed that doc.vocab.vectors_length is 0 even though the spaCy model does compute word embeddings. As a result, the method returns early, and no features are set on the message for the downstream components.

It sounds like a bug, doesn’t it?
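The check above can be illustrated without spaCy or Rasa. The following is a minimal, stdlib-only sketch in which FakeDoc, current_guard, proposed_guard, mean_pool and featurize are hypothetical stand-ins, not Rasa or spaCy APIs. Following this report, it assumes that tokens can carry vectors while vocab.vectors_length is 0. It shows how a guard on the vocab discards those vectors, while a guard on the token vectors themselves keeps them and mean-pools them into a sentence-level feature (as with pooling: "mean"). proposed_guard is only one possible direction, not the fix adopted upstream.

```python
from types import SimpleNamespace
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


class FakeDoc:
    """Hypothetical stand-in for a spaCy Doc: iterating it yields tokens."""

    def __init__(self, vectors_length: int, token_vectors: Sequence[Vector]) -> None:
        self.vocab = SimpleNamespace(vectors_length=vectors_length)
        self._tokens = [SimpleNamespace(vector=list(v)) for v in token_vectors]

    def __iter__(self):
        return iter(self._tokens)

    def __len__(self) -> int:
        return len(self._tokens)


def current_guard(doc: FakeDoc) -> bool:
    """Mirrors the excerpt: skip whenever the vocab reports no vectors."""
    return doc.vocab.vectors_length == 0


def proposed_guard(doc: FakeDoc) -> bool:
    """Hypothetical alternative: skip only if no token has a non-zero vector."""
    return len(doc) == 0 or all(not any(tok.vector) for tok in doc)


def mean_pool(sequence_features: Sequence[Vector]) -> Vector:
    """Average the token vectors into a single sentence vector."""
    n = len(sequence_features)
    return [sum(column) / n for column in zip(*sequence_features)]


def featurize(
    doc: FakeDoc, guard: Callable[[FakeDoc], bool]
) -> Optional[Tuple[List[Vector], Vector]]:
    """Return (sequence_features, sentence_features), or None when skipped."""
    if guard(doc):
        return None
    sequence_features = [list(tok.vector) for tok in doc]
    return sequence_features, mean_pool(sequence_features)


# No vectors reported by the vocab, but the tokens still carry vectors
doc = FakeDoc(vectors_length=0, token_vectors=[[1.0, 2.0], [3.0, 4.0]])
print(featurize(doc, current_guard))   # None
print(featurize(doc, proposed_guard))  # ([[1.0, 2.0], [3.0, 4.0]], [2.0, 3.0])
```

Checking the token vectors rather than the vocab still skips a doc whose tokens have empty or all-zero vectors. That would presumably cover the empty-model case that the original comment guards against, although this sketch does not verify how spaCy behaves there.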

Command / Request

No response

Relevant log output

No response

Definition of done:

  • Investigate where the bug comes from (this involves looking into the spaCy docs, e.g. doc.vocab and doc.tensor)
  • Write tests that catch the bug
  • Implement a fix, likely by adopting or extending the proposed one
  • Get the fix merged into 2.8.x

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments:12 (7 by maintainers)

Top GitHub Comments

1 reaction
samsucik commented, Mar 17, 2022

Exalate commented:

samsucik commented:

Alright, I’ve been able to reproduce this very easily – by creating a default project (rasa init) and replacing the config with the one provided above.

0 reactions
sync-by-unito[bot] commented, Dec 19, 2022

➤ Maxime Verger commented:

💡 Heads up! We’re moving issues to Jira: https://rasa-open-source.atlassian.net/browse/OSS.

From now on, this Jira board is the place where you can browse (without an account) and create issues (you’ll need a free Jira account for that). This GitHub issue has already been migrated to Jira and will be closed on January 9th, 2023. Do not forget to subscribe to the corresponding Jira issue!

➡️ More information in the forum: https://forum.rasa.com/t/migration-of-rasa-oss-issues-to-jira/56569.


