Add custom features to NER

Description of Problem: Right now, custom entities can only use POS features from spaCy and a handful of simple features. This contrasts with the flexibility and power of the other pipeline components, which can take advantage of any combination of built-in and custom featurizers. Ideally, there would be a way to pass ner_features to the CRFEntityExtractor. In particular, this would let you train NER that uses word/token vectors straight from spaCy (or other pretrained models).

Overview of the Solution:

  • CRFEntityExtractor needs to additionally check for ner_features on the message and add them to the feature dict it passes to sklearn_crfsuite.
  • NER featurizer classes need to be added.
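
The first bullet can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper name `add_ner_features` and the `ner_feature_<j>` key naming are assumptions made for this example. It relies only on the fact that sklearn_crfsuite accepts one feature dict per token, where numeric values act as feature weights.

```python
from typing import Dict, List, Optional, Sequence


def add_ner_features(
    token_features: List[Dict[str, object]],
    ner_features: Optional[Sequence[Sequence[float]]],
    prefix: str = "ner_feature",  # hypothetical key prefix, not from the PR
) -> List[Dict[str, object]]:
    """Return per-token feature dicts extended with dense NER features.

    token_features: one dict per token, as already built by the extractor.
    ner_features: matrix with one row per token, or None if no featurizer ran.
    """
    if ner_features is None:
        # No featurizer attached ner_features to the message: nothing to add.
        return token_features
    if len(ner_features) != len(token_features):
        raise ValueError("ner_features must have exactly one row per token")

    merged = []
    for feats, row in zip(token_features, ner_features):
        extended = dict(feats)  # copy so the original dicts stay untouched
        for j, value in enumerate(row):
            extended[f"{prefix}_{j}"] = float(value)
        merged.append(extended)
    return merged


base = [{"low": "book", "pos": "VERB"}, {"low": "flight", "pos": "NOUN"}]
dense = [[0.1, 0.9], [0.4, 0.2]]
features = add_ner_features(base, dense)
```

Each dense column becomes its own named feature, so the CRF learns a separate weight per dimension and label.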

Examples (if relevant): The skeleton of this (both adding a spaCy-based featurizer and making CRFEntityExtractor use ner_features) is implemented in this PR (https://github.com/RasaHQ/rasa/pull/4187). Please let me know if this looks like a useful feature and if this PR is heading in the right direction.

Still necessary:

  • Add tests
  • Extend Featurizer to also have _combine_with_existing_ner_features
  • Validate that having default spaCy tokens noticeably improves NER for a sample task
  • Make spaCy add to ner_features only optionally
  • Replace the hard-coded lambda functions in CRFEntityExtractor with a simple Featurizer
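
As a rough idea of what the proposed _combine_with_existing_ner_features could do, here is a sketch under stated assumptions. The `Message` class below is a minimal stand-in written for this example (it only mimics `get`/`set` on named properties), and the function is a free-standing version of the proposed method, not the implementation from the PR. It appends new feature columns to whatever is already stored under `ner_features`.

```python
from typing import Dict, List

Matrix = List[List[float]]


class Message:
    """Minimal stand-in for a message object: a bag of named properties."""

    def __init__(self) -> None:
        self.data: Dict[str, object] = {}

    def get(self, prop: str, default=None):
        return self.data.get(prop, default)

    def set(self, prop: str, value) -> None:
        self.data[prop] = value


def combine_with_existing_ner_features(message: Message, new_features: Matrix) -> Matrix:
    """Concatenate new per-token feature columns onto existing ner_features."""
    existing = message.get("ner_features")
    if existing is None:
        # First featurizer in the pipeline: just store a copy.
        combined = [list(row) for row in new_features]
    else:
        if len(existing) != len(new_features):
            raise ValueError("feature matrices must have one row per token")
        # Row-wise concatenation: same tokens, more columns.
        combined = [list(old) + list(new) for old, new in zip(existing, new_features)]
    message.set("ner_features", combined)
    return combined


msg = Message()
combine_with_existing_ner_features(msg, [[1.0], [2.0]])
combine_with_existing_ner_features(msg, [[0.5, 0.5], [0.25, 0.75]])
stacked = msg.get("ner_features")
```

With this shape, several featurizers can run in sequence and each one only needs to know its own columns.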

Definition of Done:

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
jamesmf commented, Aug 12, 2019

@Zylatis yes, if you wanted to add that as a component compatible with this, I would imagine you’d create something like this:

class KeywordDistanceFeaturizer(Featurizer):
    ...
    def process(self, message, **kwargs):
        # Compute an np array of shape (num_tokens, num_features), where entry
        # (i, j) might be the ith token's distance from the jth keyword.
        keyword_features = self.get_keyword_distances(message.get("tokens", []))
        # Append the new columns to any ner_features already on the message.
        self._combine_with_existing_ner_features(message, keyword_features)
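
To make the (num_tokens, num_features) matrix in the snippet above concrete, here is one possible get_keyword_distances, written as a standalone function. Everything in it is an assumption for illustration: the thread does not define "distance", so this sketch uses positional distance to the nearest occurrence of each keyword, takes plain strings instead of token objects, and uses the sentence length as a sentinel when a keyword is absent.

```python
from typing import List, Sequence


def get_keyword_distances(tokens: Sequence[str], keywords: Sequence[str]) -> List[List[float]]:
    """Return a matrix where entry (i, j) is the positional distance from
    token i to the nearest occurrence of keyword j (case-insensitive)."""
    lowered = [tok.lower() for tok in tokens]
    missing = float(len(tokens))  # sentinel (assumption): keyword not in sentence

    # Pre-compute where each keyword occurs in the sentence.
    positions = [
        [p for p, tok in enumerate(lowered) if tok == keyword.lower()]
        for keyword in keywords
    ]

    matrix = []
    for i in range(len(lowered)):
        row = [
            float(min(abs(i - p) for p in occ)) if occ else missing
            for occ in positions
        ]
        matrix.append(row)
    return matrix


distances = get_keyword_distances(
    ["book", "a", "flight", "to", "Boston"], ["flight", "from"]
)
```

The result has one row per token and one column per keyword, which is the shape the featurizer sketch expects to pass along as ner_features.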

0 reactions
Zylatis commented, Aug 13, 2019

Cool, that would definitely be handy. I’m keen to do as much custom feature engineering with this CRF as possible, so if you’d like help on this PR, let me know (I’m not an expert by any means, but I’d like to contribute if I can). @jamesmf
