Add custom features to NER
Description of Problem:
Right now, custom entities can only use POS features from spaCy plus a handful of simple hand-crafted features. This contrasts with the flexibility and power of the other pipeline components, which can use any combination of built-in and custom featurizers. Ideally, there would be a way to pass ner_features to the CRFEntityExtractor. In particular, this would let you train an NER model that uses word/token vectors directly from spaCy (or from other pretrained models).
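For context, sklearn_crfsuite takes each sentence as a list of per-token feature dicts: string values become indicator features and float values are used as weights. A minimal sketch of what a token's dict could look like once dense vectors are allowed in, assuming each vector dimension is exposed as its own float-valued feature (the `vec_<i>` naming and both helper functions are hypothetical, not part of Rasa):

```python
from typing import Dict, List, Sequence, Union

FeatureValue = Union[str, float, bool]


def token_features(word: str, pos: str, vector: Sequence[float]) -> Dict[str, FeatureValue]:
    """Build one token's feature dict in the shape sklearn_crfsuite accepts."""
    features: Dict[str, FeatureValue] = {
        "low": word.lower(),     # string value -> indicator feature "low=<word>"
        "title": word.istitle(),  # boolean flag
        "pos": pos,               # POS tag, as the extractor already supports
    }
    # Hypothetical scheme: one real-valued feature per embedding dimension.
    for i, value in enumerate(vector):
        features[f"vec_{i}"] = float(value)
    return features


def sentence_features(
    words: Sequence[str],
    pos_tags: Sequence[str],
    vectors: Sequence[Sequence[float]],
) -> List[Dict[str, FeatureValue]]:
    """One feature dict per token, i.e. one training sequence for the CRF."""
    return [token_features(w, p, v) for w, p, v in zip(words, pos_tags, vectors)]
```

In practice `vectors` could be the `token.vector` arrays from a spaCy `Doc`, and a list of these per-sentence lists is the `X` that `sklearn_crfsuite.CRF.fit(X, y)` expects. Since pretrained vectors have hundreds of dimensions, every token gains hundreds of extra weights, so whether this actually improves NER still has to be measured.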
Overview of the Solution:
- CRFEntityExtractor needs to additionally check for ner_features on the message and add them to the feature dict it passes to sklearn_crfsuite.
- NER featurizer classes need to be added.
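A sketch of the first point, assuming a featurizer stores `ner_features` on the message as one row of floats per token. `Message` here is a tiny stand-in rather than Rasa's class, and `TOKEN_FUNCTIONS` is a simplified, hypothetical subset of the hand-written lambdas the extractor currently uses:

```python
from typing import Any, Callable, Dict, List, Optional


class Message:
    """Minimal stand-in for a pipeline message (hypothetical, not Rasa's class)."""

    def __init__(self, tokens: List[str], data: Optional[Dict[str, Any]] = None):
        self.tokens = tokens
        self.data = data or {}

    def get(self, prop: str, default: Any = None) -> Any:
        return self.data.get(prop, default)


# Simplified, hypothetical subset of hand-written per-token feature functions.
TOKEN_FUNCTIONS: Dict[str, Callable[[str], Any]] = {
    "low": str.lower,
    "title": str.istitle,
    "suffix3": lambda word: word[-3:],
    "digit": str.isdigit,
}


def message_to_crf_features(message: Message) -> List[Dict[str, Any]]:
    """Merge hand-written token features with any ner_features on the message."""
    ner_features = message.get("ner_features")  # attribute name taken from the issue
    if ner_features is not None and len(ner_features) != len(message.tokens):
        raise ValueError("ner_features must have exactly one row per token")

    sentence = []
    for i, word in enumerate(message.tokens):
        features = {name: fn(word) for name, fn in TOKEN_FUNCTIONS.items()}
        if ner_features is not None:
            # Each dense value becomes its own float-weighted CRF feature.
            for j, value in enumerate(ner_features[i]):
                features[f"ner_{j}"] = float(value)
        sentence.append(features)
    return sentence
```

Keeping the hand-written functions in a name-to-function table is also one possible route toward replacing the hard-coded lambdas, since such a table could later move into a featurizer of its own.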
Examples (if relevant):
The skeleton of this (both adding a spaCy-based featurizer and making CRFEntityExtractor use ner_features) is implemented in this PR:
https://github.com/RasaHQ/rasa/pull/4187
Please let me know if this looks like a useful feature and if this PR is heading in the right direction.
Still necessary:
- Add tests
- Extend Featurizer to also have _combine_with_existing_ner_features
- Validate that the default spaCy token vectors noticeably improve NER on a sample task
- Make spaCy only optionally add to ner_features
- Replace the hard-coded lambda functions in CRFEntityExtractor with a simple Featurizer
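The `_combine_with_existing_ner_features` item could look roughly like this. The name comes from the task list; the body is an assumption, treating `ner_features` as one row per token and appending each new featurizer's columns to that token's existing row, so that several NER featurizers can be chained:

```python
from typing import List, Optional

# Assumed layout: one row per token, one column per feature.
Matrix = List[List[float]]


def combine_with_existing_ner_features(existing: Optional[Matrix], additional: Matrix) -> Matrix:
    """Append new per-token feature columns to any ner_features already present."""
    if existing is None:
        # First featurizer in the pipeline: just copy the new features.
        return [list(row) for row in additional]
    if len(existing) != len(additional):
        raise ValueError("both feature matrices must have one row per token")
    # Concatenate row by row, building new lists so the inputs are not mutated.
    return [list(old) + list(new) for old, new in zip(existing, additional)]
```

For example, a first featurizer producing one column and a second producing two would leave each token with three columns: `[[1.0], [2.0]]` combined with `[[0.5, 0.5], [0.0, 1.0]]` gives `[[1.0, 0.5, 0.5], [2.0, 0.0, 1.0]]`.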
Definition of Done:
Issue Analytics
- Created 4 years ago
- Reactions: 1
- Comments: 6 (3 by maintainers)

@Zylatis yes, if you wanted to add that as a component compatible with this, I would imagine you’d create something like this:
Cool, that would definitely be handy. I'm keen to do as much custom feature engineering with this CRF as possible, so if you'd like help on this PR, let me know (I'm not an expert by any means, but I'd like to contribute if I can). @jamesmf