Add custom features to NER
Description of Problem:
Right now custom entities can only use pos features from spacy and a handful of simple features. This seems to be in contrast to the flexibility and power of the other pipeline components, which can take advantage of any combination of built-in and custom featurizers. Ideally, there would be a way to pass ner_features to the CRFEntityExtractor. In particular, this would let you train NER that used word/token vectors straight from spacy (or other pretrained models).
Overview of the Solution:
- CRFEntityExtractor needs to additionally check for ner_features on the message and add them to the feature dict it passes to sklearn_crfsuite.
- There need to be NER featurizer classes added.
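To make the first point concrete, here is a minimal sketch of how per-token dense features might be folded into the per-token feature dicts that a CRF consumes. It is not the code from the PR: the function name merge_ner_features, the "ner:<i>" key format, and the plain-list input are all assumptions made for illustration. It relies only on the fact that CRF feature dicts map string keys to string, bool, or float values.

```python
from typing import Dict, List, Sequence, Union

FeatureValue = Union[str, bool, float]
TokenFeatures = Dict[str, FeatureValue]


def merge_ner_features(
    token_features: List[TokenFeatures],
    ner_features: Sequence[Sequence[float]],
    prefix: str = "ner",
) -> List[TokenFeatures]:
    """Return new per-token dicts with one float entry per vector dimension.

    `token_features` holds the existing hand-crafted features (one dict per
    token); `ner_features` holds one dense vector per token. Both names and
    the key format are hypothetical, chosen only for this sketch.
    """
    if len(token_features) != len(ner_features):
        raise ValueError("expected exactly one feature vector per token")

    merged = []
    for feats, vector in zip(token_features, ner_features):
        combined = dict(feats)  # copy so the caller's dicts stay untouched
        for i, value in enumerate(vector):
            combined[f"{prefix}:{i}"] = float(value)
        merged.append(combined)
    return merged


# Two tokens with simple features, plus a 2-dimensional vector for each.
tokens = [{"low": "berlin", "title": True}, {"low": "is", "title": False}]
vectors = [[0.1, 0.2], [0.3, 0.4]]
merged = merge_ner_features(tokens, vectors)
```

Each dimension becomes its own weighted feature, so the CRF learns one set of weights per dimension and label. Whether that helps in practice is exactly what the validation item in the to-do list is meant to check.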
Examples (if relevant):
The skeleton of this (both adding a spacy-based featurizer and making CRFEntityExtractor use ner_features) is implemented in this PR: https://github.com/RasaHQ/rasa/pull/4187
Please let me know if this looks like a useful feature and if this PR is heading in the right direction.
Still necessary:
- Add tests
- Extend Featurizer to also have _combine_with_existing_ner_features
- Validate that having default spacy tokens noticeably improves NER for a sample task
- Make spacy only optionally add to ner_features
- Replace the hard-coded lambda functions in CRFEntityExtractor with a simple Featurizer
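As a rough illustration of the last item, the hard-coded lambdas could be wrapped in a small object that maps feature names to per-token functions. The class name SimpleTokenFeaturizer and its featurize method are hypothetical and do not reflect Rasa's actual Featurizer interface; this is only a sketch of the idea.

```python
from typing import Callable, Dict, List, Union

FeatureValue = Union[str, bool, float]


class SimpleTokenFeaturizer:
    """Hypothetical featurizer wrapping named per-token feature functions."""

    def __init__(self, functions: Dict[str, Callable[[str], FeatureValue]]):
        self.functions = functions

    def featurize(self, tokens: List[str]) -> List[Dict[str, FeatureValue]]:
        # Apply every named function to every token, one dict per token.
        return [
            {name: fn(token) for name, fn in self.functions.items()}
            for token in tokens
        ]


# The same kind of simple features, declared once instead of hard-coded.
basic = SimpleTokenFeaturizer(
    {"low": str.lower, "title": str.istitle, "digit": str.isdigit}
)
features = basic.featurize(["Berlin", "2019"])
```

With this shape, adding or removing a simple feature is a change to the dictionary passed in, not to the extractor itself.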
Definition of Done:
Issue Analytics
- State:
- Created 4 years ago
- Reactions:1
- Comments:6 (3 by maintainers)
Top GitHub Comments
@Zylatis yes, if you wanted to add that as a component compatible with this, I would imagine you’d create something like this:
Cool, would definitely be handy. I’m keen to do as much custom feature engineering with this CRF as possible, so if you’d like help on this PR let me know (not an expert by any means, but I’d like to contribute if I can). @jamesmf