SpaCy's `EntityRuler`-like functionality in Spark NLP
Is your feature request related to a problem? Please describe.
I would like to annotate documents against a Knowledge Base, like spaCy’s `EntityRuler` does (cf. https://spacy.io/api/entityruler).
Describe the solution you’d like
This would be an annotator called `EntityRuler`, which would be instantiated by loading a local ontology in, let’s say, JSON or CSV format.
Describe alternatives you’ve considered
Using spaCy for that, but I fear it won’t scale to Big Data.
Additional context
No additional context.
Issue Analytics
- State:
- Created 2 years ago
- Comments:11 (11 by maintainers)
Hi @danilojsl, many thanks for the speed, the nice work and the notebook!
I’ve been through the notebook, and one thing I think could be enhanced is the schema of your JSON/CSV. In addition to the `label` and `pattern` keys, it would be great to have an `ID` key (that would reference the entity uniquely), and the ability to attach any other key/metadata related to the entity.
Moreover, I think supporting the JSONL format in addition to pure JSON could be interesting for speed (the spaCy people recommend that).
An error I had in the notebook:
Thanks again!
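To make the schema suggestion above concrete, here is a minimal sketch in plain Python (standard library only, no Spark NLP or spaCy calls). It assumes a hypothetical `id` key, standing in for the proposed `ID` key, and a hypothetical free-form `metadata` key next to the existing `label` and `pattern` keys; the entity values and file names are made up for illustration. It writes the same patterns as a single JSON array and as JSONL (one object per line), which is what allows a reader to stream entries instead of parsing the whole file at once.

```python
import json
from pathlib import Path

# "label" and "pattern" are the keys mentioned in the thread.
# "id" and "metadata" are ASSUMPTIONS illustrating the proposal,
# not a schema confirmed by the notebook or by Spark NLP.
patterns = [
    {"id": "loc-001", "label": "LOCATION", "pattern": "New York",
     "metadata": {"source": "my_ontology"}},
    {"id": "loc-002", "label": "LOCATION", "pattern": "Paris",
     "metadata": {"source": "my_ontology"}},
]


def write_json(path, entries):
    """Write all entries as one JSON array (has to be parsed as a whole)."""
    Path(path).write_text(json.dumps(entries, indent=2), encoding="utf-8")


def write_jsonl(path, entries):
    """Write one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as fh:
        for entry in entries:
            fh.write(json.dumps(entry) + "\n")


def read_jsonl(path):
    """Yield entries one at a time, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)
```

Calling `write_jsonl("patterns.jsonl", patterns)` and then iterating over `read_jsonl("patterns.jsonl")` gives back the same dictionaries one by one. Whether this is actually faster than plain JSON inside the annotator would depend on how the patterns file is loaded there.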
Hi @xegulon, I’m working on the `EntityRuler` annotator and I have an alpha version of it. If you want to check it out, please see this Colab notebook. We would truly appreciate any feedback on this new feature.