SpaCy's `EntityRuler`-like functionality in Spark NLP

Is your feature request related to a problem? Please describe.
I would like to annotate documents against a Knowledge Base, like spaCy’s EntityRuler does.

cf. https://spacy.io/api/entityruler

Describe the solution you’d like
This would be an annotator called EntityRuler, that would be instantiated by loading a local ontology in, let’s say, JSON or CSV format.

Describe alternatives you’ve considered
Using spaCy for that, but on Big Data I fear it won’t scale.

Additional context
No additional context.
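
To make the request concrete, here is a minimal pure-Python sketch of annotating a document against a small ontology loaded from JSON. It is not the Spark NLP or spaCy API: the record layout (`label`, `pattern`), the sample entries, and the helper names `load_ontology` and `annotate` are all assumptions made for illustration, and only exact phrase matching is shown.

```python
import json
import re
from typing import Dict, List, Tuple

# Hypothetical ontology: a JSON list of {"label", "pattern"} records.
ONTOLOGY_JSON = """
[
  {"label": "ORG", "pattern": "John Snow Labs"},
  {"label": "PRODUCT", "pattern": "Spark NLP"}
]
"""


def load_ontology(text: str) -> List[Dict[str, str]]:
    """Parse the JSON ontology into a list of pattern records."""
    return json.loads(text)


def annotate(document: str, ontology: List[Dict[str, str]]) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, matched_text, label) for every exact phrase match."""
    spans = []
    for entry in ontology:
        # re.escape makes the pattern a literal phrase; \b keeps matches on word boundaries.
        regex = r"\b" + re.escape(entry["pattern"]) + r"\b"
        for match in re.finditer(regex, document):
            spans.append((match.start(), match.end(), match.group(0), entry["label"]))
    return sorted(spans)


ontology = load_ontology(ONTOLOGY_JSON)
spans = annotate("Spark NLP is maintained by John Snow Labs.", ontology)
for span in spans:
    print(span)
```

A loop like this runs on a single machine, one document at a time, which is the scaling concern raised above; a distributed annotator would apply the same matching per partition.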

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments:11 (11 by maintainers)

Top GitHub Comments

3 reactions
xegulon commented, Sep 30, 2021

Hi @danilojsl, many thanks for the speed, the nice work and the notebook!

I’ve been through the notebook, and something I think could be enhanced is the schema of your JSON/CSV. In addition to label and pattern keys, it would be great to have an ID key (that would reference the entity uniquely), and the ability to have any other key/metadata related to the entity.

Moreover, I think supporting the JSONL format in addition to plain JSON could help with speed (the spaCy team recommends it).
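
The schema suggested here can be sketched as follows: one JSON object per line (JSONL), each with `label` and `pattern`, an optional `id`, and any further keys kept as metadata. This is only an illustration in plain Python; the `PatternEntry` class, the `read_jsonl` helper, and the sample records are hypothetical and do not describe what the Spark NLP annotator actually accepts.

```python
import io
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, TextIO


@dataclass
class PatternEntry:
    label: str
    pattern: str
    id: str = ""  # optional unique reference to the entity
    metadata: Dict[str, Any] = field(default_factory=dict)  # any other keys


def read_jsonl(stream: TextIO) -> Iterator[PatternEntry]:
    """Yield one PatternEntry per non-empty line, without loading the whole file."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        label = record.pop("label")
        pattern = record.pop("pattern")
        entry_id = record.pop("id", "")
        # Whatever keys remain are treated as free-form metadata.
        yield PatternEntry(label, pattern, entry_id, record)


# Hypothetical sample data standing in for a patterns.jsonl file.
SAMPLE = io.StringIO(
    '{"label": "ORG", "pattern": "John Snow Labs", "id": "org-001", "country": "US"}\n'
    '{"label": "PRODUCT", "pattern": "Spark NLP", "id": "prod-001"}\n'
)
entries = list(read_jsonl(SAMPLE))
for entry in entries:
    print(entry)
```

Because each line is parsed on its own, a reader like this can stream a large pattern file instead of parsing one big JSON array up front.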

An error I had in the notebook:

[image not preserved]

Thanks again!

3 reactions
danilojsl commented, Sep 29, 2021

Hi @xegulon, I’m working on the EntityRuler annotator and I have an alpha version of it. If you want to check it out, please see this Colab notebook. We would truly appreciate any feedback on this new feature.

Top Results From Across the Web

com.johnsnowlabs.nlp.annotators.er.EntityRulerApproach
Fits an Annotator to match exact strings or regex patterns provided in a file against a Document and assigns them a named entity....

EntityRuler · spaCy API Documentation
The entity ruler lets you add spans to the Doc.ents using token-based rules or exact phrase matches. It can be combined with...

SpaCy or Spark NLP — A Benchmarking Comparison - Medium
The aim of this article is to run a realistic Natural Language Processing scenario to compare the leading linguistic programming libraries: ...

Rule Based and Pattern Matching for Entity Recognition in ...
Try Spark NLP here: https://www.johnsnowlabs.com/spark-nlp/ Finding patterns and matching strategies are well-known NLP procedures to ...

Using RegEx for phrase pattern in EntityRuler - Stack Overflow
I tried to find FRT entity with EntityRuler like this: from spacy.lang.en import English from spacy.pipeline import EntityRuler nlp ...
