Add a new pipeline for the Relation Extraction task.
🚀 Feature request
Add a new pipeline option for the Relation Extraction task: nlp = pipeline('relation-extraction')
Motivation
Relation Extraction between named entities is a well-known NLP task. For example, when you extract entities related to medications (say the entity types are DRUG and FORM, where a FORM is a tablet, capsule, etc.), you want to know which FORM entity goes with which DRUG entity.
Reference: https://portal.dbmi.hms.harvard.edu/projects/n2c2-2018-t2/
This task is not limited to the biomedical domain.
Your contribution
I still need to play more with the HF API before I can contribute!
But, as I see it, the pipeline would return a list of dictionaries, each dictionary representing an identified relation in the text.
The relation extraction model would probably sit on top of the NER model.
There are implementations of such models here.
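To make the proposed return value concrete, here is a minimal sketch of what a list of relation dictionaries could look like. The key names (head, tail, relation, score), the FORM-DRUG label, and the score value are all assumptions made for illustration; they are not an existing transformers pipeline output format.

```python
from typing import Any, Dict, List


def mention(text: str, surface: str, entity_type: str) -> Dict[str, Any]:
    """Describe one entity mention with character offsets into `text`."""
    start = text.index(surface)
    return {
        "word": surface,
        "entity": entity_type,
        "start": start,
        "end": start + len(surface),
    }


text = "Take one aspirin tablet daily."

# Hypothetical pipeline output: one dictionary per identified relation.
relations: List[Dict[str, Any]] = [
    {
        "relation": "FORM-DRUG",  # illustrative label name
        "score": 0.98,            # made-up confidence, for illustration only
        "head": mention(text, "aspirin", "DRUG"),
        "tail": mention(text, "tablet", "FORM"),
    }
]

for rel in relations:
    print(rel["head"]["word"], "<-", rel["relation"], "->", rel["tail"]["word"])
```

Keeping character offsets on each entity would let callers map a relation back to the exact spans that a NER step produced.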
Issue Analytics
- Created 2 years ago
- Comments: 13 (6 by maintainers)
Top GitHub Comments
For now, there’s only one model that is capable of performing relation extraction out of the box, and that’s LUKE. You can use LukeForEntityPairClassification to classify the relationship between two entities in a sentence.
However, relation extraction is a task that is solved in many different ways, so it’s not straightforward to define a generic pipeline for it into which you can plug different models.
I think you need to create several training examples for this single sentence. Each training example should be <sentence, entity 1, entity 2, relationship>. So indeed, option 1 is what I would do.
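As a rough sketch of that idea, a single annotated sentence can be expanded into one training example per related entity pair. The RelationExample type, the span_of helper, the sample sentence, and the FORM-DRUG label are assumptions made for illustration, not part of any library.

```python
from typing import List, NamedTuple, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


class RelationExample(NamedTuple):
    sentence: str
    entity_1: Span
    entity_2: Span
    relationship: str


def span_of(sentence: str, surface: str) -> Span:
    """Locate the first occurrence of `surface` in `sentence`."""
    start = sentence.index(surface)
    return (start, start + len(surface))


def build_examples(
    sentence: str, annotations: List[Tuple[str, str, str]]
) -> List[RelationExample]:
    """Create one <sentence, entity 1, entity 2, relationship> example per annotation."""
    return [
        RelationExample(sentence, span_of(sentence, e1), span_of(sentence, e2), label)
        for e1, e2, label in annotations
    ]


sentence = "The patient takes aspirin tablets and ibuprofen capsules."
annotations = [
    ("aspirin", "tablets", "FORM-DRUG"),
    ("ibuprofen", "capsules", "FORM-DRUG"),
]
examples = build_examples(sentence, annotations)
for ex in examples:
    print(ex)
```

Every example repeats the full sentence; only the pair of entity spans and the label change.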
There are other approaches to relation extraction, in which one applies a binary classifier to each possible pair of entities (an example is this paper). However, LUKE doesn’t work that way.
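The pairwise approach can be sketched as follows: enumerate every candidate pair of entities and keep the pairs that a binary classifier accepts. The toy_classifier below is only a stand-in rule on entity types; in practice it would be a trained model, and all names here are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Entity = Dict[str, str]
Pair = Tuple[Entity, Entity]


def candidate_pairs(entities: List[Entity]) -> List[Pair]:
    """Return every unordered pair of distinct entities."""
    return list(combinations(entities, 2))


def extract_relations(
    entities: List[Entity], is_related: Callable[[Entity, Entity], bool]
) -> List[Pair]:
    """Apply the binary classifier to each candidate pair and keep the positives."""
    return [(a, b) for a, b in candidate_pairs(entities) if is_related(a, b)]


def toy_classifier(a: Entity, b: Entity) -> bool:
    # Stand-in for a trained model: accept a pair made of one DRUG and one FORM.
    return {a["type"], b["type"]} == {"DRUG", "FORM"}


entities = [
    {"word": "aspirin", "type": "DRUG"},
    {"word": "tablet", "type": "FORM"},
    {"word": "daily", "type": "FREQUENCY"},
]

related = extract_relations(entities, toy_classifier)
for a, b in related:
    print(a["word"], "<->", b["word"])
```

The number of candidate pairs grows quadratically with the number of entities, which is one practical cost of this approach.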