Add a new pipeline for the Relation Extraction task.

🚀 Feature request

Add a new pipeline option for the Relation Extraction task: nlp = pipeline('relation-extraction')

Motivation

Relation extraction between named entities is a well-known NLP task. For example, when you extract entities related to medications (say the entity types are DRUG and FORM, where FORM covers tablet, capsule, etc.), you want to know which FORM entity goes with which DRUG entity.

Reference: https://portal.dbmi.hms.harvard.edu/projects/n2c2-2018-t2/

This task is not limited to the biomedical domain.

Your contribution

I still need to play more with the HF API before I can contribute!

But, as I see it, the pipeline would return a list of dictionaries, each dictionary representing an identified relation in the text.

The relation extraction model would probably sit on top of the NER model.
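The proposal above (a relation model sitting on top of NER and returning a list of dictionaries) can be sketched as follows. This is a minimal illustration under stated assumptions, not an existing transformers pipeline: the entity dictionaries are loosely modelled on the aggregated output of an NER pipeline, and `extract_relations`, `toy_classifier` and the `form_of` label are hypothetical names. The toy classifier simply links a FORM that directly follows a DRUG; a real system would call a trained relation model at that point.

```python
from itertools import product
from typing import Any, Callable, Dict, List

# Hypothetical entity format, loosely modelled on the aggregated output of a
# token-classification (NER) pipeline: entity_group, word, start, end.
Entity = Dict[str, Any]


def extract_relations(
    entities: List[Entity],
    head_type: str,
    tail_type: str,
    classify: Callable[[Entity, Entity], str],
) -> List[Dict[str, Any]]:
    """Pair every head_type entity with every tail_type entity and keep the
    pairs that the classifier does not label "no_relation"."""
    heads = [e for e in entities if e["entity_group"] == head_type]
    tails = [e for e in entities if e["entity_group"] == tail_type]
    relations = []
    for head, tail in product(heads, tails):
        label = classify(head, tail)
        if label != "no_relation":
            # One dictionary per identified relation, as proposed in the issue.
            relations.append({"head": head["word"], "tail": tail["word"], "relation": label})
    return relations


def toy_classifier(head: Entity, tail: Entity) -> str:
    """Hypothetical stand-in for a relation model: link a FORM that directly follows a DRUG."""
    gap = tail["start"] - head["end"]
    return "form_of" if 0 <= gap <= 2 else "no_relation"


# Hand-written NER output for "aspirin tablet, then ibuprofen capsule".
entities = [
    {"entity_group": "DRUG", "word": "aspirin", "start": 0, "end": 7},
    {"entity_group": "FORM", "word": "tablet", "start": 8, "end": 14},
    {"entity_group": "DRUG", "word": "ibuprofen", "start": 21, "end": 30},
    {"entity_group": "FORM", "word": "capsule", "start": 31, "end": 38},
]
result = extract_relations(entities, "DRUG", "FORM", toy_classifier)
print(result)
```

Enumerating every DRUG/FORM pair keeps the relation step independent of whichever NER model produced the entities, which matches the idea of stacking the two.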

There are implementations of such models here.

Issue Analytics

Created 2 years ago
Comments: 13 (6 by maintainers)

Top GitHub Comments

3 reactions

NielsRogge commented, May 26, 2021

For now, there’s only one model that is capable of performing relation extraction out of the box, and that’s LUKE. You can use LukeForEntityPairClassification to classify the relationship between two entities in a sentence:

```python
import torch
from transformers import LukeTokenizer, LukeForEntityPairClassification

tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-large-finetuned-tacred")
model = LukeForEntityPairClassification.from_pretrained("studio-ousia/luke-large-finetuned-tacred")

text = "Beyoncé lives in Los Angeles."
entity_spans = [(0, 7), (17, 28)]  # character-based entity spans corresponding to "Beyoncé" and "Los Angeles"
inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")
with torch.no_grad():  # inference only, so skip gradient tracking
    outputs = model(**inputs)
logits = outputs.logits
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
```

However, relation extraction is tackled in many different ways, so it is not straightforward to define a generic pipeline into which different models can be plugged.
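In the LUKE example the character spans are written by hand, and LukeForEntityPairClassification classifies one pair of entities at a time. As a minimal sketch, the spans can be computed with `str.find`, and every ordered pair can then be passed as `entity_spans` in its own tokenizer call. The helper names `char_span` and `ordered_pairs` are hypothetical and not part of transformers; keeping both directions of each pair is an assumption that reflects directional relation labels (who lives where).

```python
from itertools import permutations
from typing import List, Tuple

Span = Tuple[int, int]


def char_span(text: str, mention: str, start: int = 0) -> Span:
    """Return the (start, end) character offsets of the first occurrence of
    mention at or after position start; raise ValueError if it is absent."""
    begin = text.find(mention, start)
    if begin == -1:
        raise ValueError(f"{mention!r} not found in text")
    return (begin, begin + len(mention))


def ordered_pairs(spans: List[Span]) -> List[List[Span]]:
    """Every ordered (head, tail) pair; (a, b) and (b, a) are both kept
    because relation labels are usually directional."""
    return [[head, tail] for head, tail in permutations(spans, 2)]


text = "Beyoncé lives in Los Angeles."
spans = [char_span(text, "Beyoncé"), char_span(text, "Los Angeles")]
print(spans)  # [(0, 7), (17, 28)]
for pair in ordered_pairs(spans):
    print(pair)  # each pair could be used as entity_spans for one forward pass
```

The `start` argument lets a caller skip past an earlier occurrence when the same mention appears more than once in the sentence.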

2 reactions

NielsRogge commented, Aug 17, 2021

How should I go about incorporating this into my dataset?

I think you need to create several training examples for this single sentence. Each training example should be <sentence, entity 1, entity 2, relationship>. So indeed, option 1 is what I would do.
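Turning one annotated sentence into several <sentence, entity 1, entity 2, relationship> examples can be sketched like this. It is a minimal illustration, not LUKE's own preprocessing: the `Example` and `expand` names are hypothetical, and labelling unannotated pairs with `no_relation` (the negative label used in TACRED-style data) is an assumption you may or may not want for your dataset.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Dict, List, Tuple

Span = Tuple[int, int]


@dataclass(frozen=True)
class Example:
    sentence: str
    entity1: Span  # character span of the first (head) entity
    entity2: Span  # character span of the second (tail) entity
    relationship: str


def expand(
    sentence: str,
    entities: Dict[str, Span],
    relations: Dict[Tuple[str, str], str],
    negative_label: str = "no_relation",
) -> List[Example]:
    """Create one example per ordered entity pair; pairs with no annotated
    relation get negative_label."""
    examples = []
    for head, tail in permutations(entities, 2):
        label = relations.get((head, tail), negative_label)
        examples.append(Example(sentence, entities[head], entities[tail], label))
    return examples


sentence = "Beyoncé lives in Los Angeles."
entities = {"e1": (0, 7), "e2": (17, 28)}
relations = {("e1", "e2"): "per:cities_of_residence"}
for example in expand(sentence, entities, relations):
    print(example)
```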

There are other approaches to relation extraction, in which one applies a binary classifier to each possible pair of entities (an example is this paper). However, LUKE doesn’t work that way.
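The pair-wise alternative can be sketched as follows: enumerate every pair of entities and ask a binary classifier whether the two are related. This is not the method of the paper mentioned above; `pairwise_relations` and `proximity_score` are hypothetical names, the proximity scorer is a toy stand-in for a trained model, and treating pairs as unordered is an assumption.

```python
from itertools import combinations
from typing import Callable, List, Tuple

Span = Tuple[int, int]


def pairwise_relations(
    spans: List[Span],
    score: Callable[[Span, Span], float],
    threshold: float = 0.5,
) -> List[Tuple[Span, Span, float]]:
    """Apply a binary related/not-related scorer to every unordered pair of
    entity spans and keep the pairs whose score reaches the threshold."""
    kept = []
    for a, b in combinations(spans, 2):
        p = score(a, b)
        if p >= threshold:
            kept.append((a, b, p))
    return kept


def proximity_score(a: Span, b: Span) -> float:
    """Toy stand-in for a trained binary classifier: closer spans score higher."""
    gap = max(b[0] - a[1], a[0] - b[1], 0)
    return 1.0 / (1.0 + gap)


spans = [(0, 7), (8, 14), (21, 30)]
print(pairwise_relations(spans, proximity_score))  # [((0, 7), (8, 14), 0.5)]
```

Unlike the LUKE setup, where each example names its pair and the model predicts a relation label, this approach scores all pairs and keeps the ones above a threshold.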

Top Results From Across the Web

SPACY v3: Custom trainable relation extraction component

spaCy v3.0 features new transformer-based pipelines that get spaCy's accuracy right up to the current state-of-the-art, and a new training ...

From Text to Knowledge: The Information Extraction Pipeline

My implementation of the information extraction pipeline consists of four parts. In the first step, we run the input text through a coreference ...

Relation Extraction - Papers With Code

Relation Extraction is the task of predicting attributes and relations for entities in a sentence. For example, given a sentence “Barack Obama was...

Pipeline Approach in End-to-End Relation Extraction.

This paper proposes a novel context-aware joint entity and word-level relation extraction approach through semantic composition of words, introducing a ...

RAPS: A Novel Few-Shot Relation Extraction Pipeline ... - arXiv

To generalize to new relations more effectively, this paper proposes a novel pipeline for the FSRE task based on queRy-information guided ...