Add a new pipeline for the Relation Extraction task.

🚀 Feature request

Add a new pipeline option for the Relation Extraction task: nlp = pipeline('relation-extraction')

Motivation

Relation Extraction between named entities is a well-known NLP task. For example, when you extract entities related to medications (say the entity types are DRUG and FORM, where FORM covers tablet, capsule, etc.), you want to know which FORM entity goes with which DRUG entity.

Reference: https://portal.dbmi.hms.harvard.edu/projects/n2c2-2018-t2/

This task is not limited to the biomedical domain.

Your contribution

I still need to play more with the HF API before I can contribute!

But, as I see it, the pipeline would return a list of dictionaries, each dictionary representing an identified relation in the text.
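As a rough illustration of that return value, here is a minimal sketch of what one such list of dictionaries could look like. The field names (`head`, `tail`, `relation`, `score`), the `FORM-DRUG` label, the example sentence, and the score are all assumptions made up for this sketch; no `relation-extraction` pipeline with this schema exists in the library.

```python
from typing import List, TypedDict


class Relation(TypedDict):
    """One identified relation (hypothetical schema, not a library type)."""
    head: str        # surface text of the first entity
    head_type: str   # entity type from the NER step, e.g. "FORM"
    tail: str        # surface text of the second entity
    tail_type: str   # entity type from the NER step, e.g. "DRUG"
    relation: str    # predicted relation label
    score: float     # model confidence (illustrative value only)


def example_output() -> List[Relation]:
    # What a call on the hypothetical sentence "Take one aspirin tablet."
    # might return: one dictionary per identified relation.
    return [
        {
            "head": "tablet",
            "head_type": "FORM",
            "tail": "aspirin",
            "tail_type": "DRUG",
            "relation": "FORM-DRUG",
            "score": 0.97,  # made-up number, not a real model output
        }
    ]


relations = example_output()
for rel in relations:
    print(f'{rel["head"]} --{rel["relation"]}--> {rel["tail"]}')
```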

The relation extraction model would probably sit on top of the NER model.

There are implementations of such models here.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 13 (6 by maintainers)

Top GitHub Comments

3 reactions
NielsRogge commented, May 26, 2021

For now, there’s only one model that can perform relation extraction out of the box, and that’s LUKE. You can use LukeForEntityPairClassification to classify the relationship between two entities in a sentence:

from transformers import LukeTokenizer, LukeForEntityPairClassification

# Load the LUKE checkpoint fine-tuned on TACRED, together with its tokenizer
tokenizer = LukeTokenizer.from_pretrained("studio-ousia/luke-large-finetuned-tacred")
model = LukeForEntityPairClassification.from_pretrained("studio-ousia/luke-large-finetuned-tacred")

text = "Beyoncé lives in Los Angeles."
entity_spans = [(0, 7), (17, 28)]  # character-based entity spans corresponding to "Beyoncé" and "Los Angeles"
# The tokenizer takes the two entity spans alongside the text
inputs = tokenizer(text, entity_spans=entity_spans, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits
# Pick the highest-scoring relation class and map it back to its label
predicted_class_idx = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])

However, relation extraction is a task that is solved in many different ways, so it’s not straightforward to define a generic pipeline for it into which you can plug different models.

2 reactions
NielsRogge commented, Aug 17, 2021

In reply to the question "How should I go about incorporating this in my dataset?":

I think you need to create several training examples for this single sentence. Each training example should be <sentence, entity 1, entity 2, relationship>. So indeed, option 1 is what I would do.
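A minimal sketch of that expansion, assuming the annotations for a sentence are available as a mapping from (entity 1, entity 2) mention pairs to a relation label. The helper names `span_of` and `build_examples`, the sentence, and the `FORM-DRUG` label are hypothetical. Each resulting example carries the two character spans in the same `entity_spans` form that the LUKE tokenizer call above expects.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]


def span_of(text: str, mention: str) -> Span:
    """Character span of the first occurrence of `mention` in `text`."""
    start = text.index(mention)
    return (start, start + len(mention))


def build_examples(text: str, relations: Dict[Tuple[str, str], str]) -> List[dict]:
    """Create one <sentence, entity 1, entity 2, relationship> example per annotated pair."""
    examples = []
    for (entity_1, entity_2), label in relations.items():
        examples.append(
            {
                "text": text,
                "entity_spans": [span_of(text, entity_1), span_of(text, entity_2)],
                "label": label,
            }
        )
    return examples


# Hypothetical sentence with two annotated relations (label name is made up).
text = "Take aspirin tablets and ibuprofen capsules."
relations = {
    ("tablets", "aspirin"): "FORM-DRUG",
    ("capsules", "ibuprofen"): "FORM-DRUG",
}

examples = build_examples(text, relations)
for example in examples:
    print(example)
```

The single sentence yields two training examples here, one per annotated entity pair. Note that `span_of` only finds the first occurrence of a mention, so a real dataset with repeated mentions would need to store the spans explicitly.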

There are other approaches to relation extraction, in which one applies a binary classifier to each possible pair of entities (an example is this paper). However, LUKE doesn’t work that way.
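The pairwise approach can be sketched as follows. This is only an outline under stated assumptions: `toy_classifier` is a hypothetical stand-in for a trained binary classifier (it simply checks whether the two mentions are adjacent in the text), and the function names are invented for this example rather than taken from the paper or from any library.

```python
from itertools import permutations
from typing import Callable, List, Tuple

Pair = Tuple[str, str]


def candidate_pairs(entities: List[str]) -> List[Pair]:
    """All ordered pairs of distinct entities in a sentence."""
    return list(permutations(entities, 2))


def extract_relations(
    text: str,
    entities: List[str],
    is_related: Callable[[str, str, str], bool],
) -> List[Pair]:
    """Apply a binary classifier to every candidate pair and keep the positives."""
    return [
        (head, tail)
        for head, tail in candidate_pairs(entities)
        if is_related(text, head, tail)
    ]


def toy_classifier(text: str, head: str, tail: str) -> bool:
    # Stand-in for a real model: "related" if tail directly follows head.
    return f"{head} {tail}" in text


text = "Take aspirin tablets and ibuprofen capsules."
entities = ["aspirin", "tablets", "ibuprofen", "capsules"]

pairs = candidate_pairs(entities)
found = extract_relations(text, entities, toy_classifier)
print(len(pairs), "candidate pairs,", len(found), "kept:", found)
```

The number of candidate pairs grows quadratically with the number of entities, which is one practical difference from classifying a single, pre-selected entity pair per example.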
