
A FAIRSEQ text summarization example: the abstractive approach with a Levenshtein transformer


The Levenshtein transformer paper reports an improvement of 0.75+ ROUGE-L points over the baseline transformer on the abstractive text summarization task on Gigaword.

[image: benchmark table from the paper]

Our team (@fvadzim, @whiteRa2bit, @NickShatalov, and I) would love to reproduce this result as part of the intensive practicum organized by Yandex (here is the description, in Russian). After the event ends on November 16, we plan to keep working on the PR: trying the model out on a Russian news dataset and contributing docs that explain the training procedure to FAIRSEQ.

Proposal

Here is the plan of what we would love to contribute:

  1. Creating a new page on text summarization in examples

    The first sentence of the README mentions summarization among other tasks, but there is no end-to-end description of how to set it up, even though the Levenshtein transformer implementation and pay_less_attention_paper together seem to contain almost all of the code needed to make it work.

  2. Making a new task for training the Levenshtein transformer for abstractive text summarization

    The end goal would be to train the model on both English and Russian datasets; the data preparation we have in mind is sketched right after this list.
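
To make item 2 concrete, here is a minimal data-preparation sketch, assuming summarization is framed the way fairseq's other seq2seq examples frame it: the article is the "source language" and the summary is the "target language". This is our assumption, not an existing recipe in the repo; the gigaword/ paths, the article/summary suffixes, and the worker count are hypothetical.

```python
# Hypothetical layout: gigaword/{train,valid,test}.article and
# gigaword/{train,valid,test}.summary, already tokenized and BPE-encoded.
import subprocess

subprocess.run(
    [
        "fairseq-preprocess",
        # Treat articles and summaries as a parallel "translation" corpus.
        "--source-lang", "article",
        "--target-lang", "summary",
        "--trainpref", "gigaword/train",
        "--validpref", "gigaword/valid",
        "--testpref", "gigaword/test",
        "--destdir", "data-bin/gigaword",
        # Articles and summaries share a language, so share the vocabulary.
        "--joined-dictionary",
        "--workers", "8",
    ],
    check=True,
)
```

The same layout should carry over to the Russian news dataset, only with a different tokenizer and BPE vocabulary.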

Questions

  1. Are there any apparent roadblocks in the current code that you can already see preventing this plan from succeeding?

  2. The paper uses a base Transformer as a teacher to obtain a ROUGE-L of 33.81. The current NAT NMT implementation also takes the teacher approach rather than the oracle one, which should help us set up the training (a sketch of this two-step recipe follows the questions). Another training scheme, suggested by @justheuristic in private communication, is similar to the NMT refinement method introduced by @lena-voita, @rsennrich and @anvdev in this paper: produce an extractive summary first, then refine it with the Levenshtein transformer. Have you tested this idea? It would be nice to include this variation in the comparison as well.

  3. It seems that the current implementation is under active development, given a number of issues reporting SIGSEGV in multi-GPU environments:

    Is there a recommended commit of the repo to use in order to avoid these issues? Is a fix or major update coming soon?
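
To make the teacher-based setup from question 2 concrete, here is a hedged sketch of the two-step recipe as we currently understand it: first distill the training targets with an already-trained autoregressive teacher, then train the Levenshtein transformer on the distilled data. The training flags follow fairseq's examples/nonautoregressive_translation README; the checkpoint paths and the distilled data directory are hypothetical, and the teacher outputs would still need to be extracted and re-binarized between the two steps.

```python
import subprocess

# Step 1: sequence-level knowledge distillation. Decode the *training* set
# with the base transformer teacher; its beam outputs become the new targets.
subprocess.run(
    [
        "fairseq-generate", "data-bin/gigaword",
        "--path", "checkpoints/teacher/checkpoint_best.pt",  # hypothetical
        "--gen-subset", "train",
        "--beam", "5",
        "--results-path", "distilled",
    ],
    check=True,
)

# Step 2: train the Levenshtein transformer on the re-binarized distilled
# data (data-bin/gigaword_distill is hypothetical). Flags follow the
# nonautoregressive_translation example.
subprocess.run(
    [
        "fairseq-train", "data-bin/gigaword_distill",
        "--task", "translation_lev",
        "--arch", "levenshtein_transformer",
        "--criterion", "nat_loss",
        "--noise", "random_delete",
        "--share-all-embeddings",
        "--apply-bert-init",
        "--optimizer", "adam", "--adam-betas", "(0.9,0.98)",
        "--lr", "0.0005", "--lr-scheduler", "inverse_sqrt",
        "--warmup-updates", "10000", "--warmup-init-lr", "1e-07",
        "--label-smoothing", "0.1",
        "--dropout", "0.3",
        "--max-tokens", "8000",
        # The NAT example recommends the no_c10d backend, which may also
        # sidestep some of the multi-GPU issues mentioned in question 3.
        "--ddp-backend", "no_c10d",
        "--save-dir", "checkpoints/levt_gigaword",
    ],
    check=True,
)
```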

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 4
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions
aptlin commented, Nov 11, 2019

v2 of the paper is out. The base transformer performs better than expected, but the Levenshtein transformer still beats it on speed and provides comparable results for summarization:

[screenshot: updated benchmark table, 2019-11-11]
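
For context on the speed claim (our addition, not part of the original comment): LevT decodes by iterative insertion/deletion refinement rather than token by token, so generation cost is governed by the cap on refinement rounds. Here is a hedged decoding sketch using the flags from fairseq's nonautoregressive_translation README and the hypothetical checkpoint path from the sketches above.

```python
import subprocess

subprocess.run(
    [
        "fairseq-generate", "data-bin/gigaword",
        "--task", "translation_lev",
        "--path", "checkpoints/levt_gigaword/checkpoint_best.pt",
        "--gen-subset", "test",
        # At most 9 insertion/deletion refinement rounds regardless of output
        # length; this cap is where the speedup over beam search comes from.
        "--iter-decode-max-iter", "9",
        "--iter-decode-eos-penalty", "0",
        "--print-step",
        "--batch-size", "64",
    ],
    check=True,
)
```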

0 reactions
aptlin commented, Jun 25, 2020

No, sorry, I do not have the bandwidth right now to polish our results, but you can take a look here for the training scripts and here for fairseq with comet.ml support.
