A FAIRSEQ text summarization example: the abstractive approach with a Levenshtein transformer
The Levenshtein transformer paper reports 0.75+ improvements in ROUGE-L on the abstractive text summarization task on Gigaword over the baseline transformer.
Our team (@fvadzim, @whiteRa2bit, @NickShatalov and I) would love to reproduce this result as part of the intensive practicum organized by Yandex (here is the description, in Russian). After the event ends on November 16, we plan to keep working on the PR: trying the model out on a Russian news dataset and contributing docs that explain the training procedure to FAIRSEQ.
Proposal
Here is the plan of what we would love to contribute:
- Creating a new page on text summarization in examples. The first sentence of the README mentions summarization among other tasks, but there is no complete description of how to achieve it, even though both the Levenshtein transformer implementation and pay_less_attention_paper seem to contain almost all of the code needed to make it work.
- Making a new task for training the Levenshtein transformer for abstractive text summarization (a minimal sketch of what such a task could look like follows this list). The end goal would be to train the model on both English and Russian datasets.
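To make the second item concrete, here is a minimal sketch, assuming the existing Levenshtein task is exposed as fairseq.tasks.translation_lev.TranslationLevenshteinTask (as on the current master); the task name summarization_lev and the subclass itself are hypothetical placeholders, not part of fairseq.

```python
# Hypothetical sketch: treat summarization as monolingual "translation"
# (article -> summary), so the binarized data can be produced with the usual
# fairseq-preprocess call on article/summary parallel files and the existing
# NAT/Levenshtein code paths apply unchanged.
from fairseq.tasks import register_task
from fairseq.tasks.translation_lev import TranslationLevenshteinTask


@register_task("summarization_lev")
class SummarizationLevenshteinTask(TranslationLevenshteinTask):
    """Abstractive summarization with the Levenshtein transformer.

    Inherits data loading, noise injection and iterative-refinement
    inference from the translation_lev task; summarization-specific
    pieces (e.g. ROUGE validation) would be added here later.
    """
```

Training would then presumably reuse the flags from the non-autoregressive translation example (--arch levenshtein_transformer, --criterion nat_loss), just with --task summarization_lev and a summarization corpus binarized as source/target pairs.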
Questions
- Could you please tell us whether there are any apparent roadblocks in the code itself, that you can already see, which could prevent this plan from succeeding?
- The paper uses a base Transformer as a teacher to obtain a ROUGE-L of 33.81. The current NAT NMT implementation also takes the teacher rather than the oracle approach, so this should help us set the training up. Another training scheme, suggested by @justheuristic in private communication, is similar to the NMT refinement method introduced by @lena-voita, @rsennrich and @anvdev in this paper: produce an extractive summary first, then refine it with the Levenshtein transformer (a toy sketch of this two-stage flow is given after this list). Have you tested this idea? It sounds worth including this variation in the comparison as well.
- The current implementation seems to be under active development at the moment, given the number of issues about SIGSEGV crashes in multi-GPU environments:
  - https://github.com/pytorch/fairseq/issues/1305
  - https://github.com/pytorch/fairseq/issues/1308
  - https://github.com/pytorch/fairseq/issues/1346
  Are there any precautions on which commit of the repo to use in order to avoid these issues? Is a fix or a major update coming soon?
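To illustrate the extractive-then-refine scheme from the second question, here is a toy sketch of the two-stage control flow only; lead_k_extractive and refine_with_levenshtein are hypothetical helpers, and the refinement step is a placeholder for a call into a trained Levenshtein model rather than an actual implementation.

```python
# Toy sketch of "extractive draft, then Levenshtein refinement".
# The draft is a naive lead-k extractive summary; refine_with_levenshtein()
# is an identity stand-in that marks where a trained Levenshtein transformer
# would edit the draft via its usual insertion/deletion iterations.


def lead_k_extractive(article: str, k: int = 2) -> str:
    """Naive extractive draft: take the first k sentences of the article."""
    sentences = [s.strip() for s in article.split(".") if s.strip()]
    return ". ".join(sentences[:k]) + "."


def refine_with_levenshtein(draft: str, article: str) -> str:
    """Placeholder for the refinement step.

    In the real setup this would encode the article, initialise the decoder
    with the draft tokens instead of an empty sequence, and run the
    insertion/deletion iterations until the output converges.
    """
    return draft  # identity stand-in; a trained model would edit the draft


def summarize(article: str) -> str:
    draft = lead_k_extractive(article)
    return refine_with_levenshtein(draft, article)


if __name__ == "__main__":
    text = ("The Levenshtein transformer edits sequences with insertions and "
            "deletions. It was proposed as a non-autoregressive model. "
            "The authors report gains on translation and summarization.")
    print(summarize(text))
```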
Top GitHub Comments
The v2 of the paper is out. The base transformer performs better than expected, but the Levenshtein transformer still beats the base in speed and provides comparable results for summarization.
No, sorry, I do not have the bandwidth right now to brush up our results, but you can take a look here for the training scripts and here for fairseq with comet.ml support.