Pre-Training Model and Clarification on QA Dataset
Hi - I'm pretty excited about Longformer and the implications it has for long-form NLP!
In the paper, it's outlined that pre-training was conducted in 5 total phases, with a starting sequence length of 2,048 and an ending sequence length of 23,040. For additional LM pre-training (English), what would be the best way to continue pre-training with additional datasets like C4?
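For context, here is roughly what I had in mind - just a sketch assuming the HuggingFace transformers/datasets APIs rather than your original training setup, with placeholder hyperparameters, so please correct me if this is the wrong shape:

```python
# Rough sketch of continued masked-LM pretraining on extra data such as C4.
# Assumptions: HuggingFace transformers/datasets APIs, the released
# allenai/longformer-base-4096 checkpoint, and placeholder hyperparameters.
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    LongformerForMaskedLM,
    LongformerTokenizerFast,
    Trainer,
    TrainingArguments,
)

checkpoint = "allenai/longformer-base-4096"
tokenizer = LongformerTokenizerFast.from_pretrained(checkpoint)
model = LongformerForMaskedLM.from_pretrained(checkpoint)

# Stream C4 so the full corpus never has to be downloaded up front.
raw = load_dataset("allenai/c4", "en", split="train", streaming=True)

def tokenize(batch):
    # Truncate to the model's 4,096-token window; longer phases would first
    # require extending the position embeddings.
    return tokenizer(batch["text"], truncation=True, max_length=4096)

train_data = raw.map(tokenize, batched=True, remove_columns=["text", "timestamp", "url"])

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="longformer-c4-continued",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,
    max_steps=3000,           # placeholder; a streaming dataset needs max_steps set
    learning_rate=3e-5,
    warmup_steps=500,
    logging_steps=100,
)

Trainer(
    model=model,
    args=args,
    train_dataset=train_data,
    data_collator=collator,
).train()
```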
Does the pre-training method care about blank-line-separated block text (such as the Shakespeare txt) versus one complete document per line?
In the case of multi-lingual and translation tasks, would it be similar to T5, where you can handle translation tasks through fine-tuning alone, or would it be more effective to have the languages visible during pre-training for better downstream predictions? (If so, would that essentially require retraining from Phase 1?)
In the cheatsheet for fine-tuning QA, there are two additional parameters:
--wikipedia_dir path/to/evidence/wikipedia/
--web_dir path/to/evidence/web/
Would wikipedia_dir be enwiki8 and web_dir be text8?
Last question - since Longformer uses a custom CUDA kernel that is compiled at runtime, does that mean TPU accelerators cannot be used with this implementation?
Thanks!
Top GitHub Comments
@safooray, just added a notebook that replicates our procedure for pretraining Longformer: https://github.com/allenai/longformer/blob/master/scripts/convert_model_to_long.ipynb. It can be applied to other pretrained models to convert them into long versions.
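For a quick sense of what that conversion involves, the central step is growing the position-embedding table by copying the pretrained embeddings. A condensed, unofficial sketch of that step (the notebook is the authoritative version; variable names here are illustrative, and the attention-layer swap is omitted):

```python
# Condensed sketch of the "copy position embeddings" step used when converting a
# short-context model (e.g. roberta-base) into a long version. The linked notebook
# is the authoritative procedure; this omits details such as replacing the
# self-attention layers with Longformer's sliding-window attention.
import torch.nn as nn
from transformers import RobertaForMaskedLM, RobertaTokenizerFast

max_pos = 4096  # target maximum sequence length
model = RobertaForMaskedLM.from_pretrained("roberta-base")
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base", model_max_length=max_pos)

# RoBERTa reserves the first 2 rows of its position table for special positions,
# so roberta-base has 512 + 2 learned position embeddings.
old_weight = model.roberta.embeddings.position_embeddings.weight.data
embed_dim = old_weight.size(1)

new_embeddings = nn.Embedding(max_pos + 2, embed_dim)
new_weight = new_embeddings.weight.data
new_weight[:2] = old_weight[:2]

# Tile the pretrained 512-position block until the new, longer table is full.
k, step = 2, old_weight.size(0) - 2
while k < new_weight.size(0):
    end = min(k + step, new_weight.size(0))
    new_weight[k:end] = old_weight[2:2 + (end - k)]
    k = end

model.roberta.embeddings.position_embeddings = new_embeddings
model.config.max_position_embeddings = max_pos + 2
# Depending on the transformers version, a registered `position_ids` buffer may
# also need extending to match the new maximum length.
```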
@trisongz, now that we have code that doesn’t need the custom CUDA kernel, you can try to run it on TPUs.
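As one concrete illustration of a kernel-free path (not an official TPU recipe), the HuggingFace port of Longformer implements the sliding-window attention in plain PyTorch, so the model can at least be placed on an XLA device via torch_xla; actual TPU performance is a separate question.

```python
# Illustration only: running the kernel-free HuggingFace Longformer on an XLA
# device via torch_xla. Assumes a TPU runtime with torch_xla installed; performance
# tuning (e.g. padding to fixed shapes) is out of scope here.
import torch
import torch_xla.core.xla_model as xm
from transformers import LongformerModel, LongformerTokenizerFast

device = xm.xla_device()

tokenizer = LongformerTokenizerFast.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096").to(device)
model.eval()

text = "Longformer scales attention linearly with sequence length. " * 200
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096).to(device)

with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```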