
Pre-Training Model and Clarification on QA Dataset

See original GitHub issue

Hi - I'm pretty excited about Longformer and the implications it has for long-form NLP!

In the paper, it's outlined that pre-training was conducted in 5 total phases, with a starting sequence length of 2,048 and an ending sequence length of 23,040. For additional LM pre-training (English), what would be the best way to continue pre-training with additional datasets like C4?
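
(For concreteness, here is roughly what I imagine continued masked-LM pretraining on an extra corpus would look like with the Hugging Face transformers/datasets APIs; the paths, sequence length, and hyperparameters below are placeholders rather than anything from the paper.)

```python
# Sketch only: continue masked-LM pretraining of Longformer on additional text.
# Paths, max_length, and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    LongformerForMaskedLM,
    LongformerTokenizerFast,
    Trainer,
    TrainingArguments,
)

model_name = "allenai/longformer-base-4096"
tokenizer = LongformerTokenizerFast.from_pretrained(model_name)
model = LongformerForMaskedLM.from_pretrained(model_name)

# Any plain-text corpus works here; the paper packs documents into long
# sequences, which this simple truncation-based version skips.
raw = load_dataset("text", data_files={"train": "my_corpus/*.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=4096)

train_ds = raw.map(tokenize, batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="longformer-continued-pretraining",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=64,
    learning_rate=3e-5,
    max_steps=3000,
    save_steps=500,
)

Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    data_collator=collator,
).train()
```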

Does the pre-training method care about the input format, e.g. blank-line-separated blocks of text (such as the Shakespeare txt) vs. one complete document per line?
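
(As an illustration of what I mean by the two formats, a small helper that turns blank-line-separated blocks into one document per line; file names are placeholders.)

```python
# Sketch: normalize blank-line-separated blocks into one document per line.
def blocks_to_one_doc_per_line(in_path: str, out_path: str) -> None:
    with open(in_path, encoding="utf-8") as f:
        text = f.read()
    # Documents/blocks are assumed to be separated by blank lines.
    docs = [" ".join(block.split()) for block in text.split("\n\n") if block.strip()]
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(docs))

blocks_to_one_doc_per_line("shakespeare.txt", "shakespeare_one_doc_per_line.txt")
```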

In the case of multilingual and translation tasks, would it be similar to T5, where you could handle translation tasks by fine-tuning, or would it be more effective to have the languages visible during the pre-training process for better predictions downstream? (If so, would that essentially require retraining from Phase 1?)

In the cheatsheet for fine-tuning QA, there are two additional parameters:

--wikipedia_dir path/to/evidence/wikipedia/
--web_dir path/to/evidence/web/

Would wikipedia_dir be enwiki8 and web_dir be text8?

Last question: since Longformer compiles a custom CUDA kernel at runtime, would that mean TPU accelerators can't be used with this implementation?

Thanks!

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6

Top GitHub Comments

1 reaction
ibeltagy commented, May 28, 2020

@safooray, just added a notebook that replicates our procedure for pretraining Longformer: https://github.com/allenai/longformer/blob/master/scripts/convert_model_to_long.ipynb. It can be applied to other pretrained models to convert them into long versions.
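
For readers who only want the gist: the core step in the notebook is extending the short model's learned position embeddings by copying them until they cover the longer sequence length, before swapping in sliding-window attention and continuing pretraining. A rough sketch of that step, assuming a RoBERTa checkpoint and the Hugging Face API (the notebook remains the authoritative version):

```python
# Rough sketch of the position-embedding extension step; roberta-base and
# max_pos=4096 are just examples.
import torch
from transformers import RobertaForMaskedLM, RobertaTokenizerFast

max_pos = 4096
model = RobertaForMaskedLM.from_pretrained("roberta-base")
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base", model_max_length=max_pos)

embeddings = model.roberta.embeddings
old_weight = embeddings.position_embeddings.weight  # (514, 768) for roberta-base
old_pos, dim = old_weight.shape

with torch.no_grad():
    # RoBERTa reserves the first two position ids; copy the learned 512
    # positions repeatedly until the larger table is filled.
    new_weight = old_weight.new_empty(max_pos + 2, dim)
    new_weight[:2] = old_weight[:2]
    k, step = 2, old_pos - 2
    while k < max_pos + 2:
        span = min(step, max_pos + 2 - k)
        new_weight[k:k + span] = old_weight[2:2 + span]
        k += span

    new_embed = torch.nn.Embedding(max_pos + 2, dim, padding_idx=1)
    new_embed.weight.copy_(new_weight)
    embeddings.position_embeddings = new_embed
    if hasattr(embeddings, "position_ids"):
        embeddings.position_ids = torch.arange(max_pos + 2).unsqueeze(0)

model.config.max_position_embeddings = max_pos + 2
```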

1 reaction
ibeltagy commented, May 19, 2020

@trisongz, now that we have code that doesn’t need the custom CUDA kernel, you can try to run it on TPUs.
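
For anyone finding this later: the Hugging Face port of Longformer implements the sliding-window attention in pure PyTorch, so no custom CUDA kernel is compiled at runtime. A minimal sketch of that path (whether it runs well on TPUs depends on your XLA setup, which isn't shown here):

```python
# Minimal sketch of running Longformer through the kernel-free Hugging Face port.
import torch
from transformers import LongformerModel, LongformerTokenizerFast

name = "allenai/longformer-base-4096"
tokenizer = LongformerTokenizerFast.from_pretrained(name)
model = LongformerModel.from_pretrained(name)

inputs = tokenizer("A long document goes here ...", return_tensors="pt")

# Global attention on the first token; everything else uses local
# sliding-window attention.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

with torch.no_grad():
    outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.last_hidden_state.shape)
```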
