Pre-Training Model and Clarification on QA Dataset
Hi - I'm pretty excited about Longformer and the implications it has for long-form NLP!
In the paper, it's outlined that pre-training was conducted in 5 total phases, with a starting sequence length of 2,048 and an ending sequence length of 23,040. For additional LM pre-training (English), what would be the best way to continue pre-training with additional datasets like C4?
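For context, here is roughly what I had in mind - just a sketch assuming the HuggingFace transformers/datasets APIs rather than your original training setup, with placeholder hyperparameters, so please correct me if this is the wrong shape:

```python
# Rough sketch of continued masked-LM pretraining on extra data such as C4.
# Assumptions: HuggingFace transformers/datasets APIs, the released
# allenai/longformer-base-4096 checkpoint, and placeholder hyperparameters.
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    LongformerForMaskedLM,
    LongformerTokenizerFast,
    Trainer,
    TrainingArguments,
)

checkpoint = "allenai/longformer-base-4096"
tokenizer = LongformerTokenizerFast.from_pretrained(checkpoint)
model = LongformerForMaskedLM.from_pretrained(checkpoint)

# Stream C4 so the full corpus never has to be downloaded up front.
raw = load_dataset("allenai/c4", "en", split="train", streaming=True)

def tokenize(batch):
    # Truncate to the model's 4,096-token window; longer phases would first
    # require extending the position embeddings.
    return tokenizer(batch["text"], truncation=True, max_length=4096)

train_data = raw.map(tokenize, batched=True, remove_columns=["text", "timestamp", "url"])

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="longformer-c4-continued",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,
    max_steps=3000,           # placeholder; a streaming dataset needs max_steps set
    learning_rate=3e-5,
    warmup_steps=500,
    logging_steps=100,
)

Trainer(
    model=model,
    args=args,
    train_dataset=train_data,
    data_collator=collator,
).train()
```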
Does the pre-training method care about blank-line-separated block text (such as the Shakespeare txt) versus one complete document per line?
In the case of multi-lingual and translation tasks, would it be similar to T5, where you can handle translation tasks through fine-tuning alone, or would it be more effective to have the languages visible during pre-training for better downstream predictions? (If so, would that essentially require retraining from Phase 1?)
In the cheatsheet for fine-tuning QA, there are two additional parameters:
--wikipedia_dir path/to/evidence/wikipedia/
--web_dir path/to/evidence/web/
Would wikipedia_dir be enwiki8 and web_dir be text8?
Last question - since Longformer uses a custom CUDA kernel that is compiled at runtime, does that mean TPU accelerators cannot be used with this implementation?
Thanks!
Top GitHub Comments
@safooray, just added a notebook that replicates our procedure for pretraining Longformer: https://github.com/allenai/longformer/blob/master/scripts/convert_model_to_long.ipynb. It can be applied to other pretrained models to convert them into long versions.
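For a quick sense of what that conversion involves, the central step is growing the position-embedding table by copying the pretrained embeddings. A condensed, unofficial sketch of that step (the notebook is the authoritative version; variable names here are illustrative, and the attention-layer swap is omitted):

```python
# Condensed sketch of the "copy position embeddings" step used when converting a
# short-context model (e.g. roberta-base) into a long version. The linked notebook
# is the authoritative procedure; this omits details such as replacing the
# self-attention layers with Longformer's sliding-window attention.
import torch.nn as nn
from transformers import RobertaForMaskedLM, RobertaTokenizerFast

max_pos = 4096  # target maximum sequence length
model = RobertaForMaskedLM.from_pretrained("roberta-base")
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base", model_max_length=max_pos)

# RoBERTa reserves the first 2 rows of its position table for special positions,
# so roberta-base has 512 + 2 learned position embeddings.
old_weight = model.roberta.embeddings.position_embeddings.weight.data
embed_dim = old_weight.size(1)

new_embeddings = nn.Embedding(max_pos + 2, embed_dim)
new_weight = new_embeddings.weight.data
new_weight[:2] = old_weight[:2]

# Tile the pretrained 512-position block until the new, longer table is full.
k, step = 2, old_weight.size(0) - 2
while k < new_weight.size(0):
    end = min(k + step, new_weight.size(0))
    new_weight[k:end] = old_weight[2:2 + (end - k)]
    k = end

model.roberta.embeddings.position_embeddings = new_embeddings
model.config.max_position_embeddings = max_pos + 2
# Depending on the transformers version, a registered `position_ids` buffer may
# also need extending to match the new maximum length.
```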
@trisongz, now that we have code that doesn’t need the custom CUDA kernel, you can try to run it on TPUs.
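As one concrete illustration of a kernel-free path (not an official TPU recipe), the HuggingFace port of Longformer implements the sliding-window attention in plain PyTorch, so the model can at least be placed on an XLA device via torch_xla; actual TPU performance is a separate question.

```python
# Illustration only: running the kernel-free HuggingFace Longformer on an XLA
# device via torch_xla. Assumes a TPU runtime with torch_xla installed; performance
# tuning (e.g. padding to fixed shapes) is out of scope here.
import torch
import torch_xla.core.xla_model as xm
from transformers import LongformerModel, LongformerTokenizerFast

device = xm.xla_device()

tokenizer = LongformerTokenizerFast.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096").to(device)
model.eval()

text = "Longformer scales attention linearly with sequence length. " * 200
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096).to(device)

with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```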