
BERT Pretrain NVIDIA Data

See original GitHub issue

Hi, thanks for your great work. I am trying to pretrain a BERT model from scratch using DeepSpeed and the data from the NVIDIA link.

My question is: how can I get the data/128 and data/512 datasets mentioned in bert_large_lamb_nvidia_data.json after running download_wikipedia from NVIDIA lddl?

Should I use preprocess_bert_pretrain from NVIDIA lddl? If so, what are the exact parameters needed to produce valid data/512 and data/128 datasets for ds_train_bert_nvidia_data_bsz64k_seq128.sh?
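
For context, the lddl flow the question refers to looks roughly like the following. This is a sketch based on the NVIDIA lddl README; the mpirun process count, the vocab and output paths, and several flag values are illustrative assumptions, so verify them against the lddl version you have installed.

    # Sketch of the NVIDIA lddl flow referenced in the question
    # (flag names follow the lddl README; verify against your install).

    # Step the asker has already run: download and extract Wikipedia.
    download_wikipedia --outdir data/wikipedia

    # Preprocess into fixed-length pretraining shards, once per sequence
    # length (128 for phase 1, 512 for phase 2). The -np count, vocab
    # path, and sink path are examples, not known-good values.
    mpirun -np 32 --oversubscribe --allow-run-as-root \
      preprocess_bert_pretrain \
        --schedule mpi \
        --vocab-file vocab/bert-large-uncased-vocab.txt \
        --wikipedia data/wikipedia/source \
        --sink data/128 \
        --target-seq-length 128 \
        --masking \
        --seed 42

Note that lddl writes its own shard format, so even with the right flags it is not a given that the DeepSpeed nvidia-data scripts can read the output directly; the accepted answer below sidesteps this by using the older DeepLearningExamples data pipeline instead.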

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 9 (2 by maintainers)

Top GitHub Comments

2 reactions

haolin-nju commented, Apr 21, 2022

I think https://github.com/NVIDIA/DeepLearningExamples/blob/04988752a879de969581160bbd208812faba47b6/PyTorch/LanguageModeling/BERT/data/create_datasets_from_start.sh will do.

Hi @Hannibal046, I successfully ran DeepSpeed using the data produced at this commit. Thanks a lot, and I hope my experience can help you!
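
For anyone retracing this, the route that worked is roughly the following. This is a sketch assuming the DeepLearningExamples layout at commit 0498875; the working directory, output locations, and hdf5 details are assumptions to check against the script itself and the repo README.

    # Sketch of the route that resolved the issue: the pre-lddl data
    # pipeline in NVIDIA/DeepLearningExamples at commit 0498875.
    # Typically run inside the repo's Docker container.
    cd /workspace/bert

    # Downloads the corpora, shards the text, and runs
    # create_pretraining_data.py for both sequence lengths, producing
    # hdf5 shard directories for seq_len 128 and seq_len 512. Some
    # versions of the script accept an argument selecting which corpora
    # to download; check the script header.
    bash data/create_datasets_from_start.sh

    # Then point the data/128 and data/512 entries in
    # bert_large_lamb_nvidia_data.json at the two hdf5 output
    # directories before launching
    # ds_train_bert_nvidia_data_bsz64k_seq128.sh.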

0 reactions

tjruwase commented, Apr 21, 2022

@haolin-nju, thanks so much for solving this issue.

Read more comments on GitHub >

Top Results From Across the Web

  • BERT for PyTorch - BERT is a method of pre-training language representations which obtains state-of-the-art results on a wide array of NLP tasks.
  • BERT For PyTorch - NVIDIA/DeepLearningExamples - This repository contains scripts to interactively launch data download, training, benchmarking, and inference routines in a Docker container for both pre- ...
  • BERT Pre-training - We complete BERT pre-training in 44 minutes using 1024 V100 GPUs (64 NVIDIA DGX-2 nodes). In comparison, the previous SOTA from NVIDIA takes...
  • How to scale the BERT Training with Nvidia GPUs? - In data parallelism, each GPU computes the gradient loss for different ... and PMC full-text articles to further pre-train the BERT model.
  • Multi-node Bert-pretraining: Cost-efficient Approach - As a result, to train these models within a reasonable time, machine learning (ML) programmers often require advanced hardware setups such as the...
