
BERT Pretrain NVIDIA Data

See original GitHub issue

Hi, thanks for your great work. I am trying to pretrain a BERT model from scratch using DeepSpeed and the data from the NVIDIA link.

My question is: how can I get the data/128 and data/512 datasets mentioned in bert_large_lamb_nvidia_data.json after running download_wikipedia from NVIDIA lddl?

Should I use preprocess_bert_pretrain from NVIDIA lddl? If so, what are the exact parameters needed to produce valid data/512 and data/128 datasets for ds_train_bert_nvidia_data_bsz64k_seq128.sh?
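
For context, the lddl flow the question refers to looks roughly like the following. This is a sketch based on the NVIDIA lddl README; the mpirun process count, the vocab and output paths, and several flag values are illustrative assumptions, so verify them against the lddl version you have installed.

    # Sketch of the NVIDIA lddl flow referenced in the question
    # (flag names follow the lddl README; verify against your install).

    # Step the asker has already run: download and extract Wikipedia.
    download_wikipedia --outdir data/wikipedia

    # Preprocess into fixed-length pretraining shards, once per sequence
    # length (128 for phase 1, 512 for phase 2). The -np count, vocab
    # path, and sink path are examples, not known-good values.
    mpirun -np 32 --oversubscribe --allow-run-as-root \
      preprocess_bert_pretrain \
        --schedule mpi \
        --vocab-file vocab/bert-large-uncased-vocab.txt \
        --wikipedia data/wikipedia/source \
        --sink data/128 \
        --target-seq-length 128 \
        --masking \
        --seed 42

Note that lddl writes its own shard format, so even with the right flags it is not a given that the DeepSpeed nvidia-data scripts can read the output directly; the accepted answer below sidesteps this by using the older DeepLearningExamples data pipeline instead.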

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 9 (2 by maintainers)

Top GitHub Comments

2 reactions

haolin-nju commented, Apr 21, 2022

I think https://github.com/NVIDIA/DeepLearningExamples/blob/04988752a879de969581160bbd208812faba47b6/PyTorch/LanguageModeling/BERT/data/create_datasets_from_start.sh will do.

Hi @Hannibal046, I successfully ran DeepSpeed using the data produced at this commit. Thanks a lot, and I hope my experience can help you!
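
For anyone retracing this, the route that worked is roughly the following. This is a sketch assuming the DeepLearningExamples layout at commit 0498875; the working directory, output locations, and hdf5 details are assumptions to check against the script itself and the repo README.

    # Sketch of the route that resolved the issue: the pre-lddl data
    # pipeline in NVIDIA/DeepLearningExamples at commit 0498875.
    # Typically run inside the repo's Docker container.
    cd /workspace/bert

    # Downloads the corpora, shards the text, and runs
    # create_pretraining_data.py for both sequence lengths, producing
    # hdf5 shard directories for seq_len 128 and seq_len 512. Some
    # versions of the script accept an argument selecting which corpora
    # to download; check the script header.
    bash data/create_datasets_from_start.sh

    # Then point the data/128 and data/512 entries in
    # bert_large_lamb_nvidia_data.json at the two hdf5 output
    # directories before launching
    # ds_train_bert_nvidia_data_bsz64k_seq128.sh.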

0 reactions

tjruwase commented, Apr 21, 2022

@haolin-nju, thanks so much for solving this issue.

Read more comments on GitHub >

Top Results From Across the Web

  • BERT for PyTorch - BERT is a method of pre-training language representations which obtains state-of-the-art results on a wide array of NLP tasks.
  • BERT For PyTorch - NVIDIA/DeepLearningExamples - This repository contains scripts to interactively launch data download, training, benchmarking, and inference routines in a Docker container for both pre- ...
  • BERT Pre-training - We complete BERT pre-training in 44 minutes using 1024 V100 GPUs (64 NVIDIA DGX-2 nodes). In comparison, the previous SOTA from NVIDIA takes...
  • How to scale the BERT Training with Nvidia GPUs? - In data parallelism, each GPU computes the gradient loss for different ... and PMC full-text articles to further pre-train the BERT model.
  • Multi-node Bert-pretraining: Cost-efficient Approach - As a result, to train these models within a reasonable time, machine learning (ML) programmers often require advanced hardware setups such as the...
