m2m: generate OOMs on v100


I ran the downloads and the documented “generate on a v100” command:

fairseq-generate \
    data_bin \
    --batch-size 1 \
    --path 12b_last_checkpoint.pt \
    --fixed-dictionary model_dict.128k.txt \
    -s de -t fr \
    --remove-bpe 'sentencepiece' \
    --beam 5 \
    --task translation_multi_simple_epoch \
    --lang-pairs language_pairs.txt \
    --decoder-langtok --encoder-langtok src \
    --gen-subset test \
    --fp16 \
    --dataset-impl mmap \
    --distributed-world-size 1 --distributed-no-spawn \
    --pipeline-model-parallel \
    --pipeline-chunks 1 \
    --pipeline-encoder-balance '[26]' \
    --pipeline-encoder-devices '[0]' \
    --pipeline-decoder-balance '[1,24,1]' \
    --pipeline-decoder-devices '[0,1,0]' > gen_out

on a V100 with torch 1.5 and got an OOM (out of memory) error.

fairscale==0.0.3
fairseq # pip install -e . from source at 9b0611e6
torch==1.5.1+cu101
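
For reference, here is a minimal sketch of how the balance and device lists in the command above pair up. It assumes that entry i of a `--pipeline-*-balance` list is the size of partition i and that entry i of the matching `--pipeline-*-devices` list is the GPU index that partition is placed on. The helper `units_per_device` is hypothetical and not part of fairseq; it only does arithmetic on the two lists and does not measure memory.

```shell
# Hypothetical helper (not part of fairseq): tally partition sizes per GPU.
# Assumption: balance entry i is the size of partition i, and devices
# entry i is the GPU index that partition is placed on.
units_per_device() {
  # Strip brackets and spaces so '[1,24,1]' becomes '1,24,1'.
  balance=$(printf '%s' "$1" | tr -d '[] ')
  devices=$(printf '%s' "$2" | tr -d '[] ')
  printf '%s\n%s\n' "$balance" "$devices" | awk -F, '
    NR == 1 { n = NF; for (i = 1; i <= NF; i++) size[i] = $i }
    NR == 2 {
      if (NF != n) { print "error: list lengths differ"; exit 1 }
      for (i = 1; i <= NF; i++) total[$i] += size[i]
      for (d in total) print "gpu " d ": " total[d]
    }' | sort
}

units_per_device '[26]' '[0]'          # encoder lists from the command
units_per_device '[1,24,1]' '[0,1,0]'  # decoder lists from the command
```

Under that assumption, the encoder's single partition of 26 and two decoder partitions of size 1 are placed on GPU 0, while the decoder partition of size 24 is placed on GPU 1.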

Questions

  1. Has this command worked for others?
  2. Does anyone have a working generate command that takes advantage of multiple GPUs?

cc: @shruti-bh

Thanks in advance!

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 22 (15 by maintainers)

Top GitHub Comments

6 reactions
shruti-bh commented, Oct 22, 2020

I will try to get these models and commands in by end of this week or early next week!

2 reactions
sshleifer commented, Oct 29, 2020

that worked! (on 6debe291) Thanks!


Original issue: m2m: generate OOMs on v100 · Issue #2772 - GitHub