Better documentation/tutorial for setting device (and run_opts in general)
It took me a while to get the "pretraining.ipynb" fine-tuning tutorial working on a Colab GPU. It would be great if the docs and tutorial were updated to show how to set the device for a single GPU. I am happy to update and open a PR if desired; please let me know.
Here's what worked for me:

```python
brain = EncDecFineTune(
    modules,
    hparams=hparams,
    opt_class=lambda x: torch.optim.SGD(x, 1e-5),
    run_opts={"device": "cuda:0"},
)
```

I just added `run_opts={"device": "cuda:0"}`.
Note that you must include the device number. Setting `device="cuda"` as suggested in #574 causes the following error:

```
--> 473 torch.cuda.set_device(int(self.device[-1]))
    474
    475 # Put modules on the right device, accessible with dot notation
ValueError: invalid literal for int() with base 10: 'a'
```
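The failure is easy to reproduce outside SpeechBrain: the code takes the last character of the device string and parses it as an integer GPU index, so `"cuda"` ends with `'a'` and `int('a')` raises. A minimal sketch of that parsing logic (a standalone illustration, not SpeechBrain's exact code; `parse_device_index` is a hypothetical helper):

```python
def parse_device_index(device: str) -> int:
    # Mimics the failing line: torch.cuda.set_device(int(self.device[-1]))
    # Works only when the string ends with a digit, e.g. "cuda:0".
    return int(device[-1])

print(parse_device_index("cuda:0"))  # -> 0

try:
    parse_device_index("cuda")
except ValueError as e:
    print(e)  # invalid literal for int() with base 10: 'a'
```

This is why the device number must be included: `"cuda:0"` parses cleanly, while a bare `"cuda"` does not.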
Issue Analytics
- State:
- Created: 2 years ago
- Reactions: 2
- Comments: 7
Top GitHub Comments
Hey, doing transfer learning with SpeechBrain won't be hard at all (conceptually speaking); however, you are right that we don't have "low-resource" models. I guess you could develop some quite nicely. For instance, you could remove the CNN front-end before the RNN models. You could also remove the CTC+Att decoding and start with CTC only (lowering both the VRAM usage and the decoding time, at some cost in performance). I would also love to see one of the NeMo models implemented in SpeechBrain (QuartzNet or Jasper).
Thank you @TParcollet, I will see if I am able to implement some transfer learning in SpeechBrain and report back. I would like to use a much simpler model than the current default CRDNN + RNNLM + BPE, both so that it requires less compute and so that it is easier to alter for less experienced ASR practitioners (like myself).
I think among the current libraries (SpeechBrain, NeMo, Kaldi, ESPnet) there is still a missing slice for people who want to make models that sacrifice some accuracy to achieve reduced complexity, ease of training, and faster inference. NeMo fits that quite well and makes transfer learning very easy, but it is a bit of a headache to deviate from their standard recipes and alter models. My experience with SpeechBrain has been that the low/mid-level API is great and it is easier to swap pieces in and out, but it's harder to do basic ASR transfer learning (take a simple pretrained encoder/decoder model that is part of the library, replace the decoder with one for your new vocab, freeze the stem/body, fit, then unfreeze and fit more). Please let me know if I've overlooked any models that are simpler; so far I've only seen fairly large models like the CRDNN and wav2vec, though I haven't spent much time in the library yet.
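The freeze-then-unfreeze workflow described above can be sketched in plain PyTorch (a generic illustration under assumed shapes; the `encoder`/`decoder` modules here are hypothetical stand-ins, not SpeechBrain classes):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: a "pretrained" encoder body and a fresh decoder
# head sized for a new vocabulary.
encoder = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 256))
decoder = nn.Linear(256, 50)  # new vocab of 50 tokens, randomly initialized

# Phase 1: freeze the pretrained stem/body, train only the new decoder.
for p in encoder.parameters():
    p.requires_grad = False
optimizer = torch.optim.SGD(decoder.parameters(), lr=1e-3)

# ... fit for a few epochs ...

# Phase 2: unfreeze everything and fine-tune end to end at a lower LR.
for p in encoder.parameters():
    p.requires_grad = True
optimizer = torch.optim.SGD(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-5
)
```

The two-phase schedule (high LR on the new head, then a low LR everywhere) is the standard way to avoid destroying pretrained features while the randomly initialized decoder is still producing large gradients.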
Thanks again, closing the issue for now but I’ll open a new tutorial issue/PR if I make progress or get stuck. Cheers.