Training NMT models?
Hello! Thanks, Tim! I tried bitsandbytes for language models like BLOOM, and it works well.
I have a question about NMT models like NLLB, M2M, mBART, or OPUS. I tried inference for NLLB, and apparently it is not supported. Are any of these models supported for inference, and especially for fine-tuning?
Many thanks!
Issue Analytics
- State:
- Created 10 months ago
- Comments: 5
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Hi @ymoslem Thanks a lot for your message! Indeed, it is not possible for now to train any 8-bit model using `transformers`. We are currently seeing whether we can apply LoRA (Low-Rank Adapters) on 8-bit models using `transformers`, but it is still under discussion. We'll keep you posted.

Hi @ymoslem Thanks a lot for your message. Yes, this is expected: the 8-bit model is currently slower than the fp16 model because the 8-bit quantization is done in two stages. You can read more about that in the 8-bit integration blog post.
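As a rough illustration of those two stages (this sketch is mine, not from the thread): values are first scaled down into the int8 range via absmax quantization, the computation runs on the int8 values, and the result is scaled back up. That extra round trip is work a plain fp16 matmul never does. A minimal pure-Python sketch of the quantize/dequantize round trip:

```python
# Illustrative sketch (not bitsandbytes code): absmax quantization,
# the scaling scheme used in 8-bit inference. Values are mapped onto
# the int8 range [-127, 127] before the matmul and mapped back after,
# which is the extra work that makes 8-bit slower than fp16.

def absmax_quantize(values):
    """Scale floats into the int8 range; return the ints and the scale."""
    scale = 127.0 / max(abs(v) for v in values)
    return [round(v * scale) for v in values], scale

def dequantize(ints, scale):
    """Map the int8 values back to (approximate) floats."""
    return [i / scale for i in ints]

vals = [0.5, -1.25, 3.0, -0.1]
q, scale = absmax_quantize(vals)
restored = dequantize(q, scale)
# The round trip is lossy but close: each value lands within half a
# quantization step of the original.
print(max(abs(a - b) for a, b in zip(vals, restored)))
```

The largest-magnitude value always maps to exactly ±127, so the quantization error of every other value is bounded by half a step, `0.5 / scale`.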
Yes, please use `load_in_8bit_threshold` instead. Could you point me to the place where you read that you should use `int8_threshold`? Maybe the documentation has not been updated.

Could you share with me how you measure that? Note that the memory optimization between the fp16 and int8 models really depends on the model size: for `nllb-600M` you get a memory-footprint saving of 1.18x, for `3.3B` you get a saving of 1.41x, etc., and it grows linearly with the size of the model. You can check that with this snippet: