
Training a BERT model on Colab using GPU

See original GitHub issue

Describe the bug
I’m trying to use the pre-trained BERT model for text classification, as mentioned here: https://uber.github.io/ludwig/user_guide/#bert-encoder. I wanted to check if I’m doing something wrong. Thanks in advance for your time!

To Reproduce
Steps to reproduce the behavior:

  1. I use the following command to run an experiment:
     ludwig experiment --experiment_name bert-uncaselarge --data_csv /path/to/mydataset.csv --model_definition_file /my/model_definition_bert.yaml
  2. Here is the YAML file: http://linkedvocabs.org/dataset/model_definition_bert.yaml (a rough sketch of what such a file looks like follows these steps).
  3. You can find the log after running the command here: http://linkedvocabs.org/dataset/log-runningBert.txt
  4. See the output error:

2020-02-28 14:57:28.712274: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-28 14:57:28.756657: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2020-02-28 14:57:28.756922: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2ad4840 executing computations on platform Host. Devices:
2020-02-28 14:57:28.756960: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2020-02-28 14:57:32.899945: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.

Epoch 1 Training: 0% 0/40 [00:00<?, ?it/s]^C
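For context, a Ludwig 0.2-style model definition using the BERT encoder might look roughly like the sketch below. The actual file is behind the link in step 2; the feature names, file paths, and exact encoder parameter names here are placeholder assumptions, not the reporter's real configuration:

input_features:
  - name: text                                       # hypothetical input column name
    type: text
    encoder: bert
    config_path: /path/to/bert/bert_config.json      # BERT config JSON (placeholder path)
    checkpoint_path: /path/to/bert/bert_model.ckpt   # pre-trained checkpoint (placeholder path)

output_features:
  - name: class                                      # hypothetical output column name
    type: category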

Expected behavior
I expect the experiment to finish and produce results, as it does with the other encoders I’ve tested so far.

Environment (please complete the following information):

  • OS: Google Colab
  • Python version: v3.6
  • Ludwig version: v0.2.1

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 17 (2 by maintainers)

Top GitHub Comments

2 reactions
kanishk16 commented, Jul 1, 2020

After switching to GPU in Colab (in the menu: Runtime > Change runtime type > change the hardware accelerator from None to GPU) and installing Ludwig, simply follow the steps suggested previously:

# Replace the CPU-only TensorFlow that Ludwig pulls in with the GPU build
!pip uninstall -y tensorflow
!pip install tensorflow-gpu==1.15.3

# Verify that TensorFlow can see the Colab GPU
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
  • There is no need to use %tensorflow_version 1.x in the code cell when importing tf: installing Ludwig uninstalls the pre-installed tf v2.2.x and installs v1.15.3 (CPU), which becomes the default tf version; the manual installation above then replaces it with tf v1.15.3 (GPU) 😃
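As a quick sanity check after those two installs, a snippet like the following (a minimal sketch; the expected outputs are assumptions based on the versions pinned above) confirms which build ended up active:

import tensorflow as tf

# After the steps above, the GPU build of TF 1.15.3 should be the default
print(tf.__version__)              # expected: 1.15.3
print(tf.test.is_gpu_available())  # expected: True on a GPU runtime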
1 reaction
w4nderlust commented, Jun 25, 2020

I looked into it, and it looks like Colab removed the GPU option for TensorFlow 1:

# Select the pre-installed TF 1.x runtime and check for a GPU
%tensorflow_version 1.x
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Even if you try to uninstall and reinstall manually, it still doesn’t work:

# Swap in the GPU build of TF 1.15 manually
!pip uninstall -y tensorflow
!pip install tensorflow-gpu==1.15.2

# Check for a GPU
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

The problem is with Colab, not with Ludwig.

Anyway, the next version of Ludwig will work with TF2, so this problem will go away entirely.
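For reference, once Ludwig moves to TF2 the equivalent GPU check becomes simpler (a minimal sketch, assuming TF 2.1 or later is installed):

import tensorflow as tf

# In TF 2.x, list the GPUs visible to TensorFlow instead of probing a device name
gpus = tf.config.list_physical_devices('GPU')
if not gpus:
  raise SystemError('GPU device not found')
print('Found GPU(s): {}'.format(gpus))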


Top Results From Across the Web

  • Training a BERT model on Colab using GPU #645 - GitHub
  • How To Train Your BERT Model 5X Faster Than In Colab
  • How to Colab with TPU - Towards Data Science
  • CT-BERT - Huggingface (GPU training) - Google Colab
  • Colab GPU Benchmarks for Fine-Tuning BERT - YouTube
