
Training a BERT model on Colab using GPU

See original GitHub issue

Describe the bug
I’m trying to use the pre-trained BERT model for text classification, as mentioned here: https://uber.github.io/ludwig/user_guide/#bert-encoder. I wanted to check if I’m doing something wrong. Thanks in advance for your time!

To Reproduce
Steps to reproduce the behavior:

  1. I use the following command to run an experiment:
     ludwig experiment --experiment_name bert-uncaselarge --data_csv /path/to/mydataset.csv --model_definition_file /my/model_definition_bert.yaml
  2. Here is the YAML file: http://linkedvocabs.org/dataset/model_definition_bert.yaml (a rough sketch of what such a file looks like follows these steps).
  3. You can find the log after running the command here: http://linkedvocabs.org/dataset/log-runningBert.txt
  4. See the output error:

2020-02-28 14:57:28.712274: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-28 14:57:28.756657: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2020-02-28 14:57:28.756922: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2ad4840 executing computations on platform Host. Devices:
2020-02-28 14:57:28.756960: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2020-02-28 14:57:32.899945: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.

Epoch 1 Training: 0% 0/40 [00:00<?, ?it/s]^C
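For context, a Ludwig 0.2-style model definition using the BERT encoder might look roughly like the sketch below. The actual file is behind the link in step 2; the feature names, file paths, and exact encoder parameter names here are placeholder assumptions, not the reporter's real configuration:

input_features:
  - name: text                                       # hypothetical input column name
    type: text
    encoder: bert
    config_path: /path/to/bert/bert_config.json      # BERT config JSON (placeholder path)
    checkpoint_path: /path/to/bert/bert_model.ckpt   # pre-trained checkpoint (placeholder path)

output_features:
  - name: class                                      # hypothetical output column name
    type: category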

Expected behavior
I expect the experiment to finish and produce results, as it does with the other encoders I’ve tested so far.

Environment (please complete the following information):

  • OS: Google Colab
  • Python version: v3.6
  • Ludwig version: v0.2.1

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 17 (2 by maintainers)

Top GitHub Comments

2 reactions
kanishk16 commented, Jul 1, 2020

After switching to GPU in Colab (in the menu: Runtime > Change runtime type > change the hardware accelerator from None to GPU) and installing Ludwig, simply follow the steps suggested previously:

# Replace the CPU-only TensorFlow that Ludwig pulls in with the GPU build
!pip uninstall -y tensorflow
!pip install tensorflow-gpu==1.15.3

# Verify that TensorFlow can see the Colab GPU
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))
  • There is no need to use %tensorflow_version 1.x in the code cell when importing tf: installing Ludwig uninstalls the pre-installed tf v2.2.x and installs v1.15.3 (CPU), which becomes the default tf version; the manual installation above then replaces it with tf v1.15.3 (GPU) 😃
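As a quick sanity check after those two installs, a snippet like the following (a minimal sketch; the expected outputs are assumptions based on the versions pinned above) confirms which build ended up active:

import tensorflow as tf

# After the steps above, the GPU build of TF 1.15.3 should be the default
print(tf.__version__)              # expected: 1.15.3
print(tf.test.is_gpu_available())  # expected: True on a GPU runtime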
1 reaction
w4nderlust commented, Jun 25, 2020

I looked into it, and it looks like Colab removed the GPU option for TensorFlow 1:

# Select the pre-installed TF 1.x runtime and check for a GPU
%tensorflow_version 1.x
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Even if you try to uninstall and reinstall manually, it still doesn’t work:

# Swap in the GPU build of TF 1.15 manually
!pip uninstall -y tensorflow
!pip install tensorflow-gpu==1.15.2

# Check for a GPU
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

The problem is with Colab, not with Ludwig.

Anyway, the next version of Ludwig will work with TF2, so this problem will go away entirely.
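For reference, once Ludwig moves to TF2 the equivalent GPU check becomes simpler (a minimal sketch, assuming TF 2.1 or later is installed):

import tensorflow as tf

# In TF 2.x, list the GPUs visible to TensorFlow instead of probing a device name
gpus = tf.config.list_physical_devices('GPU')
if not gpus:
  raise SystemError('GPU device not found')
print('Found GPU(s): {}'.format(gpus))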


Top Results From Across the Web

  • Training a BERT model on Colab using GPU #645 - GitHub
  • How To Train Your BERT Model 5X Faster Than In Colab
  • How to Colab with TPU - Towards Data Science
  • CT-BERT - Huggingface (GPU training) - Google Colab
  • Colab GPU Benchmarks for Fine-Tuning BERT - YouTube
