question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Google Colab GPUs and Sweeps

See original GitHub issue

Hi I’m new to Wandb and I was using it to optimize the hyperparameters of my deep learning model (the library is simpletransformers). I’m running the fine-tuning on Google colab.

The following is my code:

import logging
import pandas as pd
import wandb
from simpletransformers.classification import ClassificationArgs, ClassificationModel
from statistics import mean
from sklearn.metrics import accuracy_score

sweep_config = {
    #"namke":"vanilla-sweep-batch-16"
    "method": "bayes",  # grid, random
    "metric": {"name": "accuracy", "goal": "maximize"},
    "parameters": {
        "num_train_epochs": {"min":1,"max":40},
        "learning_rate": {"min": 5e-5, "max": 1e-3},
        #"train_batch_size":{"values":[8,16,32,64,128]}
    },
    #W&B Sweeps can also speed up the hyperparameter optimization by terminating any poorly performing runs
    "early_terminate": {"type": "hyperband", "min_iter": 6,}
}

sweep_id = wandb.sweep(sweep_config, project="Simple Sweep")

logging.basicConfig(level=logging.INFO)
transformers_logger = logging.getLogger("transformers")
transformers_logger.setLevel(logging.WARNING)

# Optional model configuration
model_args = ClassificationArgs(sliding_window = True)
model_args.num_train_epochs = 5
model_args.learning_rate = 1e-4
#model_args.train_batch_size = 64
model_args.overwrite_output_dir = True
model_args.reprocess_input_data = True
model_args.no_save = True
###########################################
model_args.use_early_stopping = False
model_args.early_stopping_delta = 0.01
model_args.early_stopping_metric = "mcc"
model_args.early_stopping_metric_minimize = False
model_args.early_stopping_patience = 5
model_args.evaluate_during_training_steps = 1000
model_args.evaluate_during_training = True
model_args.wandb_project = "Simple Sweep"
###########################################

def train():
    # Initialize a new wandb run
    wandb.init()
    print("HyperParams=>>", wandb.config.epochs)

    # Create a TransformerModel
    model = ClassificationModel(
        "electra",
        "google/electra-base-discriminator",
        use_cuda = True,
        args=model_args,
        sweep_config=wandb.config,
    )

    # Train the model
    model.train_model(train_data, eval_df=eval_data, 
                      accuracy=lambda truth, predictions: accuracy_score(truth, [round(p) for p in predictions]), 
    )

    # Evaluate the model
    model.eval_model(test_data,verbose=True)

    # Sync wandb
    wandb.join()


wandb.agent(sweep_id, train)

I received the following output

Create sweep with ID: qqdm6bfy
Sweep URL: https://app.wandb.ai/galvinator/Simple%20Sweep/sweeps/qqdm6bfy
INFO:wandb.wandb_agent:Running runs: []
INFO:wandb.wandb_agent:Agent received command: run
INFO:wandb.wandb_agent:Agent starting run with config:
	learning_rate: 0.0009937809505724256
	num_train_epochs: 29
wandb: Agent Starting Run: h46vr2qv with config:
	learning_rate: 0.0009937809505724256
	num_train_epochs: 29
INFO:wandb.wandb_agent:Running runs: ['h46vr2qv']

However, it remains as such even after 1 hour… is sweep meant to take this long? Should I wait longer - and how long? I’m expecting to see more permutations of learning_rate and num_train_epochs in the output but please let me know if the fine-tuning is much longer.

Thank you

Issue Analytics

  • State:closed
  • Created 3 years ago
  • Comments:5 (3 by maintainers)

github_iconTop GitHub Comments

1reaction
issue-label-bot[bot]commented, Jul 22, 2020

Issue-Label Bot is automatically applying the label question to this issue, with a confidence of 0.99. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

0reactions
vanpeltcommented, Jul 22, 2020

You may have to restart your kernel. If any code tries to connect to the gpu outside of the train function the notebook will hang. The other alternative is to shell out and call !wandb agent and have the code execute a python script instead of a functon. More details here: https://docs.wandb.com/sweeps/python-api#wandb-agent

Read more comments on GitHub >

github_iconTop Results From Across the Web

Google Colab GPUs and Sweeps · Issue #1168 - GitHub
Hi I'm new to Wandb and I was using it to optimize the hyperparameters of my deep learning model (the library is simpletransformers)....
Read more >
Fast.ai Lesson 1 on Google Colab (Free GPU)
Google colab is a google internal research tool for data science for some ... You can use GPU as a backend for free...
Read more >
Google Colab - Using Free GPU - Tutorialspoint
Google provides the use of free GPU for your Colab notebooks. Enabling GPU. To enable GPU in your notebook, select the following menu...
Read more >
How does Google Colab fare when compared to a GPU ...
Maximum performance a laptop with GPU has a maximum threshold, whereas maximum performance of Google colab is thresholded by Google server farm.
Read more >
[D] For those of you who don't own a GPU, how do you run ...
Been using Google colab pro for a while now, seems like a reasonable investment.
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found