
Sweeps not initializing properly with PyTorch Lightning

See original GitHub issue

wandb --version && python --version && uname
wandb, version 0.8.36
Python 3.7.6
Linux

Description

I’m trying to initialize a sweep using the WandbLogger for PyTorch Lightning. I’m following the Keras example in ‘Intro to Hyperparameter Sweeps with W&B.ipynb’. I’m running it in Jupyter on my own machine.

Basic problem: nothing gets logged to wandb when I run the sweep. Notable feature: when I start the sweep, it initializes a new hyperparameter config and starts a new run, but then it initializes a second run. Nothing gets logged to either of them.

Individual runs are fine.

What I Did

import wandb

sweep_config = {
    'method': 'random',  # grid, random
    'metric': {
        'name': 'val_loss',
        'goal': 'minimize'
    },
    'parameters': {
        'lr': {
            'min': 1e-4,
            'max': 1e-1
        },
    }
}
sweep_id = wandb.sweep(sweep_config, entity="user", project="project-name")

Then I specify the training function:

from argparse import Namespace

import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger

wandb_logger = WandbLogger()

def train():
    config_defaults = {
        'epochs': 5,
        'bs': 64,
        'lr': 1e-3,
        'seed': 42
    }
    wandb.init(config=config_defaults)
    config = config_defaults
    hparams = Namespace(
        lr=config['lr'],
        bs=config['bs']
    )
    wandb_logger.log_hyperparams(hparams)
    model = AutoEncoder(hparams)
    trainer = pl.Trainer(
        logger=wandb_logger,
        max_epochs=config['epochs'])
    trainer.fit(model)

I then start the sweep agent:

wandb.agent(sweep_id, train)

and get the following output at the start:

INFO:wandb.wandb_agent:Running runs: []
INFO:wandb.wandb_agent:Agent received command: run
INFO:wandb.wandb_agent:Agent starting run with config:
	lr: 0.020506108917114917

wandb: Agent Starting Run: 59fu3sst with config:
	lr: 0.020506108917114917
wandb: Agent Started Run: 59fu3sst
Logging results to Weights & Biases (Documentation).
Project page: https://app.wandb.ai/user/proj-name
Sweep page: https://app.wandb.ai/user/proj-name/sweeps/20thclh6
Run page: https://app.wandb.ai/user/proj-name/runs/59fu3sst

INFO:wandb.run_manager:system metrics and metadata threads started
INFO:wandb.run_manager:checking resume status, waiting at most 10 seconds
INFO:wandb.run_manager:resuming run from id: UnVuOnYxOjU5ZnUzc3N0OmVmZi1kaW0tcmVkLXByb2plY3Q6bGJyYW5uaWdhbg==
INFO:wandb.run_manager:upserting run before process can begin, waiting at most 10 seconds
INFO:wandb.run_manager:saving pip packages
INFO:wandb.run_manager:initializing streaming files api
INFO:wandb.run_manager:unblocking file change observer, beginning sync with W&B servers

Logging results to Weights & Biases (Documentation).
Project page: https://app.wandb.ai/user/proj-name
Run page: https://app.wandb.ai/user/proj-name/runs/xv9xywx7

So it starts the run and shows the sweep page, but then seems to initialize a new run. There’s no additional wandb code in the model; it’s a standard PyTorch Lightning setup.

Any suggestions?

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

1 reaction
borisdayma commented, Oct 26, 2020

Hi,

These issues should now be solved.

Here are some examples for running sweeps with pytorch-lightning:

Let me know if you still run into any issues.
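The pattern those examples converge on can be sketched as follows. This is a minimal, hedged sketch (assuming a recent wandb client and pytorch-lightning): the key change from the snippet in the question is that WandbLogger is created inside train(), after wandb.init(), so it attaches to the run the agent started instead of spawning a second one, and hyperparameters are read from wandb.config so sweep values override the defaults. AutoEncoder is a placeholder for the model in the question.

```python
sweep_config = {
    'method': 'random',
    'metric': {'name': 'val_loss', 'goal': 'minimize'},
    'parameters': {'lr': {'min': 1e-4, 'max': 1e-1}},
}

def train():
    # Imports kept inside the function so the config above can be
    # inspected without wandb / pytorch-lightning installed.
    import wandb
    import pytorch_lightning as pl
    from pytorch_lightning.loggers import WandbLogger

    with wandb.init(config={'epochs': 5, 'lr': 1e-3}) as run:
        config = run.config              # sweep values override the defaults
        wandb_logger = WandbLogger()     # attaches to the active run
        model = AutoEncoder(config)      # placeholder model from the question
        trainer = pl.Trainer(logger=wandb_logger,
                             max_epochs=config['epochs'])
        trainer.fit(model)

if __name__ == '__main__':
    import wandb
    sweep_id = wandb.sweep(sweep_config, project='project-name')
    wandb.agent(sweep_id, train)
```

Creating the logger per-run also means each sweep trial gets its own clean run page rather than all trials reusing a module-level logger.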

1 reaction
borisdayma commented, Jun 6, 2020

Actually you can log hyper-parameters with this object through run.config.update(dict).
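For instance (a hedged sketch; the project name is a placeholder, and the W&B calls are guarded so they only execute with a live session):

```python
# Hyperparameters to attach to the run; run.config.update() merges
# this dict into the run's config on the W&B dashboard.
hparams = {'lr': 1e-3, 'bs': 64}

if __name__ == '__main__':
    import wandb
    run = wandb.init(project='project-name')  # placeholder project name
    run.config.update(hparams)                # merges the dict into run.config
    run.finish()
```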

