
Sweeps not initializing properly with PyTorch Lightning

See original GitHub issue

wandb --version && python --version && uname
wandb, version 0.8.36
Python 3.7.6
Linux

Description

I’m trying to initialize a sweep using the WandbLogger for PyTorch Lightning. I’m following the Keras example in ‘Intro to Hyperparameter Sweeps with W&B.ipynb’. I’m running it in Jupyter on my own machine.

Basic problem: nothing gets logged to wandb when I run the sweep. Notable feature: when I start the sweep, it initializes a new hyperparameter config and starts a new run, but then it initializes a second run. Nothing gets logged to either of them.

Individual runs are fine.

What I Did

import wandb

sweep_config = {
    'method': 'random',  # grid, random
    'metric': {
        'name': 'val_loss',
        'goal': 'minimize'
    },
    'parameters': {
        'lr': {
            'min': 1e-4,
            'max': 1e-1
        },
    }
}
sweep_id = wandb.sweep(sweep_config, entity="user", project="project-name")

Then I specify the training function:

from argparse import Namespace

import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger

wandb_logger = WandbLogger()

def train():
    config_defaults = {
        'epochs': 5,
        'bs': 64,
        'lr': 1e-3,
        'seed': 42
    }
    wandb.init(config=config_defaults)
    config = config_defaults
    hparams = Namespace(
        lr=config['lr'],
        bs=config['bs']
    )
    wandb_logger.log_hyperparams(hparams)
    model = AutoEncoder(hparams)
    trainer = pl.Trainer(
        logger=wandb_logger,
        max_epochs=config['epochs'])
    trainer.fit(model)

I then start the sweep agent:

wandb.agent(sweep_id, train)

and get the following output at the start:

INFO:wandb.wandb_agent:Running runs: []
INFO:wandb.wandb_agent:Agent received command: run
INFO:wandb.wandb_agent:Agent starting run with config:
	lr: 0.020506108917114917

wandb: Agent Starting Run: 59fu3sst with config:
	lr: 0.020506108917114917
wandb: Agent Started Run: 59fu3sst
Logging results to Weights & Biases (Documentation).
Project page: https://app.wandb.ai/user/proj-name
Sweep page: https://app.wandb.ai/user/proj-name/sweeps/20thclh6
Run page: https://app.wandb.ai/user/proj-name/runs/59fu3sst

INFO:wandb.run_manager:system metrics and metadata threads started
INFO:wandb.run_manager:checking resume status, waiting at most 10 seconds
INFO:wandb.run_manager:resuming run from id: UnVuOnYxOjU5ZnUzc3N0OmVmZi1kaW0tcmVkLXByb2plY3Q6bGJyYW5uaWdhbg==
INFO:wandb.run_manager:upserting run before process can begin, waiting at most 10 seconds
INFO:wandb.run_manager:saving pip packages
INFO:wandb.run_manager:initializing streaming files api
INFO:wandb.run_manager:unblocking file change observer, beginning sync with W&B servers

Logging results to Weights & Biases (Documentation).
Project page: https://app.wandb.ai/user/proj-name
Run page: https://app.wandb.ai/user/proj-name/runs/xv9xywx7

So it starts the run and shows the sweep page, but then seems to initialize a new run. There’s no additional wandb code in the model; it’s a standard PyTorch Lightning setup.

Any suggestions?

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

1 reaction
borisdayma commented, Oct 26, 2020

Hi,

These issues should now be solved.

Here are some examples for running sweeps with pytorch-lightning:

Let me know if you still run into any issues.
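The pattern those examples converge on can be sketched as follows. This is a minimal, hedged sketch (assuming a recent wandb client and pytorch-lightning): the key change from the snippet in the question is that WandbLogger is created inside train(), after wandb.init(), so it attaches to the run the agent started instead of spawning a second one, and hyperparameters are read from wandb.config so sweep values override the defaults. AutoEncoder is a placeholder for the model in the question.

```python
sweep_config = {
    'method': 'random',
    'metric': {'name': 'val_loss', 'goal': 'minimize'},
    'parameters': {'lr': {'min': 1e-4, 'max': 1e-1}},
}

def train():
    # Imports kept inside the function so the config above can be
    # inspected without wandb / pytorch-lightning installed.
    import wandb
    import pytorch_lightning as pl
    from pytorch_lightning.loggers import WandbLogger

    with wandb.init(config={'epochs': 5, 'lr': 1e-3}) as run:
        config = run.config              # sweep values override the defaults
        wandb_logger = WandbLogger()     # attaches to the active run
        model = AutoEncoder(config)      # placeholder model from the question
        trainer = pl.Trainer(logger=wandb_logger,
                             max_epochs=config['epochs'])
        trainer.fit(model)

if __name__ == '__main__':
    import wandb
    sweep_id = wandb.sweep(sweep_config, project='project-name')
    wandb.agent(sweep_id, train)
```

Creating the logger per-run also means each sweep trial gets its own clean run page rather than all trials reusing a module-level logger.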

1 reaction
borisdayma commented, Jun 6, 2020

Actually you can log hyper-parameters with this object through run.config.update(dict).
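For instance (a hedged sketch; the project name is a placeholder, and the W&B calls are guarded so they only execute with a live session):

```python
# Hyperparameters to attach to the run; run.config.update() merges
# this dict into the run's config on the W&B dashboard.
hparams = {'lr': 1e-3, 'bs': 64}

if __name__ == '__main__':
    import wandb
    run = wandb.init(project='project-name')  # placeholder project name
    run.config.update(hparams)                # merges the dict into run.config
    run.finish()
```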

