PyTorch Lightning: W&B process failed to launch


Description

I am trying to get Weights & Biases (wandb) working with PyTorch Lightning. I am not sure whether this is an issue with Lightning or with wandb, but I think it comes from wandb. When I try to start the run, I get this error message: wandb.run_manager.LaunchError: W&B process failed to launch, see: wandb\debug.log

Stacktrace:

D:\Programme\Anaconda3\envs\pytorch\python.exe C:/Users/schup/PycharmProjects/Workspace/pytorch_test/GAN/CNN_lightning.py
wandb: Tracking run with wandb version 0.8.29
Traceback (most recent call last):
  File "D:\Programme\Anaconda3\envs\pytorch\lib\site-packages\wandb\internal_cli.py", line 106, in <module>
    main()
  File "D:\Programme\Anaconda3\envs\pytorch\lib\site-packages\wandb\internal_cli.py", line 98, in main
    headless(args)
  File "D:\Programme\Anaconda3\envs\pytorch\lib\site-packages\wandb\internal_cli.py", line 54, in headless
    util.sentry_reraise(e)
  File "D:\Programme\Anaconda3\envs\pytorch\lib\site-packages\wandb\util.py", line 94, in sentry_reraise
    six.reraise(type(exc), exc, sys.exc_info()[2])
  File "D:\Programme\Anaconda3\envs\pytorch\lib\site-packages\six.py", line 703, in reraise
    raise value
  File "D:\Programme\Anaconda3\envs\pytorch\lib\site-packages\wandb\internal_cli.py", line 52, in headless
    user_process_pid, stdout_master_fd, stderr_master_fd)
  File "D:\Programme\Anaconda3\envs\pytorch\lib\site-packages\wandb\run_manager.py", line 1137, in wrap_existing_process
    stderr_read_file = os.fdopen(stderr_read_fd, 'rb')
  File "D:\Programme\Anaconda3\envs\pytorch\lib\os.py", line 1027, in fdopen
    return io.open(fd, *args, **kwargs)
OSError: [WinError 6] The handle is invalid
wandb: ERROR W&B process (PID 15916) did not respond
wandb: ERROR W&B process failed to launch, see: wandb\debug.log
Traceback (most recent call last):
  File "C:/Users/schup/PycharmProjects/Workspace/pytorch_test/GAN/CNN_lightning.py", line 132, in <module>
    trainer.fit(net)
  File "D:\Programme\Anaconda3\envs\pytorch\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 630, in fit
    self.run_pretrain_routine(model)
  File "D:\Programme\Anaconda3\envs\pytorch\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 748, in run_pretrain_routine
    self.logger.log_hyperparams(ref_model.hparams)
  File "D:\Programme\Anaconda3\envs\pytorch\lib\site-packages\pytorch_lightning\loggers\base.py", line 18, in wrapped_fn
    fn(self, *args, **kwargs)
  File "D:\Programme\Anaconda3\envs\pytorch\lib\site-packages\pytorch_lightning\loggers\wandb.py", line 96, in log_hyperparams
    self.experiment.config.update(params)
  File "D:\Programme\Anaconda3\envs\pytorch\lib\site-packages\pytorch_lightning\loggers\wandb.py", line 87, in experiment
    id=self._id, resume='allow', tags=self._tags, entity=self._entity)
  File "D:\Programme\Anaconda3\envs\pytorch\lib\site-packages\wandb\__init__.py", line 1088, in init
    _init_headless(run)
  File "D:\Programme\Anaconda3\envs\pytorch\lib\site-packages\wandb\__init__.py", line 304, in _init_headless
    "W&B process failed to launch, see: {}".format(path))
wandb.run_manager.LaunchError: W&B process failed to launch, see: wandb\debug.log
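
The first traceback ends with os.fdopen raising OSError inside the wandb helper process. The sketch below does not reproduce the wandb problem. It only uses the standard library to show the general class of failure: os.fdopen raises OSError when it is handed a descriptor that is no longer valid. The helper name try_fdopen_closed is made up for illustration, and the exact error text depends on the platform.

```python
import os
from typing import Optional


def try_fdopen_closed() -> Optional[OSError]:
    """Wrap an already-closed descriptor and return the resulting error, if any."""
    read_fd, write_fd = os.pipe()
    os.close(write_fd)
    os.close(read_fd)  # read_fd no longer refers to an open pipe
    try:
        # Same call shape as the failing line in the traceback above.
        os.fdopen(read_fd, "rb")
    except OSError as exc:
        return exc
    return None


if __name__ == "__main__":
    error = try_fdopen_closed()
    print(repr(error))
```

If the same kind of failure is happening here, the descriptor passed to the helper process was probably not usable by the time wandb tried to open it. That is an assumption drawn from the traceback, not a confirmed diagnosis.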

Debug log:

2020-03-07 18:47:34,666 DEBUG   MainThread:15916 [wandb_config.py:_load_defaults():119] no defaults not found in config-defaults.yaml
2020-03-07 18:47:34,723 DEBUG   MainThread:15916 [util.py:is_cygwin_git():314] Failed checking if running in CYGWIN due to: FileNotFoundError(2, 'The system cannot find the file specified', None, 2, None)
2020-03-07 18:47:34,725 DEBUG   MainThread:15916 [git_repo.py:repo():30] git repository is invalid
2020-03-07 18:47:34,742 DEBUG   MainThread:15916 [meta.py:setup():97] code probe starting
2020-03-07 18:47:34,742 DEBUG   MainThread:15916 [meta.py:setup():101] non time limited probe of code
2020-03-07 18:47:34,743 DEBUG   MainThread:15916 [meta.py:_setup_code_program():58] save program starting
2020-03-07 18:47:34,743 DEBUG   MainThread:15916 [meta.py:_setup_code_program():60] save program starting: C:/Users/schup/PycharmProjects/Workspace/pytorch_test/GAN/CNN_lightning.py
2020-03-07 18:47:34,743 DEBUG   MainThread:15916 [meta.py:_setup_code_program():65] save program saved: C:\Users\schup\PycharmProjects\Workspace\pytorch_test\GAN\wandb\run-20200307_174733-wcp2iba7\code\CNN_lightning.py
2020-03-07 18:47:34,744 DEBUG   MainThread:15916 [meta.py:_setup_code_program():67] save program
2020-03-07 18:47:34,744 DEBUG   MainThread:15916 [meta.py:setup():119] code probe done
2020-03-07 18:47:34,753 DEBUG   MainThread:15916 [run_manager.py:__init__():541] Initialized sync for None/wcp2iba7
2020-03-07 18:47:34,758 DEBUG   MainThread:15916 [connectionpool.py:_new_conn():959] Starting new HTTPS connection (1): api.wandb.ai:443
2020-03-07 18:47:34,951 DEBUG   MainThread:15916 [connectionpool.py:_make_request():437] https://api.wandb.ai:443 "POST /graphql HTTP/1.1" 200 None
2020-03-07 18:47:34,956 DEBUG   raven-sentry.BackgroundWorker:15916 [connectionpool.py:_new_conn():959] Starting new HTTPS connection (1): sentry.io:443
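
The log stops right after a connection to sentry.io is opened, which is likely the crash report being sent by sentry_reraise from the traceback. To see where a longer debug.log stops, a small parser can help. This is a minimal sketch that assumes every entry follows the layout shown above (timestamp, level, thread:pid, [source], message). LogEntry, parse_line and last_entry are hypothetical names, not part of wandb.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Layout assumed from the debug log excerpt above.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<thread>\S+):(?P<pid>\d+)\s+"
    r"\[(?P<source>[^\]]+)\]\s+"
    r"(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    thread: str
    pid: int
    source: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None if the line does not match the layout."""
    match = LOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    fields = match.groupdict()
    return LogEntry(
        timestamp=fields["timestamp"],
        level=fields["level"],
        thread=fields["thread"],
        pid=int(fields["pid"]),
        source=fields["source"],
        message=fields["message"],
    )


def last_entry(lines: Iterable[str]) -> Optional[LogEntry]:
    """Return the last line that parses as a log entry, skipping everything else."""
    result = None
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            result = entry
    return result
```

Calling last_entry(open("wandb/debug.log")) would then return the final parsed entry, assuming the log file is in that location.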

What I Did

if __name__ == "__main__":
    net = CNN()

    wd_logger = loggers.WandbLogger(name="test")
    trainer = pl.Trainer(logger=wd_logger) 
    trainer.fit(net)
    trainer.test()

My Environment

  • PyTorch version: 1.4.0
  • OS: Microsoft Windows 10 Pro
  • Python version: 3.7
  • Is CUDA available: Yes
  • wandb version: 0.8.29
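
Version details like these can be gathered with a short script instead of by hand. The sketch below is one way to do it. It assumes Python 3.8 or newer for importlib.metadata (the Python 3.7 setup in this report would need the importlib_metadata backport), and collect_versions is a hypothetical helper name. The distribution names passed in at the bottom are the usual PyPI names for the packages in this report.

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable


def collect_versions(packages: Iterable[str]) -> Dict[str, str]:
    """Map each distribution name to its installed version, plus Python and OS info."""
    report = {
        "python": sys.version.split()[0],
        "os": platform.platform(),
    }
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # Report the package as missing instead of failing.
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in collect_versions(["torch", "pytorch-lightning", "wandb"]).items():
        print(f"{key}: {value}")
```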

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

1 reaction
ariG23498 commented, Dec 3, 2020

Hey @LuposX, in the past year we’ve substantially reworked the CLI and UI for Weights & Biases. We’re closing issues older than 6 months. Please comment to reopen.

1 reaction
LuposX commented, Mar 10, 2020

@borisdayma this should be a minimal example of an MNIST classifier:

import os

import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision import transforms, datasets
import torch.nn as nn

import pytorch_lightning as pl
from pytorch_lightning import loggers

from PIL import Image


class CNN(pl.LightningModule):
    def __init__(self):
        super().__init__()

        self.test_correct_counter = 0
        self.test_total_counter = 0

        self.l1 = nn.Linear(28 * 28, 28 * 28 * 5)
        self.l2 = nn.Linear(28 * 28 * 5, 28 * 28)
        self.l3 = nn.Linear(28 * 28, 10)

    def forward(self, x):
        x = torch.flatten(x, start_dim=1, end_dim=-1)
        x = torch.relu(self.l1(x))
        x = torch.relu(self.l2(x))
        x = self.l3(x)

        return x

    def cross_entropy_loss(self, predicted_label, label):
        return F.cross_entropy(predicted_label, label)

    def training_step(self, batch, batch_idx):
        x, y = batch

        predicted_label = self.forward(x)
        loss = self.cross_entropy_loss(predicted_label, y)

        logs = {"train_loss": loss}
        return {"loss": loss, "log": logs}

    def validation_step(self, val_batch, batch_idx):
        x, y = val_batch

        predicted = self.forward(x)
        loss = self.cross_entropy_loss(predicted, y)

        self.test_correct_counter += int((torch.argmax(predicted, 1).flatten() == y).sum())
        self.test_total_counter += y.size(0)

        logs = {"val_loss": loss}
        return {"val_loss": loss, "logs": logs}

    def validation_epoch_end(self, outputs):
        # outputs is a list with what validation_step returned for each batch, e.g.
        # outputs = [{'val_loss': batch_0_loss, ...}, {'val_loss': batch_1_loss, ...}, ...]

        avg_acc = 100 * self.test_correct_counter / self.test_total_counter

        self.test_correct_counter = 0
        self.test_total_counter = 0

        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        tensorboard_logs = {'avg_acc': avg_acc, 'val_loss': avg_loss}
        return {'avg_acc': avg_acc, 'avg_val_loss': avg_loss, 'log': tensorboard_logs}

    def prepare_data(self):
        compose = transforms.Compose([
            transforms.ToTensor()
        ])

        self.mnist_train = datasets.MNIST(
            root="data",
            train=True,
            download=True,
            transform=compose
        )

        self.mnist_test = datasets.MNIST(
            root="data",
            train=False,
            download=True,
            transform=compose
        )

        self.mnist_train, self.mnist_val = torch.utils.data.random_split(self.mnist_train, [55000, 5000])

    def train_dataloader(self):
        mnist_train_loader = torch.utils.data.DataLoader(self.mnist_train,
                                                         batch_size=128 * 4,
                                                         num_workers=1,
                                                         shuffle=True)

        return mnist_train_loader

    def val_dataloader(self):
        mnist_val_loader = torch.utils.data.DataLoader(self.mnist_val,
                                                         batch_size=128 * 4,
                                                         num_workers=1,
                                                         shuffle=True)

        return mnist_val_loader

    def test_dataloader(self):
        mnist_test_loader = torch.utils.data.DataLoader(self.mnist_test,
                                                       batch_size=128 * 4,
                                                       num_workers=1,
                                                       shuffle=True)

        return mnist_test_loader

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())


if __name__ == "__main__":
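    # This checkpoint load is disabled: its result was immediately overwritten
    # by the fresh model created on the next line.
    # net = CNN.load_from_checkpoint("tb_logs/NN_08_03_20/version_0/checkpoints/epoch=4.ckpt")
    net = CNN()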

    wd_logger = loggers.WandbLogger(name="test")
    trainer = pl.Trainer(logger=wd_logger)
    trainer.fit(net)
    trainer.test()
