
If I only have one GPU, how do I set config.py?

See original GitHub issue

Thanks for your great project, it’s really helpful.

I only have one GPU. When I try to run train.py for cityscapes.bisenet.R18, the following error occurred:

Traceback (most recent call last):
  File "/home/oliver/PycharmProjects/TorchSeg/model/bisenet/cityscapes.bisenet.R18/train.py", line 131, in <module>
    loss = model(imgs, gts)
  File "/home/oliver/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/oliver/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/oliver/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/oliver/PycharmProjects/TorchSeg/model/bisenet/cityscapes.bisenet.R18/network.py", line 77, in forward
    spatial_out = self.spatial_path(data)
  File "/home/oliver/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/oliver/PycharmProjects/TorchSeg/model/bisenet/cityscapes.bisenet.R18/network.py", line 133, in forward
    x = self.conv_7x7(x)
  File "/home/oliver/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/oliver/PycharmProjects/TorchSeg/furnace/seg_opr/seg_oprs.py", line 32, in forward
    x = self.bn(x)
  File "/home/oliver/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/oliver/PycharmProjects/TorchSeg/furnace/seg_opr/sync_bn/syncbn.py", line 50, in forward
    mean, inv_std = self._slave_pipe.run_slave(_ChildMessage(xsum, xsqsum, N))
AttributeError: 'NoneType' object has no attribute 'run_slave'

After reading this page, I'm still wondering how to run this code on a single GPU.

Can you give me some tips? Thank you so much!!

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 11 (2 by maintainers)

Top GitHub Comments

2 reactions
ycszen commented, Feb 20, 2019

@XdpAreKid @zhixuanli I will update the source code to support a single-GPU version soon.

0 reactions
yaodonggggggg commented, Apr 29, 2019

@liuxy416

import torch

from config import config
from dataloader import get_train_loader
from network import BiSeNet
from datasets import Cityscapes
from utils.init_func import init_weight, group_weight
from engine.lr_policy import PolyLR
from engine.engine import Engine
from seg_opr.loss_opr import SigmoidFocalLoss, ProbOhemCrossEntropy2d
from seg_opr.sync_bn import DataParallelModel, Reduce, BatchNorm2d

BatchNorm2d = torch.nn.BatchNorm2d  # shadow the synchronized BN with plain torch BN
engine.distributed = False          # tell the engine (created in train.py) not to run distributed

Just add the last two lines:

BatchNorm2d = torch.nn.BatchNorm2d
engine.distributed = False

If that still doesn't work, also delete this line (around line 107 of train.py):

        model = DataParallelModel(model, device_ids=engine.devices)
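For reference, here is a minimal, self-contained sketch of the single-GPU pattern those two changes aim for: build the network with plain torch.nn.BatchNorm2d and simply move it to the one GPU, with no DataParallelModel wrapper. DemoNet is only a stand-in for BiSeNet, and its norm_layer argument mirrors (but is not verbatim) the way TorchSeg threads a BN class through its networks:

# Illustrative sketch, not the actual TorchSeg code.
import torch
import torch.nn as nn

class DemoNet(nn.Module):
    """Stand-in for BiSeNet: the BN class to use is passed in as norm_layer."""
    def __init__(self, norm_layer):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.bn = norm_layer(8)

    def forward(self, x):
        return self.bn(self.conv(x))

BatchNorm2d = torch.nn.BatchNorm2d            # plain BN instead of the synchronized version
device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = DemoNet(norm_layer=BatchNorm2d).to(device)   # no DataParallelModel(...) wrapper
out = model(torch.randn(2, 3, 64, 64, device=device))
print(out.shape)                               # torch.Size([2, 8, 64, 64])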

I only have one GPU too. I changed the code as you said, but I get other errors:

29 20:50:05 PyTorch Version 1.0.1.post2, Furnace Version 0.1.1  [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/faustino/python_workshop/TorchSeg-master/model/dfn/voc.dfn.R101_v1c/train.py", line 120, in <module>
    loss = model(imgs, gts, cgts)
  File "/home/faustino/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/faustino/python_workshop/TorchSeg-master/model/dfn/voc.dfn.R101_v1c/network.py", line 145, in forward
    aux_loss0 = self.aux_criterion(boder_out[0], aux_label)
  File "/home/faustino/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/faustino/python_workshop/TorchSeg-master/furnace/seg_opr/loss_opr.py", line 33, in forward
    pos_part = (1 - pred_sigmoid) ** self.gamma * (
  File "/home/faustino/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 363, in __rsub__
    return _C._VariableFunctions.rsub(self, other)
RuntimeError: CUDA error: device-side assert triggered

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/faustino/backup/pycharm-community-2018.3.1/helpers/pydev/pydevd.py", line 1741, in <module>
    main()
  File "/home/faustino/backup/pycharm-community-2018.3.1/helpers/pydev/pydevd.py", line 1735, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/home/faustino/backup/pycharm-community-2018.3.1/helpers/pydev/pydevd.py", line 1135, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/faustino/backup/pycharm-community-2018.3.1/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents + "\n", file, 'exec'), glob, loc)
  File "/home/faustino/python_workshop/TorchSeg-master/model/dfn/voc.dfn.R101_v1c/train.py", line 154, in <module>
    config.log_dir_link)
  File "/home/faustino/python_workshop/TorchSeg-master/furnace/engine/engine.py", line 154, in __exit__
    torch.cuda.empty_cache()
  File "/home/faustino/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 374, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: device-side assert triggered

/opt/conda/conda-bld/pytorch_1549630534704/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [2,0,0], thread: [480,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1549630534704/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [2,0,0], thread: [481,0,0] Assertion `t >= 0 && t < n_classes` failed.

We've got an error while stopping in post-mortem: <class 'KeyboardInterrupt'>

Please give me a hand.
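For context, the repeated `t >= 0 && t < n_classes` assertion from SpatialClassNLLCriterion.cu usually indicates that the ground-truth labels fed to the loss contain class indices outside the valid range, for example an ignore value such as 255 that is not excluded before the loss. A hypothetical sanity check along these lines can confirm that on the CPU before anything reaches the GPU; check_labels, the 21-class count, and the 255 ignore value below are illustrative assumptions, not TorchSeg code:

# Hypothetical label sanity check for the "t >= 0 && t < n_classes" assert.
import torch

def check_labels(label_batch, num_classes, ignore_label=255):
    """Raise if any label lies outside [0, num_classes) after dropping ignore_label."""
    valid = label_batch[label_batch != ignore_label]
    if valid.numel() and (valid.min() < 0 or valid.max() >= num_classes):
        raise ValueError(f"label out of range: min={int(valid.min())}, "
                         f"max={int(valid.max())}, num_classes={num_classes}")

# Example: a fake 2x4x4 label map where one bogus value (27) exceeds 21 VOC classes.
fake_gts = torch.randint(0, 21, (2, 4, 4))
fake_gts[0, 0, 0] = 27
try:
    check_labels(fake_gts, num_classes=21)
except ValueError as e:
    print(e)    # -> label out of range: min=0, max=27, num_classes=21

Running the same check over the real ground-truth tensors from the dataloader would pinpoint whether the device-side assert comes from mislabeled data rather than from the single-GPU changes.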

Read more comments on GitHub >

Top Results From Across the Web

How to use only one GPU for tensorflow session?
I have two GPUs. My program uses TensorRT and Tensorflow. When I run only TensorRT part, it is fine. When I run together...

only one gpu used when run python script.py #197 - GitHub
Hi, I did not launch the script, as this is not a must. But the python script only use one GPU. In [6]:...

Efficient Training on a Single GPU - Hugging Face
We want to print some summary statistics for the GPU utilization and the training run with the Trainer. We setup a two helper...

Running Python script on GPU. - GeeksforGeeks
First, make sure that Nvidia drivers are upto date also you can install cudatoolkit explicitly from here. then install Anaconda add anaconda to ...

Use a GPU | TensorFlow Core
The second method is to configure a virtual GPU device with ... If you have more than one GPU in your system, the...
