
If I only have one GPU, how do I set config.py?

See original GitHub issue

Thanks for your great project, it’s really helpful.

I only have one GPU. When I try to run train.py for cityscapes.bisenet.R18, the following error occurred:

Traceback (most recent call last):
  File "/home/oliver/PycharmProjects/TorchSeg/model/bisenet/cityscapes.bisenet.R18/train.py", line 131, in <module>
    loss = model(imgs, gts)
  File "/home/oliver/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/oliver/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/oliver/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/oliver/PycharmProjects/TorchSeg/model/bisenet/cityscapes.bisenet.R18/network.py", line 77, in forward
    spatial_out = self.spatial_path(data)
  File "/home/oliver/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/oliver/PycharmProjects/TorchSeg/model/bisenet/cityscapes.bisenet.R18/network.py", line 133, in forward
    x = self.conv_7x7(x)
  File "/home/oliver/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/oliver/PycharmProjects/TorchSeg/furnace/seg_opr/seg_oprs.py", line 32, in forward
    x = self.bn(x)
  File "/home/oliver/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/oliver/PycharmProjects/TorchSeg/furnace/seg_opr/sync_bn/syncbn.py", line 50, in forward
    mean, inv_std = self._slave_pipe.run_slave(_ChildMessage(xsum, xsqsum, N))
AttributeError: 'NoneType' object has no attribute 'run_slave'

After reading this page, I'm still wondering how to run this code on a single GPU.

Can you give me some tips? Thank you so much!!

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 11 (2 by maintainers)

Top GitHub Comments

2 reactions
ycszen commented, Feb 20, 2019

@XdpAreKid @zhixuanli I will update the source code to support a single-GPU version soon.

0 reactions
yaodonggggggg commented, Apr 29, 2019

@liuxy416

import torch

from config import config
from dataloader import get_train_loader
from network import BiSeNet
from datasets import Cityscapes
from utils.init_func import init_weight, group_weight
from engine.lr_policy import PolyLR
from engine.engine import Engine
from seg_opr.loss_opr import SigmoidFocalLoss, ProbOhemCrossEntropy2d
from seg_opr.sync_bn import DataParallelModel, Reduce, BatchNorm2d

BatchNorm2d = torch.nn.BatchNorm2d  # shadow the synchronized BN with plain torch BN
engine.distributed = False          # tell the engine (created in train.py) not to run distributed

Just add the last two lines:

BatchNorm2d = torch.nn.BatchNorm2d
engine.distributed = False

If that still doesn't work, also delete this line (around line 107 of train.py):

        model = DataParallelModel(model, device_ids=engine.devices)
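For reference, here is a minimal, self-contained sketch of the single-GPU pattern those two changes aim for: build the network with plain torch.nn.BatchNorm2d and simply move it to the one GPU, with no DataParallelModel wrapper. DemoNet is only a stand-in for BiSeNet, and its norm_layer argument mirrors (but is not verbatim) the way TorchSeg threads a BN class through its networks:

# Illustrative sketch, not the actual TorchSeg code.
import torch
import torch.nn as nn

class DemoNet(nn.Module):
    """Stand-in for BiSeNet: the BN class to use is passed in as norm_layer."""
    def __init__(self, norm_layer):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.bn = norm_layer(8)

    def forward(self, x):
        return self.bn(self.conv(x))

BatchNorm2d = torch.nn.BatchNorm2d            # plain BN instead of the synchronized version
device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = DemoNet(norm_layer=BatchNorm2d).to(device)   # no DataParallelModel(...) wrapper
out = model(torch.randn(2, 3, 64, 64, device=device))
print(out.shape)                               # torch.Size([2, 8, 64, 64])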

I only have one GPU too. I changed the code as you said, but I get other errors:

29 20:50:05 PyTorch Version 1.0.1.post2, Furnace Version 0.1.1  [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/faustino/python_workshop/TorchSeg-master/model/dfn/voc.dfn.R101_v1c/train.py", line 120, in <module>
    loss = model(imgs, gts, cgts)
  File "/home/faustino/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/faustino/python_workshop/TorchSeg-master/model/dfn/voc.dfn.R101_v1c/network.py", line 145, in forward
    aux_loss0 = self.aux_criterion(boder_out[0], aux_label)
  File "/home/faustino/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/faustino/python_workshop/TorchSeg-master/furnace/seg_opr/loss_opr.py", line 33, in forward
    pos_part = (1 - pred_sigmoid) ** self.gamma * (
  File "/home/faustino/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 363, in __rsub__
    return _C._VariableFunctions.rsub(self, other)
RuntimeError: CUDA error: device-side assert triggered

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/faustino/backup/pycharm-community-2018.3.1/helpers/pydev/pydevd.py", line 1741, in <module>
    main()
  File "/home/faustino/backup/pycharm-community-2018.3.1/helpers/pydev/pydevd.py", line 1735, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/home/faustino/backup/pycharm-community-2018.3.1/helpers/pydev/pydevd.py", line 1135, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/faustino/backup/pycharm-community-2018.3.1/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents + "\n", file, 'exec'), glob, loc)
  File "/home/faustino/python_workshop/TorchSeg-master/model/dfn/voc.dfn.R101_v1c/train.py", line 154, in <module>
    config.log_dir_link)
  File "/home/faustino/python_workshop/TorchSeg-master/furnace/engine/engine.py", line 154, in __exit__
    torch.cuda.empty_cache()
  File "/home/faustino/anaconda3/lib/python3.6/site-packages/torch/cuda/__init__.py", line 374, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: device-side assert triggered

/opt/conda/conda-bld/pytorch_1549630534704/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [2,0,0], thread: [480,0,0] Assertion `t >= 0 && t < n_classes` failed.
/opt/conda/conda-bld/pytorch_1549630534704/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [2,0,0], thread: [481,0,0] Assertion `t >= 0 && t < n_classes` failed.

We've got an error while stopping in post-mortem: <class 'KeyboardInterrupt'>

Please give me a hand.
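For context, the repeated `t >= 0 && t < n_classes` assertion from SpatialClassNLLCriterion.cu usually indicates that the ground-truth labels fed to the loss contain class indices outside the valid range, for example an ignore value such as 255 that is not excluded before the loss. A hypothetical sanity check along these lines can confirm that on the CPU before anything reaches the GPU; check_labels, the 21-class count, and the 255 ignore value below are illustrative assumptions, not TorchSeg code:

# Hypothetical label sanity check for the "t >= 0 && t < n_classes" assert.
import torch

def check_labels(label_batch, num_classes, ignore_label=255):
    """Raise if any label lies outside [0, num_classes) after dropping ignore_label."""
    valid = label_batch[label_batch != ignore_label]
    if valid.numel() and (valid.min() < 0 or valid.max() >= num_classes):
        raise ValueError(f"label out of range: min={int(valid.min())}, "
                         f"max={int(valid.max())}, num_classes={num_classes}")

# Example: a fake 2x4x4 label map where one bogus value (27) exceeds 21 VOC classes.
fake_gts = torch.randint(0, 21, (2, 4, 4))
fake_gts[0, 0, 0] = 27
try:
    check_labels(fake_gts, num_classes=21)
except ValueError as e:
    print(e)    # -> label out of range: min=0, max=27, num_classes=21

Running the same check over the real ground-truth tensors from the dataloader would pinpoint whether the device-side assert comes from mislabeled data rather than from the single-GPU changes.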

Read more comments on GitHub >

Top Results From Across the Web

How to use only one GPU for tensorflow session?
I have two GPUs. My program uses TensorRT and Tensorflow. When I run only TensorRT part, it is fine. When I run together...

only one gpu used when run python script.py #197 - GitHub
Hi, I did not launch the script, as this is not a must. But the python script only use one GPU. In [6]:...

Efficient Training on a Single GPU - Hugging Face
We want to print some summary statistics for the GPU utilization and the training run with the Trainer. We setup a two helper...

Running Python script on GPU. - GeeksforGeeks
First, make sure that Nvidia drivers are upto date also you can install cudatoolkit explicitly from here. then install Anaconda add anaconda to ...

Use a GPU | TensorFlow Core
The second method is to configure a virtual GPU device with ... If you have more than one GPU in your system, the...
