
Enable CUDA support for PyTorch backend


Description

While TensorFlow and JAX will detect an available GPU out of the box and use it (provided the system environment is properly configured for CUDA and cuDNN), PyTorch sets the default device to the CPU and must be told explicitly to use CUDA-enabled devices (GPUs). As of v0.5.4, pyhf does not support this at all.

While this has to be done explicitly, it can at least be done easily with the device keyword of torch.as_tensor:

device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
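For example (a quick standalone sketch, not pyhf code):

import torch

# Pick a CUDA device if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Without a device argument the tensor lands on the default device (the CPU),
# so the device has to be passed explicitly to get a CUDA tensor
torch.as_tensor([1.0, 2.0, 3.0]).device                 # cpu
torch.as_tensor([1.0, 2.0, 3.0], device=device).device  # cuda:0 on a GPU machine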

so hopefully this will only require modifying the PyTorch backend's astensor method

https://github.com/scikit-hep/pyhf/blob/95ad1516af0c22d851557cfff76628a87829b22a/src/pyhf/tensor/pytorch_backend.py#L180

in addition to adding some new keyword arguments to the backend __init__ methods and pyhf.set_backend. So maybe something like:

import torch


class pytorch_backend:
    """PyTorch backend for pyhf"""

    __slots__ = [
        "name",
        "precision",
        "dtypemap",
        "default_do_grad",
        "use_cuda",
        "device",
    ]

    def __init__(self, **kwargs):
        self.name = "pytorch"
        self.precision = kwargs.get("precision", "32b")
        self.dtypemap = {
            "float": torch.float64 if self.precision == "64b" else torch.float32,
            "int": torch.int64 if self.precision == "64b" else torch.int32,
            "bool": torch.bool,
        }
        self.default_do_grad = True
        # Default to the GPU when CUDA is available, but let the user opt out
        self.use_cuda = kwargs.get("use_gpu", torch.cuda.is_available())
        self.device = torch.device("cuda" if self.use_cuda else "cpu")

    # ...

    def astensor(self, tensor_in, dtype="float"):
        # ... (the elided code maps the dtype name, e.g. "float", through self.dtypemap)
        return torch.as_tensor(tensor_in, dtype=dtype, device=self.device)
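If something like this lands, usage might look like the following (the use_gpu keyword is part of the proposal sketched above, not an existing pyhf option):

import pyhf

# Hypothetical: opt in to CUDA tensors through the proposed use_gpu keyword
pyhf.set_backend("pytorch", precision="64b", use_gpu=True)

model = pyhf.simplemodels.hepdata_like(
    signal_data=[12.0], bkg_data=[50.0], bkg_uncerts=[3.0]
)
data = [51.0] + model.config.auxdata
# All tensors the backend creates would then live on the GPU
pyhf.infer.hypotest(1.0, data, model)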

However, from some quick tests it looks like a bit more work will be needed.

Example failure of tensors being allocated to both CPU and GPU
$ time pyhf cls --backend pytorch HVTWZ_3500.json
Traceback (most recent call last):
  File "/home/feickert/.pyenv/versions/pyhf-dev/bin/pyhf", line 33, in <module>
    sys.exit(load_entry_point('pyhf', 'console_scripts', 'pyhf')())
  File "/home/feickert/.pyenv/versions/3.8.5/envs/pyhf-dev/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/feickert/.pyenv/versions/3.8.5/envs/pyhf-dev/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/feickert/.pyenv/versions/3.8.5/envs/pyhf-dev/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/feickert/.pyenv/versions/3.8.5/envs/pyhf-dev/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/feickert/.pyenv/versions/3.8.5/envs/pyhf-dev/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/feickert/workarea/pyhf/src/pyhf/cli/infer.py", line 207, in cls
    set_backend("pytorch", precision="64b")
  File "/home/feickert/workarea/pyhf/src/pyhf/events.py", line 78, in register_wrapper
    result = func(*args, **kwargs)
  File "/home/feickert/workarea/pyhf/src/pyhf/__init__.py", line 161, in set_backend
    events.trigger("tensorlib_changed")()
  File "/home/feickert/workarea/pyhf/src/pyhf/events.py", line 21, in __call__
    func()(*args, **kwargs)
  File "/home/feickert/workarea/pyhf/src/pyhf/interpolators/code4.py", line 137, in _precompute
    self.bases_up = tensorlib.einsum(
  File "/home/feickert/workarea/pyhf/src/pyhf/tensor/pytorch_backend.py", line 328, in einsum
    return torch.einsum(subscripts, operands)
  File "/home/feickert/.pyenv/versions/3.8.5/envs/pyhf-dev/lib/python3.8/site-packages/torch/functional.py", line 342, in einsum
    return einsum(equation, *_operands)
  File "/home/feickert/.pyenv/versions/3.8.5/envs/pyhf-dev/lib/python3.8/site-packages/torch/functional.py", line 344, in einsum
    return _VF.einsum(equation, operands)  # type: ignore
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
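The failure is easy to reproduce outside of pyhf, as torch.einsum refuses to mix CUDA and CPU operands (minimal sketch, assuming a CUDA-capable machine):

import torch

a = torch.ones((2, 3), device="cuda")  # lives on the GPU
b = torch.ones((3, 4))                 # lives on the CPU (the default device)

# Raises: RuntimeError: Expected all tensors to be on the same device, ...
torch.einsum("ij,jk->ik", a, b)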

I’ve started some outline work on branch feat/add-gpu-to-torch, and this work will naturally need to address some parts of Issue #896.

As this will necessarily be an API-breaking change, it should go into v0.7.0, and it might be good motivation to get that release out soon after v0.6.0.

There are some additional helpful examples in the PyTorch documentation on CUDA semantics that demonstrate the need to specify devices.
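The short version of those docs: every tensor participating in an operation has to be placed on (or moved to) the same device explicitly, e.g. (minimal sketch):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.ones((2, 3), device=device)  # created on the target device
b = torch.ones((3, 4)).to(device)      # or moved there after creation

# Both operands now share a device, so the contraction succeeds
c = torch.einsum("ij,jk->ik", a, b)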

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

2 reactions
matthewfeickert commented, Apr 22, 2021

> Sounds great, thanks 😃 No, this is not impacting me strongly at all, I just wanted to understand how one backend worked w/ the gpu, I don’t have a strong preference between pytorch and jax.

Cool. Let us know if this changes. Thank you also very much for asking! Questions help us revisit old issues that need action and also help us understand where there’s potential pain points, so we really really appreciate them. 😃

1 reaction
nhartman94 commented, Apr 22, 2021

Sounds great, thanks 😃 No, this is not impacting me strongly at all, I just wanted to understand how one backend worked w/ the gpu, I don’t have a strong preference between pytorch and jax.

