Unable to access GPU - cloud VMs and other machines
I'm having an issue running training/pretraining on an Azure VM. I found that there are two display devices on the VM when I run
sudo lshw -C video
  *-display
       description: VGA compatible controller
       product: Hyper-V virtual VGA
       vendor: Microsoft Corporation
       physical id: 8
       bus info: pci@0000:00:08.0
       version: 00
       width: 32 bits
       clock: 33MHz
       capabilities: vga_controller bus_master rom
       configuration: driver=hyperv_fb latency=0
       resources: irq:11 memory:f8000000-fbffffff memory:c0000-dffff
  *-display
       description: 3D controller
       product: GK210GL [Tesla K80]
       vendor: NVIDIA Corporation
       physical id: 1
       bus info: pci@6d22:00:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress bus_master cap_list
       configuration: driver=nvidia latency=0
       resources: iomemory:100-ff iomemory:140-13f irq:0 memory:41000000-41ffffff memory:1000000000-13ffffffff memory:1400000000-1401ffffff
But spaCy will not detect the GPU at all, even though the virtual machine comes preconfigured with everything installed (using the Data Science Virtual Machine - Ubuntu 18).
I have a feeling it's because it's defaulting to device 0.
Is there a way to specify device 1 instead, like we can do with the spacy train command? I don't see any option in the CLI reference for specifying a GPU in spacy pretrain.
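For what it's worth, a sketch of how device selection can be approached (the device index and the environment-variable trick are assumptions based on standard CUDA behaviour, not anything confirmed in this thread): CUDA only enumerates NVIDIA hardware, so the Hyper-V virtual VGA adapter never receives a CUDA index, and the Tesla K80 is most likely CUDA device 0 despite being the second entry in lshw.

import os

# Sketch only: to pin a specific device, set this before anything
# initializes CUDA; set in the shell, it applies to spacy train and
# spacy pretrain alike.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import spacy

# prefer_gpu/require_gpu take a device index; require_gpu raises instead
# of silently returning False, which makes misconfiguration easier to spot.
spacy.require_gpu(0)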
Issue Analytics
- Created 3 years ago
- Comments: 7 (4 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
No more issues.
Ok so I think there is something else going on, because I cannot get access to the GPU using spaCy at all. I tried multiple VMs on Azure (Data Science Windows 2019, Data Science Ubuntu; I even created my own VM from scratch and installed the CUDA toolkit myself, and still nothing). All tested on NC6-size machines with a Tesla GPU.
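One failure mode worth ruling out here (my assumption, not something confirmed in this thread): spaCy gets its GPU support from CuPy, and if no cupy-cudaXXX wheel matching the installed toolkit is present, spacy.prefer_gpu() simply returns False without raising. A quick check might look like:

import cupy  # fails here if no CuPy build matches the installed CUDA toolkit

print("cupy:", cupy.__version__)
print("CUDA devices:", cupy.cuda.runtime.getDeviceCount())
print(cupy.cuda.runtime.getDeviceProperties(0)["name"])  # e.g. b'Tesla K80'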
On the Ubuntu Data Science VM I am able to run pretraining, and spacy.prefer_gpu(0) returns True. However, when I run it, the wps is lower than on a 1060 card (which gets around 40,000 wps), even though the card is a Tesla K80. It seems to be defaulting to the correct card, and I can see activity on the card when I watch the GPU usage, but performance is abysmal. Furthermore, PyTorch is able to see and access the GPU when I run the following:
The only place I've been successful using a GPU is on a physical workstation.
Has anyone else had any luck training on a cloud GPU?