How to get available devices and set a specific device in PyTorch-DML?
Hi, for accessing available devices in PyTorch we'd normally do:
print(f'available devices: {torch.cuda.device_count()}')
print(f'current device: {torch.cuda.current_device()}')
However, I noticed this fails (AssertionError: Torch not compiled with CUDA enabled).
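For what it's worth, a guarded version of the query at least avoids the assertion; a minimal sketch, assuming a standard build where torch.cuda.is_available() simply reports False:

import torch

# torch.cuda.current_device() asserts on a build without CUDA support,
# but torch.cuda.is_available() is always safe to call.
if torch.cuda.is_available():
    print(f'available devices: {torch.cuda.device_count()}')
    print(f'current device: {torch.cuda.current_device()}')
else:
    print('CUDA backend not available in this build')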
I thought the transition would be minimal and that code like this would work out of the box. It's even more surprising that we can't write the DML equivalent:
print(f'available devices: {torch.dml.device_count()}')
print(f'current device: {torch.dml.current_device()}')
as it fails with the error:
AttributeError: module 'torch.dml' has no attribute 'device_count'
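Side note for anyone landing here later: the newer standalone torch-directml package (as opposed to the torch.dml namespace this issue was filed against) does expose device enumeration. A sketch, assuming that package is installed with pip install torch-directml:

import torch_directml

# These helpers exist in the later torch-directml package, not in the
# torch.dml namespace of the 1.8-era build discussed in this issue.
print(f'available devices: {torch_directml.device_count()}')
print(f'default device: {torch_directml.default_device()}')
dml1 = torch_directml.device(1)  # torch.device handle for the second adapter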
Apart from this, trying to specify a device using the form "dml:index" fails for any index other than 0; that is, this fails for "dml:1":
import torch
import time

def bench(device='cpu'):
    print(f'running on {device}:')
    a = torch.randn(size=(2000, 2000)).to(device=device)
    b = torch.randn(size=(2000, 2000)).to(device=device)
    start = time.time()
    c = a + b
    end = time.time()
    # print(f'available devices: {torch.dml.device_count()}')
    # print(f'current device: {torch.dml.current_device()}')
    print(f'--took {end - start:.2f} seconds')

bench('cpu')
bench('dml')
bench('dml:0')
bench('dml:1')
it outputs:
running on cpu:
--took 0.00 seconds
running on dml:
--took 0.01 seconds
running on dml:0:
--took 0.00 seconds
running on dml:1:
and that's it; the script never gets past "dml:1".
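As an aside, the near-zero timings above are suspicious. I'm assuming DML dispatches work asynchronously, the way CUDA does, in which case timing c = a + b only measures the kernel launch; copying the result back to the CPU forces the computation to finish. A sketch:

import time
import torch

def bench_sync(device='cpu', n=2000):
    a = torch.randn(size=(n, n)).to(device=device)
    b = torch.randn(size=(n, n)).to(device=device)
    start = time.time()
    # The copy back to the CPU blocks until the add has actually completed
    # on the device, so the measurement includes the compute time.
    c = (a + b).to('cpu')
    print(f'{device}: --took {time.time() - start:.2f} seconds')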
Also, this variant, which builds b with randn_like:
import torch
import time

def bench(device='cpu'):
    print(f'running on {device}:')
    a = torch.randn(size=(2000, 2000)).to(device=device)
    b = torch.randn_like(a).to(device=device)
    start = time.time()
    c = a + b
    end = time.time()
    # print(f'available devices: {torch.dml.device_count()}')
    # print(f'current device: {torch.dml.current_device()}')
    print(f'--took {end - start:.2f} seconds')

bench('cpu')
bench('dml')
bench('dml:0')
bench('dml:1')
fails with the following error:
running on cpu:
--took 0.00 seconds
running on dml:
Traceback (most recent call last):
File "g:\tests.py", line 1246, in <module>
bench('dml')
File "g:\tests.py", line 1235, in bench
b = torch.randn_like(a).to(device=device)
RuntimeError: Could not run 'aten::normal_' with arguments from the 'UNKNOWN_TENSOR_TYPE_ID' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom
build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::normal_' is only available for these backends: [CPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradNestedTensor, UNKNOWN_TENSOR_TYPE_ID, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
CPU: registered at D:\a\_work\1\s\build\aten\src\ATen\RegisterCPU.cpp:5926 [kernel]
BackendSelect: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\core\NamedRegistrations.cpp:11 [kernel]
AutogradOther: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradCPU: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradCUDA: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradXLA: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradNestedTensor: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
UNKNOWN_TENSOR_TYPE_ID: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse1: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse2: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
AutogradPrivateUse3: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\VariableType_4.cpp:8893 [autograd kernel]
Tracer: registered at D:\a\_work\1\s\torch\csrc\autograd\generated\TraceType_4.cpp:10612 [kernel]
Autocast: fallthrough registered at D:\a\_work\1\s\aten\src\ATen\autocast_mode.cpp:250 [backend fallback]
Batched: registered at D:\a\_work\1\s\aten\src\ATen\BatchingRegistrations.cpp:1016 [backend fallback]
VmapMode: registered at D:\a\_work\1\s\aten\src\ATen\VmapModeRegistrations.cpp:37 [kernel]
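Going by the dispatch list above, one workaround is to stop randn_like from allocating directly on the DML device: sample on the CPU and move the result over, so aten::normal_ only ever runs on the CPU backend. A sketch; randn_like_via_cpu is a hypothetical helper, not part of the package:

import torch

def randn_like_via_cpu(a, device):
    # Hypothetical helper: torch.randn_like(a) allocates on a's device and
    # runs aten::normal_ there, which this build does not register for DML.
    # Sampling on the CPU and then moving avoids the missing kernel.
    return torch.randn(a.shape, dtype=a.dtype).to(device=device)

# In bench() above: b = randn_like_via_cpu(a, device)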
Top GitHub Comments
I'm hitting the same issue here, yet their examples work. It looks like you need additional imports from torch before .to("dml") works; otherwise it complains. Even then, some things still fail, like creating a new tensor with the device set to "dml".
Once I import the same things as the examples, I can use DML, but none of my models appear to be supported. I usually have to freeze the model first (see the TorchScript sketch below) so it can run, but I still get: RuntimeError: tensor.is_dml() INTERNAL ASSERT FAILED at "D:\a\_work\1\s\aten\src\ATen\native\dml\DMLTensor.cpp":422, please report a bug to PyTorch. unbox expects Dml tensor as inputs
I decided to dive into their headers to figure out more, since they expose almost nothing to Python.
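For reference, "freeze the model first" above means TorchScript freezing; a rough sketch, where TinyModel is just a placeholder for a real module:

import torch

class TinyModel(torch.nn.Module):  # placeholder stand-in for a real model
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(8, 8)

    def forward(self, x):
        return self.fc(x)

# Script the module, switch to eval mode, then freeze (freezing requires eval).
frozen = torch.jit.freeze(torch.jit.script(TinyModel()).eval())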
The current version of pytorch-directml is snapped to PyTorch 1.8, but we understand the pain here given the drift caused by rapid progress and updates made to Torch.
We are working on a solution to address this problem.