
Scalar on GPU or setting config to Torch/GPU fails

See original GitHub issue

A first call to dsharp.config(backend=Backend.Torch, device=Device.GPU) fails with

System.InvalidOperationException: Torch device type CUDA did not initialise on the current machine.
   at TorchSharp.Torch.InitializeDeviceType(DeviceType deviceType)
   at TorchSharp.Tensor.TorchTensor.ToDevice(DeviceType deviceType, Int32 deviceIndex)
   ...

This happens because https://github.com/DiffSharp/DiffSharp/blob/12137670a9aa347a0ac3199b45526c4118a821a7/src/DiffSharp.Core/DiffSharp.fs#L1116 calls Float64Tensor.From(0.) (https://github.com/xamarin/TorchSharp/blob/bffe20f1221d5e0f2985cef573f393e3c7187238/src/TorchSharp/Tensor/TorchTensorTyped.generated.cs#L2190-L2196), which initializes torch to DeviceType.CPU and commits to the “torch_cpu” native library.

A workaround is to make sure the GPU is the first device used:

dsharp.tensor([1f], Dtype.Float32, Device.GPU, Backend.Torch)
dsharp.config(backend=Backend.Torch, device=Device.GPU)

This works because https://github.com/xamarin/TorchSharp/blob/bffe20f1221d5e0f2985cef573f393e3c7187238/src/TorchSharp/Tensor/TorchTensorTyped.tt#L259-L271 does not initialize the device type first.

Overall, it looks like this was introduced in https://github.com/xamarin/TorchSharp/commit/7b3f3ca5c6233eb20010f231cab65178103eeae2, where an initialization mask was swapped out for a single bool.
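The failure mode can be sketched generically. The following is a minimal illustration in Python, assuming the behavior described above; the class and method names are hypothetical and are not TorchSharp's actual API. With a single boolean guard, whichever device type is requested first decides which native library gets loaded, and later requests for a different device type are silently ignored:

```python
class SingleFlagLoader:
    """Sketch of the buggy pattern: one bool guards all device types."""
    def __init__(self):
        self.initialized = False
        self.loaded_library = None

    def initialize(self, device_type):
        if not self.initialized:
            # The first caller decides which native library is loaded.
            self.loaded_library = f"torch_{device_type}"
            self.initialized = True
        # Later callers silently keep the first library,
        # even if they asked for a different device type.
        return self.loaded_library


class PerDeviceLoader:
    """Sketch of the pre-regression pattern: per-device initialization state."""
    def __init__(self):
        self.loaded = {}

    def initialize(self, device_type):
        if device_type not in self.loaded:
            self.loaded[device_type] = f"torch_{device_type}"
        return self.loaded[device_type]


buggy = SingleFlagLoader()
buggy.initialize("cpu")             # a CPU scalar is created first...
print(buggy.initialize("cuda"))     # ...so the CUDA request still gets "torch_cpu"

fixed = PerDeviceLoader()
fixed.initialize("cpu")
print(fixed.initialize("cuda"))     # "torch_cuda"
```

This mirrors the workaround above: with the single-flag pattern, the only way to get the CUDA library is to make sure the very first tensor operation targets the GPU.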

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
dsyme commented, Oct 10, 2021

Note that even after #383 the package restore will not yet unify native binaries into a single place that’s fully auto-loadable from F# interactive and notebooks. Hence there is still the workaround code in torchsharp to facilitate the loading. It works but feels fragile and is not under CI automated test, I wouldn’t be surprised if we end up back here.

0 reactions
gbaydin commented, Oct 10, 2021

Also, the recently merged #383 should significantly improve the reliability of loading native components, as the TorchSharp and libtorch native libraries are packaged together, which helps them be found together at runtime.


