device support
For array creation functions, device support will be needed, unless we intend to only support operations on the default device. Otherwise, what will happen in any function that creates a new array (e.g. creates the output array with `empty()` before filling it with the results of some computation) is that the new array will be on the default device, and an exception will be raised if an input array is on a non-default device.
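A minimal sketch of that failure mode (the function name is made up, and it assumes an array-API-style namespace where `empty()` has no `device=` keyword and arrays carry a `.device` attribute):

```python
def scaled_copy(x):
    # Without device support in creation functions, the output always lands
    # on the default device, regardless of where `x` lives.
    out = empty(x.shape, dtype=x.dtype)  # allocated on the default device
    out[...] = x * 2                     # raises if `x` is on a non-default device
    return out
```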
We discussed this in the Aug 27th call, and the preference was to do something PyTorch-like, perhaps a simplified version to start with (we may not need the context manager part), as the most robust option. Summary of some points that were made:
- TensorFlow has an issue where its `.shape` attribute is also a tensor, and that interacts badly with its context manager approach to specifying devices, because metadata like `.shape` typically should live on the host, not on an accelerator.
- PyTorch uses a mix of a default device, a context manager, and `device=` keywords (see the sketch after this list).
- JAX also has a context manager-like approach; it has a global default that can be set, and then `pmap`s can be decorated to override that. The difference with other libraries that use a context is that JAX is fairly (too) liberal about implicit device copies.
- It'd be best for operations where data is not all on the same device to raise an exception. Implicit device transfers make it very hard to get a good performance story.
- Propagating device assignments through operations is important.
- Control over where operations get executed is important; trying to be fully implicit doesn't scale to situations with multiple GPUs.
- It may not make sense to add syntax for device support for libraries that only support a single device (i.e., CPU).
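For reference, a short sketch of the PyTorch mix of default device, context manager, and `device=` keyword (requires a CUDA-enabled build with at least two GPUs):

```python
import torch

x = torch.empty(2, 3)                     # default device (CPU unless changed)
y = torch.empty(2, 3, device="cuda:0")    # explicit device= keyword

with torch.cuda.device(1):                # context manager sets the current GPU
    z = torch.empty(2, 3, device="cuda")  # "cuda" resolves to cuda:1 here

# Mixing devices in one operation raises rather than copying implicitly:
# x + y  ->  RuntimeError: expected all tensors to be on the same device
```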
Links to the relevant docs for each library:
- PyTorch: https://pytorch.org/docs/stable/notes/cuda.html
- TensorFlow: https://www.tensorflow.org/api_docs/python/tf/device
- CuPy: https://docs.cupy.dev/en/stable/tutorial/basic.html#current-device
- JAX: https://jax.readthedocs.io/en/latest/faq.html#controlling-data-and-computation-placement-on-devices
- MXNet: https://mxnet.apache.org/versions/1.6/api/python/docs/api/mxnet/context/index.html
The next step should be to write up a proposal for something PyTorch-like.
Top GitHub Comments
With SYCL, one writes a kernel once, compiles it with a SYCL compiler to an IR, and can then submit it to different queues targeting different devices (e.g. CPU, GPU, FPGA).
This example constructs a Python extension, compiled with Intel’s DPCPP compiler, to compute column-wise sums of an array.
Running it on CPU or GPU is just a matter of changing the queue the work is submitted to:
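Roughly, using such an extension from Python might look like the sketch below; the `columnwise_sum` module and the `"cpu"`/`"gpu"` queue selectors are purely illustrative names, not an actual package API:

```python
import numpy as np
from sycl_ext import columnwise_sum  # hypothetical DPCPP-compiled extension

x = np.random.rand(1024, 16)

# The same compiled kernel is dispatched to a different device simply by
# choosing which SYCL queue the work is submitted to:
sums_cpu = columnwise_sum(x, queue="cpu")
sums_gpu = columnwise_sum(x, queue="gpu")
```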
An array-consuming library author need not be aware of this, I thought, just as they need not be aware of which array implementation is powering the application.
That’s good to know. In that case I’ll remove the note on that; there’s no point in mentioning it if it’s being phased out.
The “mutating global state” point gets at the exact problem with context managers. Having global state generally makes it harder to write correct code. For the person writing that code it may be fine to keep it all in their head, but it affects any library call that gets invoked. That is probably still fine in single-device situations (e.g. switching between CPU and one GPU), but beyond that it gets tricky.
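As an illustration of how that global state leaks into callees, here is a sketch with a hypothetical `default_device()` context manager (not an actual API):

```python
def library_helper(x):
    # Written against the library's default device; it has no way of knowing
    # that a caller changed the default further up the stack.
    tmp = empty(x.shape, dtype=x.dtype)  # lands on whatever the current default is
    tmp[...] = x + 1                     # now a cross-device operation
    return tmp

with default_device("gpu:1"):
    # Every allocation inside any function called in this block is silently
    # redirected, including allocations inside third-party libraries.
    library_helper(x_on_cpu)  # fails (or silently copies) because of the context
```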
The consensus of our conversation in September was that a context manager isn’t always enough, and that the PyTorch model was more powerful. That still left open whether we should also add a context manager though.
Re cost - do you mean cost in verbosity? Passing through a keyword shouldn’t have significant performance cost.
I think the typical pattern would be to either use the default, or obtain it from the local context. E.g.
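A minimal sketch of that pattern, assuming creation functions grow a `device=` keyword and arrays expose a `.device` attribute (the function name is made up):

```python
def add_one(x):
    # Allocate the output on the same device as the input; the caller never
    # has to name a concrete device explicitly.
    out = empty(x.shape, dtype=x.dtype, device=x.device)
    out[...] = x + 1
    return out
```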
And only in more complex situations would the actual device need to be known explicitly.
That is a good question: should it be enforced or just recommended? Having device transfers be explicit is usually better (implicit transfers can make for hard-to-track-down performance issues), but perhaps not always.
Interesting, I’m not familiar with this hard/soft distinction, will look at the TF docs.
That should not be a problem if shape and size aren’t arrays but are instead either custom objects or tuples/ints, right?
That may be a good idea. Would be great to discuss in more detail later today.