Detection of device synchronization

Device synchronization is a source of performance degradation, yet some CuPy functions currently synchronize the device implicitly, whether intentionally or not (#2797).

It would be nice to have a mechanism to detect device synchronization.

Proposal

Add a context manager within which any operation that synchronizes the device causes an error. The context manager sets a thread-local configuration variable.

with cupyx.allow_synchronize(False):
    ... # operations that are not supposed to synchronize

In low-level CuPy code that causes synchronization, such as ndarray.get, insert a _declare_synchronize() call as follows:

cpdef _declare_synchronize():
    ... # Raise an error if within allow_synchronize(False).

cdef class ndarray:
    cpdef get(self, ...):
        _declare_synchronize()
        ...

Application in unit tests

Once the above is implemented, we could apply it in testing.numpy_cupy_equal, for example. If the test code is expected to synchronize the device, the test implementation can explicitly enclose that code in cupyx.allow_synchronize(True).

Notes

ndarray.get may copy asynchronously if certain conditions are met (when using a non-default stream and given an explicit out backed by pinned memory). It’s difficult for the code to predict whether the call is actually synchronous or not. It may be safer to split the asynchronous behavior into a separate method such as ndarray.get_async.

Issue Analytics

Created 4 years ago
Reactions: 2
Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction

emcastillo commented, Dec 14, 2019

I think this is great and could be pretty useful for improving the performance of user code.

One minor detail that just came to mind: instead of explicitly declaring every method that synchronizes, could we make the CUDA runtime call used to synchronize devices check the flag set by the context manager?

This will solve the second issue, but it will not detect calls that “could” synchronize but do not actually do so because of certain runtime conditions. (I don’t know whether your proposal is to raise an error on anything that may synchronize, which is also legitimate.)

0 reactions

niboshi commented, Dec 17, 2019

Thank you for the measurement, but 6 µs looks quite expensive. In my environment it’s about 1 µs, but I’m still not confident that this is cheap enough as overhead on the lowest-level API call.
