RFC: Support performant `einsum()` and `einsum_path()` using cuQuantum
Related to #6078.
The new cuQuantum SDK provides two libraries, one of which is cuTensorNet, which accelerates tensor network contraction (backed by cuTENSOR). Specifically, cuTensorNet currently handles two major challenges that arise for problems with huge einsum expressions:
- Finding the optimal contraction path
- Executing the actual pairwise contraction for a given path in an optimal fashion
Those two functionalities map very nicely to NumPy's `einsum_path()` and `einsum()` APIs, respectively. As a result, in cuQuantum Python, the Python binding for the cuQuantum libraries, we provide pythonic APIs `cuquantum.einsum_path()` and `cuquantum.einsum()`, among other things. We currently provide two conda packages, `cuquantum` and `cuquantum-python`, and a pip wheel release is on our near-term roadmap.
I would like to propose using cuQuantum Python to back CuPy's `einsum_path()` and `einsum()` when available. Performance improvement data can be shared if needed. However, there are a few issues to be addressed:
- Because CuPy is currently a required dependency of cuQuantum Python, we need to break any potential circular import.
- For the initial release, `cuquantum.einsum_path()` and `cuquantum.einsum()` are not yet 100% drop-in replacements for their NumPy counterparts (we mark most optional arguments as unsupported).
- Directly using our drop-in replacement APIs incurs a small performance penalty (because the library handle is not reused).
- We currently only support classical einsum; ellipsis, broadcasting, etc. are not yet supported.
Below is my 4-step proposal, involving a small refactoring of the existing codebase:
1. Add `cupy.einsum_path()` as a fallback: the functionality is already in `cupy/linalg/_einsum.py`; we just need to put everything together.
2. Introduce `ACCELERATOR_CUQUANTUM` as an optional routine accelerator backend: this way, we allow users to set `CUPY_ACCELERATORS=cub,cutensor,cuquantum`. This is desired to be able to test things separately (ex: see here).
3. Use the core functionalities of cuTensorNet to back `cupy.einsum_path()` and `cupy.einsum()`: I don't think I'd like to use `cuquantum.einsum_path()` and `cuquantum.einsum()` directly due to the aforementioned reasons, so most likely we'll need to add some helper functions to check:
   - if `ACCELERATOR_CUQUANTUM` is requested
   - if `cuquantum` can be imported
   - if all user inputs can be handled by cuTensorNet
   If so, forward to cuQuantum Python; otherwise fall back.
4. Add some mock tests to ensure the backend can be used when available.
Top GitHub Comments
Thanks for the detailed explanation @leofang! I wasn’t aware of this symlink issue - that seems like a pretty serious design problem with wheels indeed.
Ingenious, but it doesn't look like something anyone should be using for production-quality releases right now.
@kmaehashi Well, we hit an unexpected delay in the wheel release, so we decided to skip cuQuantum v0.1.0 + cuTENSOR 1.4.0 and go directly to cuQuantum v1.0.0 + cuTENSOR 1.5.0. The cuTENSOR wheel is up: https://pypi.org/project/cutensor/
(btw, given the lack of symlink support in wheels, it's a bit awkward to build a project against the cuTENSOR wheel (try `pip install` yourself to see what I mean), so I am not sure if it'd be useful for CuPy… Also notice the wheel size…)