[RFC] Use RAPIDS cuGraph as the backend for CuPy graph algorithms
@pentschev suggested that @BradReesWork and I post the following RFC to the CuPy community. Please don’t hesitate to comment and let us know if there’s anything else you’d like to see.
Introduction
CuPy has recently started adding support for various graph algorithms that operate on CuPy sparse matrices, as seen in PR https://github.com/cupy/cupy/pull/4054. This is great; however, it raises several questions, the primary one being whether CuPy should write the code itself or simply leverage what is available in RAPIDS cuGraph. This is especially relevant because RAPIDS cuGraph is scheduled to have full CuPy sparse matrix support in its 0.17 release, currently slated for December 9th, which will let cuGraph users pass CuPy sparse matrices into cuGraph workflows without additional conversion code.
Advantages and Disadvantages
The obvious advantage of leveraging RAPIDS cuGraph is that it would give CuPy users access to a broad range of graph algorithms, with minimal effort required from CuPy developers to wrap cuGraph. RAPIDS cuGraph will have over twenty algorithms that accept CuPy sparse matrices as input, with additional algorithms on its roadmap.
Writing efficient graph algorithms is time-consuming. Beyond the initial implementation, there is periodic maintenance and performance tuning, not to mention bug fixing. RAPIDS cuGraph has a team dedicated to nothing but graph algorithms and to ensuring the best performance and interoperability.
The one disadvantage is that it is an external library that has additional dependencies.
Proposal
As @pentschev mentioned in his comment https://github.com/cupy/cupy/pull/4054#issuecomment-704359692 , CuPy can be updated with thin wrappers to cuGraph APIs in order to provide the desired graph algorithms to CuPy users. This approach is mutually beneficial in that it will allow CuPy to quickly add many graph algorithms without significant additional implementation, testing, and maintenance costs, while also benefitting cuGraph through exposure to more users and use cases.
For example, to implement a SciPy-compatible connected components algorithm, as described here, and called by CuPy users like this:
cupyx.scipy.sparse.csgraph.connected_components(csgraph, connection='weak')
CuPy could create a thin wrapper to adapt the desired interfaces to one or more cuGraph calls. Since cuGraph will be adding support for passing CuPy sparse matrices directly in the 0.17 release, the wrapper is even simpler:
    import cugraph

    def connected_components(csgraph, connection='weak'):
        if connection == 'weak':
            return cugraph.components.connectivity.weakly_connected_components(csgraph)
        else:
            return cugraph.components.connectivity.strongly_connected_components(csgraph)
cuGraph algorithms that are passed CuPy matrices as inputs will also be updated to return CuPy-based types where appropriate. This convention of “return type matches input type” by default was already established in the 0.16 release, when support for passing NetworkX graph objects was added. An optional argument that lets the caller specify a different return type may also be provided in 0.17.
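To make the target contract concrete: with its default arguments, SciPy's connected_components returns a pair (n_components, labels), where labels holds one component id per vertex. The pure-Python sketch below produces that shape for weak connectivity on a small edge list. It is only a CPU reference for illustration, not how cuGraph or CuPy compute components; the function name is hypothetical, and numbering labels in order of first appearance is an assumption made here.

```python
def weak_components_reference(n_vertices, edges):
    """Return (n_components, labels), treating every edge as undirected.

    Hypothetical reference helper using union-find; for illustration only.
    """
    parent = list(range(n_vertices))

    def find(v):
        # Follow parent links to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_dst] = root_src

    # Relabel roots as 0, 1, 2, ... in order of first appearance (assumption).
    label_of_root = {}
    labels = []
    for v in range(n_vertices):
        root = find(v)
        labels.append(label_of_root.setdefault(root, len(label_of_root)))
    return len(label_of_root), labels


# Two components: {0, 1, 2} and {3, 4}.
n_components, labels = weak_components_reference(5, [(0, 1), (1, 2), (3, 4)])
```

A wrapper only has to make sure that whatever the backend returns is delivered in this shape and with CuPy-based types.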
Questions
- What input and output types and/or other parameters would the CuPy community need from the cuGraph API?
- Is the CuPy community interested in providing their users the SciPy sparse API as described here: https://docs.scipy.org/doc/scipy/reference/sparse.csgraph.html or are there other preferred alternatives?
- Are there other issues or concerns not covered here that we should consider before starting the implementation?
Issue Analytics
- Created 3 years ago
- Reactions: 7
- Comments: 23 (13 by maintainers)
Top GitHub Comments
Ideally, it’d still be nice to have pylibcugraph appear on conda-forge so that we can use run_constrained to make it an optional runtime dependency. But I guess simply documenting it might also work.
I’m reopening this to discuss pylibcugraph as a new cugraph interface that CuPy could use for this. I can open a new RFC instead if that’s more appropriate; just let me know.
Background
cuGraph is planning on releasing a new conda package as part of the 21.10 release (early October 2021) named pylibcugraph, which is a thin Python interface on the core libcugraph C++ library. pylibcugraph will only depend on libcugraph (it will not depend on cuDF or dask) and is intended to be used by integrators looking to add GPU-accelerated graph algorithms to their application or library.
As detailed in this thread above, CuPy currently implements cupyx.scipy.sparse.csgraph.connected_components() using a Cython interface to the libcugraph C++ library. This requires a build-time dependency on libcugraph and RMM, Cython code, and additional conda-forge packages. Each additional graph algorithm that CuPy chooses to add to its API requires additional Cython code. This was proposed as a way to leverage cugraph without the added dependencies that the cugraph Python package requires all users to install, which may not be acceptable to many CuPy users.
New Proposal
Instead of writing Cython interfaces for the libcugraph C++ APIs, CuPy will be able to add a soft dependency on pylibcugraph. This is similar to the original proposal from November 2020 to use the cugraph Python package, before the realization that the cugraph dependencies made depending on it unacceptable.
By using pylibcugraph, CuPy can replace the Cython code written for libcugraph with Python code that implements cupyx.scipy.sparse.csgraph.connected_components() by simply calling into pylibcugraph from Python. The soft dependency mechanism will allow CuPy to use a try: … except ModuleNotFoundError to determine at runtime whether the cugraph-backed APIs are supported. This also allows CuPy to remove the build-time requirement on the libcugraph and RMM packages from conda-forge, which should simplify the build and reduce the need for continued maintenance on those packages.
@kmaehashi, @leofang - please let me know if I missed anything. We should have early pylibcugraph builds ready in the cugraph 21.10 nightlies in the next few weeks, and I can provide examples when we get an initial API finalized.
cc @pentschev @jakirkham @BradReesWork