[RFC] Use RAPIDS cuGraph as the backend for CuPy graph algorithms

@pentschev suggested that @BradReesWork and I post the following RFC to the CuPy community. Please don’t hesitate to comment and let us know if there’s anything else you’d like to see.


Introduction

CuPy has recently started adding support for various graph algorithms that operate on CuPy sparse matrices, as seen in PR https://github.com/cupy/cupy/pull/4054. This is great; however, it raises multiple questions, the primary one being whether CuPy should write the code itself or simply leverage what is available in RAPIDS cuGraph. This is especially relevant because RAPIDS cuGraph is scheduled to have full CuPy sparse matrix support in its 0.17 release, currently slated for December 9th. That support gives cuGraph users the flexibility to use CuPy sparse matrices in cuGraph workflows without additional conversion code.

Advantages and Disadvantages

The obvious advantage of leveraging RAPIDS cuGraph is that it would give CuPy users access to a broad range of graph algorithms with minimal effort from CuPy developers to wrap cuGraph. RAPIDS cuGraph will have over twenty algorithms that accept CuPy sparse matrices as input, with additional algorithms on its roadmap.

Writing efficient graph algorithms is time-consuming. Beyond the initial implementation, there are periodic maintenance and performance enhancements, not to mention possible bug fixes. RAPIDS cuGraph has a team dedicated to nothing but graph algorithms and to ensuring the best performance and interoperability.

The one disadvantage is that it is an external library that has additional dependencies.

Proposal

As @pentschev mentioned in his comment https://github.com/cupy/cupy/pull/4054#issuecomment-704359692 , CuPy can be updated with thin wrappers to cuGraph APIs in order to provide the desired graph algorithms to CuPy users. This approach is mutually beneficial in that it will allow CuPy to quickly add many graph algorithms without significant additional implementation, testing, and maintenance costs, while also benefitting cuGraph through exposure to more users and use cases.

For example, to implement a SciPy-compatible connected components algorithm, as described here, and called by CuPy users like this:

cupyx.scipy.sparse.csgraph.connected_components(csgraph, connection='weak') 

CuPy could create a thin wrapper to adapt the desired interfaces to one or more cuGraph calls. Since cuGraph will be adding support for passing CuPy sparse matrices directly in the 0.17 release, the wrapper is even simpler:

import cugraph

def connected_components(csgraph, connection='weak'):
    if connection == 'weak':
        return cugraph.components.connectivity.weakly_connected_components(csgraph)
    else:
        return cugraph.components.connectivity.strongly_connected_components(csgraph)
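One detail of the wrapper above is that any value other than 'weak' falls through to the strongly connected path. A minimal sketch of a stricter dispatch follows, assuming unknown values should be rejected. The two stub functions are hypothetical stand-ins for the cuGraph calls so that the example runs without a GPU; they are not part of cuGraph.

```python
from typing import Any, Callable, Dict


def _weak_stub(csgraph: Any) -> str:
    # Hypothetical stand-in for cuGraph's weakly_connected_components.
    return "weak"


def _strong_stub(csgraph: Any) -> str:
    # Hypothetical stand-in for cuGraph's strongly_connected_components.
    return "strong"


# Map each accepted `connection` value to the backend that handles it.
_BACKENDS: Dict[str, Callable[[Any], Any]] = {
    "weak": _weak_stub,
    "strong": _strong_stub,
}


def connected_components(csgraph: Any, connection: str = "weak") -> Any:
    """Dispatch to a backend, rejecting unknown `connection` values."""
    try:
        backend = _BACKENDS[connection]
    except KeyError:
        raise ValueError("connection must be 'weak' or 'strong'") from None
    return backend(csgraph)
```

With a table like this, supporting another mode would mean adding one entry rather than another branch.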

cuGraph algorithms that are passed CuPy matrices as inputs will also be updated to return CuPy-based types where appropriate. This convention of “return type matches input type” by default was established in the 0.16 release, when support for passing NetworkX graph objects was added. An optional argument that lets the caller specify a different return type may also be provided in 0.17.
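The “return type matches input type” convention can be illustrated with a small sketch. This is not cuGraph code: the function name out_degrees and the return_type parameter are hypothetical, and Python's built-in list and tuple stand in for the CuPy, cuDF, or NetworkX types a real implementation would handle.

```python
from typing import Optional, Sequence


def out_degrees(adjacency: Sequence[Sequence[int]],
                return_type: Optional[type] = None):
    """Count the nonzero entries in each row of a dense adjacency structure.

    By default the result uses the same container type as the input;
    `return_type` (a hypothetical parameter) overrides that default.
    """
    out_type = type(adjacency) if return_type is None else return_type
    return out_type(sum(1 for w in row if w != 0) for row in adjacency)


# Tuple in, tuple out; list in, list out.
as_tuple = out_degrees(((0, 1, 1), (0, 0, 1), (0, 0, 0)))
as_list = out_degrees([[0, 1], [1, 0]])
# The caller explicitly asks for a list even though the input is a tuple.
forced = out_degrees(((0, 1), (0, 0)), return_type=list)
```

Defaulting to the input type keeps existing workflows free of conversions, while the override covers callers who want to hand the result to a different library.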

Questions

  • What input and output types and/or other parameters would the CuPy community need from the cuGraph API?
  • Is the CuPy community interested in providing their users the SciPy sparse API as described here: https://docs.scipy.org/doc/scipy/reference/sparse.csgraph.html or are there other preferred alternatives?
  • Are there other issues/concerns that aren’t exposed here that we should consider before starting with the implementation?

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 7
  • Comments:23 (13 by maintainers)

Top GitHub Comments

2 reactions
leofang commented, Sep 17, 2021

Are there any conventions for handling a soft dependency like this in conda-forge? Maybe simply drop the dependency on libcugraph and ask users, via the documentation, to run conda install -c rapids pylibcugraph?

Ideally, it’d still be nice to have pylibcugraph appear on conda-forge so that we can use run_constrained to make it an optional runtime dependency. But I guess simply documenting it might also work.

2 reactions
rlratzel commented, Aug 20, 2021

I’m reopening this to discuss pylibcugraph as a new cugraph interface that CuPy could use for this. I can open a new RFC instead if that’s more appropriate, just let me know.

Background

cuGraph is planning on releasing a new conda package as part of the 21.10 release (early October 2021) named pylibcugraph, which is a thin python interface on the core libcugraph C++ library. pylibcugraph will only depend on libcugraph (it will not depend on cuDF or dask) and is intended to be used by integrators looking to add GPU-accelerated graph algorithms to their application or library.

As detailed in this thread above, CuPy currently implements cupyx.scipy.sparse.csgraph.connected_components() using a cython interface to the libcugraph C++ library. This requires a build-time dependency on libcugraph and RMM, cython code, and additional conda-forge packages. Each additional graph algorithm that CuPy chooses to use to extend its API requires additional cython code. This was proposed as a way to leverage cugraph without the added dependencies that the cugraph python package requires all users to install, which may not be acceptable to many CuPy users.

New Proposal

Instead of writing cython interfaces for the libcugraph C++ APIs, CuPy will be able to add a soft dependency on pylibcugraph. This is similar to the original proposal from November 2020 to use the cugraph python package, before the realization that the cugraph dependencies made depending on it unacceptable.

By using pylibcugraph, CuPy can replace the cython code written for libcugraph with python code that implements cupyx.scipy.sparse.csgraph.connected_components() by simply calling into pylibcugraph. The soft dependency mechanism will allow CuPy to use a try/except ModuleNotFoundError block to determine at runtime whether the cugraph-backed APIs are supported. This also allows CuPy to remove the build-time requirement on the libcugraph and RMM packages from conda-forge, which should simplify the build and reduce the need for continued maintenance on those packages.

@kmaehashi , @leofang - please let me know if I missed anything. We should have early pylibcugraph builds ready in the cugraph 21.10 nightlies in the next few weeks, and I can provide examples when we get an initial API finalized.

cc @pentschev @jakirkham @BradReesWork
