Avoid exposing functions overlapping with NCCL names
Currently CuPy exposes NCCL stub functions whose names overlap with those that NCCL itself provides. Whether it does so depends on whether a CUDA-enabled build of CuPy was produced, and which functions are exposed depends on the version of NCCL used in the CuPy build. If the latest version of NCCL supported by CuPy was not available during the build, CuPy exposes some functions with names that overlap with those in NCCL. This could be an issue if CuPy were built against an older version of NCCL (like 2.3) and installed on a system with a newer version of NCCL (like 2.4): both CuPy and NCCL would then define symbols with the same names, and the clash could crash a user's program.
To fix this issue, it would be helpful if CuPy renamed the stub functions in cupy_nccl.h with unique names not found in NCCL. A simple strategy might be to prefix all stub functions with cupy_*. To simplify usability and maintainability, CuPy could make these stub functions either call NCCL or fall back to some default behavior (like returning ncclSuccess), depending on the NCCL version.
Though this is merely one option. There could be other viable options. In any event it would be useful to avoid symbol clashes with NCCL.
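As a rough illustration of the prefixing idea, here is a minimal sketch of what one wrapper in cupy_nccl.h might look like. It is not CuPy's actual code: the build flag CUPY_SKETCH_HAVE_NCCL, the threshold macro CUPY_SKETCH_GETVERSION_MIN and its value, and the choice of ncclGetVersion as the example are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical build flag: defined only when building against real NCCL. */
#ifdef CUPY_SKETCH_HAVE_NCCL
#include <nccl.h>
#else
/* Stand-in type so the sketch compiles without NCCL installed.
   A typedef and an enum constant do not produce linker symbols. */
typedef enum { ncclSuccess = 0 } ncclResult_t;
#endif

/* Older headers (or the stand-in above) may not define a version code. */
#ifndef NCCL_VERSION_CODE
#define NCCL_VERSION_CODE 0
#endif

/* Assumed version code from which the real ncclGetVersion is available.
   The exact threshold is an assumption of this sketch. */
#define CUPY_SKETCH_GETVERSION_MIN 2304

/* Prefixed wrapper: forwards to NCCL when the build-time version provides
   the function, otherwise falls back to a default and reports success. */
static inline ncclResult_t cupy_ncclGetVersion(int *version) {
#if NCCL_VERSION_CODE >= CUPY_SKETCH_GETVERSION_MIN
    return ncclGetVersion(version);
#else
    if (version != NULL) {
        *version = 0;  /* default value when the real call is unavailable */
    }
    return ncclSuccess;
#endif
}
```

With this shape, the rest of CuPy would call cupy_ncclGetVersion instead of ncclGetVersion, so a CuPy build never defines a function named like one that NCCL provides, whichever NCCL version is present at run time.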
Issue Analytics
- State:
- Created 4 years ago
- Comments: 7 (7 by maintainers)
Top GitHub Comments
Thanks for pointing this issue out. The same can be said of cuDNN, as we expose symbols whose names are the same as cuDNN's depending on the build-time cuDNN version, and cuDNN is linked as libcudnn.so.7 (not libcudnn.so.7.5).
We discussed these ideas in the dev team and concluded that idea 2 sounds like a reasonable solution. Is anyone interested in working on this issue?
cc/ @pentschev @jekbradbury @anaruse