Using `-Xfatbin=-compress-all` for kernel compilation
Not sure how relevant this is for how CuPy compiles things, but something we have been exploring in RAPIDS that may be of interest is building with `-Xfatbin=-compress-all`. In some cases this can make for quite dramatic reductions in compiled kernel sizes. For an example, please see this PR (https://github.com/rapidsai/cudf/pull/7583). Again, not sure if this is relevant for how CuPy is building things, but just wanted to mention it in case it is.
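As a rough illustration of where the flag sits on an `nvcc` command line, the Python sketch below assembles an argument list. The helper `build_nvcc_args`, the file names, and the target list are hypothetical and only for illustration; this is not how CuPy or RAPIDS actually drive the compiler. The only real `nvcc` options used are `--generate-code` and `-Xfatbin=-compress-all`.

```python
def build_nvcc_args(source, output, generate_code, compress=True):
    """Assemble an nvcc argument list (hypothetical helper, illustration only)."""
    args = ["nvcc", "-c", source, "-o", output]
    # Emit one --generate-code option per semicolon-separated target.
    for target in generate_code.split(";"):
        target = target.strip()
        if target:
            args.append("--generate-code=" + target)
    if compress:
        # -Xfatbin forwards the option that follows to the fatbinary tool.
        args.append("-Xfatbin=-compress-all")
    return args


if __name__ == "__main__":
    # Example target list; adjust to the GPUs you actually build for.
    targets = "arch=compute_70,code=sm_70;arch=compute_80,code=sm_80"
    print(" ".join(build_nvcc_args("kernel.cu", "kernel.o", targets)))
```

Whether the option pays off depends on how much of the package is actually compiled by `nvcc`, so it is worth measuring the build output with and without it.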
Issue Analytics
- State:
- Created 3 years ago
- Reactions:2
- Comments:5 (5 by maintainers)
Thanks for sharing this! I tried this locally (my environment uses `export CUPY_NVCC_GENERATE_CODE="arch=compute_60,code=sm_60;arch=compute_70,code=sm_70;arch=compute_80,code=sm_80"`) and measured `du -csh cupy cupyx cupy_backends`:

In CuPy it seems the effect is limited because `nvcc` is only used for a few modules (cub/thrust/random). Moreover, after compressing the files into a wheel (i.e., a ZIP archive), the compressed one was 262 KB larger.
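One plausible reason a wheel might not shrink (an assumption, not something confirmed in this thread) is that a wheel is a ZIP archive, and data that has already been compressed tends not to compress further. The sketch below shows the general effect with synthetic data, using `zlib` as a stand-in; it does not use real CuPy binaries, and the algorithm behind `-compress-all` is not assumed to be `zlib`.

```python
import io
import random
import zipfile
import zlib


def zipped_size(payload: bytes) -> int:
    """Return the size of a DEFLATE ZIP archive holding one file with this payload."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("module.bin", payload)
    return len(buf.getvalue())


# Synthetic, moderately compressible stand-in for an uncompressed binary.
rng = random.Random(0)
raw = bytes(rng.choice(b"ACGT") for _ in range(200_000))

# Stand-in for a binary that was already compressed at build time.
precompressed = zlib.compress(raw, 9)

if __name__ == "__main__":
    print("raw bytes:            ", len(raw))
    print("precompressed bytes:  ", len(precompressed))
    print("zip of raw:           ", zipped_size(raw))
    print("zip of precompressed: ", zipped_size(precompressed))
```

In this toy setup the archive of the precompressed payload is no smaller than the payload itself, so most of the saving from build-time compression is already captured by zipping the uncompressed file.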
It's another 15-25M for the `cub` module, and I didn't check other modules. I don't see the build time increase, and the runtime overhead for decompression is negligible based on an internal discussion (though likely outdated). For reference, all RAPIDS binaries are built with this option.

Though, to be fair, it does not always help. For example, with this feature I see `cub` inflates to 190M+, and the compression only brings it down to 170M+, not a significant change.