Compilation time is inconsistent between different environments

Description

I’m working on a project that uses CuPy to accelerate quantum computing simulations. We employ CuPy with custom CUDA kernels loaded with RawModule. We are trying to benchmark our code to assess the JIT approach with respect to alternatives. During benchmarks, we found out that the compilation times are inconsistent between different versions of CuPy and CUDA.

For example, I’ve written a gist (https://gist.github.com/mlazzarin/1e2128e90d78c4cb1a220075f64bc297) that loads our custom kernels with cp.RawModule and compiles them with the .compile() method. I ran this example with different versions of CuPy and the CUDA toolkit:

| Environment | Compilation time |
| --- | --- |
| System installation of CUDA 11.5, cupy-cuda115 from pip | ~3.2 s |
| cudatoolkit=11.5.0 and cupy=9.6.0 from conda-forge | ~3.2 s |
| cudatoolkit=11.4.2 and cupy=9.6.0 from conda-forge | ~3.2 s |
| cudatoolkit=11.3.1 and cupy=9.6.0 from conda-forge | ~3.2 s |
| cudatoolkit=11.2.2 and cupy=9.6.0 from conda-forge | ~3.4 s |
| cudatoolkit=11.1.1 and cupy=9.6.0 from conda-forge | ~1.6 s |
| cudatoolkit=11.5.0 and cupy=9.5.0 from conda-forge | ~2.1 s (3.2 s first exec) |
| cudatoolkit=11.4.2 and cupy=9.5.0 from conda-forge | ~2.1 s (3.2 s first exec) |
| cudatoolkit=11.3.1 and cupy=9.5.0 from conda-forge | ~2.1 s (3.2 s first exec) |
| cudatoolkit=11.2.2 and cupy=9.5.0 from conda-forge | ~2.3 s (3.5 s first exec) |
| cudatoolkit=11.1.1 and cupy=9.5.0 from conda-forge | ~1.0 s (1.7 s first exec) |
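
For readers who want to reproduce this kind of measurement, here is a minimal sketch of how such timings could be collected. It is not the code from the gist: the `scale` kernel, the `time_compile` helper and `run_benchmark` are hypothetical names introduced only for illustration. The only CuPy calls assumed are `cp.RawModule(code=...)` and its `.compile()` method, which the description above also uses.

```python
import time

# Tiny stand-in kernel (hypothetical); the real kernels live in the linked gist.
KERNEL_SOURCE = r'''
extern "C" __global__ void scale(float* x, float a, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        x[i] *= a;
    }
}
'''


def time_compile(make_module, repeats=3):
    """Time `make_module().compile()` and return the durations in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        module = make_module()   # build a fresh module object each round
        module.compile()         # force compilation instead of waiting for first use
        timings.append(time.perf_counter() - start)
    return timings


def run_benchmark():
    """Needs CuPy and a CUDA-capable GPU, so it is not called automatically."""
    import cupy as cp

    timings = time_compile(lambda: cp.RawModule(code=KERNEL_SOURCE))
    for i, seconds in enumerate(timings, start=1):
        print(f"compile #{i}: {seconds:.3f} s")
```

Calling `run_benchmark()` in each environment gives one number per repetition. Repetitions inside a single process may be served from a cache, so the first value and the later ones are best reported separately, as in the "first exec" column above.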

We also found out that the first execution with CuPy v9.5.0 is slower than the following ones, while this doesn’t happen with CuPy v9.6.0. This holds for the very first execution in a new environment.

Is this expected?

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 11 (8 by maintainers)

Top GitHub Comments

kmaehashi commented, Nov 24, 2021 (2 reactions)

> RawModule and RawKernel are not cached by cupy when using the nvrtc backend.

No that’s not true, they are cached when things go through cupy/cuda/compiler.py.

The only situation in which the disk cache in CuPy does not work is when the nvrtc backend is used AND name_expressions are specified.
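
To separate cold-cache from warm-cache timings, one option is to point CuPy's on-disk kernel cache at a fresh directory before compiling. The sketch below relies on the `CUPY_CACHE_DIR` environment variable documented by CuPy; the helper names `use_fresh_cupy_cache` and `cached_files` are hypothetical, and setting the variable before importing CuPy is a conservative assumption rather than a documented requirement.

```python
import os
import tempfile


def use_fresh_cupy_cache():
    """Point CuPy's on-disk kernel cache at a new, empty directory."""
    cache_dir = tempfile.mkdtemp(prefix="cupy_cache_")
    # Set this before importing CuPy so the setting is certainly picked up.
    os.environ["CUPY_CACHE_DIR"] = cache_dir
    return cache_dir


def cached_files(cache_dir):
    """List whatever has been written to the cache directory so far."""
    return sorted(os.listdir(cache_dir))


cache_dir = use_fresh_cupy_cache()
print("CuPy cache directory:", cache_dir)
print("entries before compiling:", len(cached_files(cache_dir)))
```

Compiling once after this gives a cold-cache timing. Running the same compilation again in a new process with the same `CUPY_CACHE_DIR` gives a warm-cache timing, and checking `cached_files(cache_dir)` in between shows whether anything was actually written to disk.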

The only mystery I am still wondering is the mismatch between CUDA 11.1 and later CUDA versions.

I know that the nvcc that comes with CUDA 11.1 behaves differently from the one that comes with 11.0 and 11.2+. Interestingly, nvcc 11.1 builds faster but generates a larger binary than the others. I’m unsure whether this applies to NVRTC, but I guess so, judging by the benchmark.

@mlazzarin Anyway, the codepath in CuPy is the same for CUDA 11.1 and onwards, so I think this is unlikely to be a CuPy bug. I’d suggest forgetting about past CUDA releases 😃
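
When comparing environments like the ones in this thread, it can help to record which CuPy, CUDA runtime and NVRTC versions are actually loaded. The following is a rough sketch: `format_cuda_version` and `describe_environment` are hypothetical helpers, and the sketch assumes that `cp.cuda.runtime.runtimeGetVersion()` returns CUDA's usual integer encoding and that `cp.cuda.nvrtc.getVersion()` returns a `(major, minor)` pair.

```python
def format_cuda_version(version):
    """Convert CUDA's integer encoding (1000*major + 10*minor) to 'major.minor'."""
    major = version // 1000
    minor = (version % 1000) // 10
    return f"{major}.{minor}"


def describe_environment():
    """Needs CuPy installed, so it is not called automatically."""
    import cupy as cp

    nvrtc_major, nvrtc_minor = cp.cuda.nvrtc.getVersion()
    return {
        "cupy": cp.__version__,
        "cuda_runtime": format_cuda_version(cp.cuda.runtime.runtimeGetVersion()),
        "nvrtc": f"{nvrtc_major}.{nvrtc_minor}",
    }
```

Printing `describe_environment()` next to each timing makes it easier to tell whether a difference follows the NVRTC version or the CuPy version.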

mlazzarin commented, Nov 26, 2021 (0 reactions)

Ok, thank you very much!
