`cupy.cumsum()` doesn't work in ROCm 5.0


Related to #6459.

cupy.cumsum(), which CuPy implements independently without calling CUB, does not work in ROCm 5.0:

$ python -c 'import cupy; print(cupy.cumsum(cupy.array([0,0])))'
[                  0 1304722626565153334]
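A quick way to spot this kind of failure is to compare the result against a pure-Python running sum. The helper below is only a sketch: `matches_reference` is a hypothetical name, and the CuPy call in the trailing comment is an assumption about how one might plug it in, not something taken from the CuPy test suite.

```python
from itertools import accumulate


def matches_reference(scan, data):
    """Return True when scan(data) equals a pure-Python inclusive running sum.

    `scan` is any callable that takes a list of ints and returns a list of ints.
    """
    expected = list(accumulate(data))
    return list(scan(data)) == expected


# Hypothetical usage on a machine with CuPy installed (not exercised here):
#   import cupy
#   matches_reference(lambda d: cupy.cumsum(cupy.array(d)).tolist(), [0, 0])
```

With the output reported above, the check would return False for the input [0, 0].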

The implementation uses shared memory to communicate across threads. As discussed in #4366, threads in a warp on ROCm run in lock-step at all times, so synchronizing instructions within a warp are not required, but we still need a memory fence to enforce the ordering of accesses to shared memory.
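To illustrate why lock-step execution alone is not enough, here is a toy Python model of a warp-level inclusive scan through shared memory. It is a sketch under loud assumptions: `ToySharedMemory` and `warp_inclusive_scan` are hypothetical names, stores are modelled as invisible to other threads until a fence, and real GPU memory ordering is far more subtle than this. It is not CuPy's kernel.

```python
# Stand-in for uninitialized shared memory; the value is borrowed from the
# output reported above purely for illustration.
GARBAGE = 1304722626565153334


class ToySharedMemory:
    """Toy model: a store becomes visible to loads only after fence()."""

    def __init__(self, size):
        self.visible = [GARBAGE] * size
        self.pending = {}

    def store(self, index, value):
        self.pending[index] = value

    def load(self, index):
        return self.visible[index]

    def fence(self):
        # Publish every pending store so later loads can observe it.
        self.visible = [self.pending.get(i, v) for i, v in enumerate(self.visible)]
        self.pending.clear()


def warp_inclusive_scan(values, use_fence):
    """Doubling-offset inclusive scan; one list element per simulated thread."""
    n = len(values)
    smem = ToySharedMemory(n)
    regs = list(values)  # each thread's private register
    for lane in range(n):
        smem.store(lane, regs[lane])
    if use_fence:
        smem.fence()
    offset = 1
    while offset < n:
        # Lock-step: every lane loads before any lane stores.
        loaded = [smem.load(lane - offset) if lane >= offset else 0
                  for lane in range(n)]
        regs = [r + x for r, x in zip(regs, loaded)]
        for lane in range(n):
            smem.store(lane, regs[lane])
        if use_fence:
            smem.fence()
        offset *= 2
    return regs
```

In this model, `warp_inclusive_scan([1, 2, 3, 4], use_fence=True)` returns `[1, 3, 6, 10]`, while `warp_inclusive_scan([0, 0], use_fence=False)` returns `[0, GARBAGE]`, because thread 1 loads a slot whose store was never published. The resemblance to the reported output is by construction and does not prove the root cause.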

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

2 reactions
takagi commented, Mar 1, 2022

No, I understand it doesn’t. In the following part of cumsum, for example, the data stored by thread 0 is not necessarily observed by thread 1 as written before thread 1 reads that memory with its load instruction. https://github.com/cupy/cupy/blob/812b0f5301de8896f105ed974d84b03fcb331d91/cupy/_core/_routines_math.pyx#L307-L309

I couldn’t find an exact explanation in the ROCm documentation, but the CUDA documentation describes it here: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#memory-fence-functions

1 reaction
emcastillo commented, Mar 1, 2022

Thanks! If HIP supports memory fences, maybe it is better to redefine it as such …, it will keep the semantics, since the warp advances in lock-step.
