CUDA/FPGA compilation errors

I’m trying to apply some transformations for GPU/FPGA devices, but I’m getting some errors:

Any ideas how I can fix it? I installed the CUDA toolkit via sudo apt install, and gcc --version reports 9, but I don’t know where I should point DaCe to it (I can’t find gcc in ~/.dace.conf). Attachment: dace.txt

Thanks for help.

Edit: After changing

for index in dace.map[0:size]:

to

@dace.map(_[0:size])
def fun(index):

FPGA is working normally. CUDA still fails.

Top GitHub Comments

definelicht commented, Dec 30, 2021

> Yes, I have noticed that Xilinx creates extra buffers when using bursts. But without minimal local arrays there will be no burst. Maybe add some micro-buffering between memlets, as with the streaming transformations you mentioned?

Xilinx detects accesses to adjacent indices in consecutive loop iterations and infers burst accesses. Local buffers are not required. For example:

void Foo(int const *from_dram, hlslib::Stream<int> &s, int n) {
  for (int i = 0; i < n; ++i) {
    #pragma HLS PIPELINE II=1
    s.Push(from_dram[i]);
  }
}

This will infer bursts of size n, even though it’s just being written to a stream.

> I would like to add some specific pragmas in loops, maybe some extra commands that would be inserted into a @dace.program, like:
>
>     @dace.program
>     def fun():
>         # some code
>         loc_in_pixels = dace.define_local(shape=(burst_size), dtype=dace.uint8, memtype=dace.local)  # This creates a local buffer, not a global one!
>         dace.hint(loc_in_pixels, "#PRAGMA HLS PARTITION VARIABLE=loc_in_pixels COMPLETE")  # Place that text near the loc_in_pixels variable definition.

You can achieve complete partitioning of a variable by setting its storage type to FPGA_Registers (for example, sdfg.arrays["loc_in_pixels"].storage = dace.StorageType.FPGA_Registers).

> for i in range(10):
>     dace.hint("#PRAGMA UNROLL")  # Put hint right here in tasklet. Maybe add something specific like

We support map unrolling by simply setting unroll=True on the map object. This might also be supported in the map exposed in the frontend?

definelicht commented, Dec 28, 2021

> I have noticed that with @dace.map def calc_mask(pix: _[0:size]): the generated loop isn’t unrolled, but why?

I don’t think we currently do any automatic unrolling. You can unroll it manually if you wish! @TizianoDeMatteis @alexnick83 we could think about automatically unrolling loops with constant loop indices that only access local memory.

> loc_in_pixels = dace.define_local(shape=(burst_size), dtype=dace.uint8) is a global array by default (should it be that way?)

Yes, all arrays are global by default, but can easily be changed to be local memories.
