MonteCarloDynamic kernel failing on the Xilinx FPGA

Describe the bug

I noticed a problem with the Montecarlo kernel in the dynamic package, for all input sizes, when executing on the Xilinx KCU1500 FPGA. Compilation reports no errors, but the kernel never finishes and it causes DMA failures at the driver level. The driver log looks like this:

[  815.440478] xocl:engine_status_dump: SG engine 0-H2C1-MM status: 0x00000000:
[  815.440480] xocl:engine_status_dump: SG engine 0-H2C0-MM status: 0x00000001: BUSY
[  815.440483] xocl:transfer_abort: abort transfer 0x000000009584ae00, desc 11, engine desc queued 0.
[  815.440487] xocl:transfer_abort: abort transfer 0x00000000d2360335, desc 1, engine desc queued 0.
[  815.440505] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: DMA failed, Dumping SG Page Table
[  815.440508] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: DMA failed, Dumping SG Page Table
[  815.440516] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 0, 0xf3ce7c000
[  815.440521] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 1, 0xf3d800000
[  815.440526] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 2, 0xf3d400000
[  815.440531] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 3, 0xf3f000000
[  815.440536] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 4, 0xf7d000000
[  815.440540] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 5, 0xf4f800000
[  815.440545] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 6, 0xf54800000
[  815.440550] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 7, 0xf60400000
[  815.440554] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 8, 0xf61c00000
[  815.440559] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 9, 0xf3b800000
[  815.440568] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 0, 0xf3821f000

This problem occurs only on the Xilinx KCU1500 FPGA. The Intel Nallatech Arria 10 FPGA works in emulation mode as well as in the other two modes (Full JIT and AOT).

To narrow this down, I compared the current kernel with the previous one that was working (about 2 months old). I took the body of the old kernel and applied two changes that we introduced in the latest version: a) changed the frame number from 6 to 0; b) removed the private region parameter.

The modified kernel seems to be working. So, the main remaining difference between the two kernels is shown in the figure (the left kernel is the old one that works; the right kernel is the new one that causes the problem): [figure: montecarlo_kernels diff]

How To Reproduce

tornado -Ds0.t0.device=0:1 -Xmx20g -Xms20g --printKernel --debug uk.ac.manchester.tornado.examples.dynamic.MontecarloDynamic 65536 default 1

Note that device 0:1 is the xilinx_kcu1500_dynamic_5_0 CL_DEVICE_TYPE_ACCELERATOR

Computing system setup (please complete the following information):

  • OS: Ubuntu 18.04.02 LTS
  • OpenCL Version: 1.0
  • TornadoVM commit id: ed243aa

Any ideas? I am not familiar with the change regarding fma.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

jjfumero commented, Jul 1, 2020

Thanks, Thanos. You can report this issue to the Xilinx OpenCL runtime.

jjfumero commented, Jul 1, 2020

Thanks, @stratika. Do you think the issue is the FMA instruction? It has been supported since OpenCL 1.0:

https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/fma.html

Can you replace the fma call with separate multiply and add instructions? Just to double-check that this is the problem.
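For readers unsure what "separate instructions" means here, the following is a minimal Java sketch of the two forms, using `Math.fma` (Java 9+) as a stand-in for the OpenCL `fma` built-in. The class `FmaSketch` and its method names are hypothetical and are not part of TornadoVM or the MontecarloDynamic example. The sketch also assumes the `fma` call shows up in the generated OpenCL kernel, so the real substitution would be made there (or in the code generator), not in code like this.

```java
// Illustrative only: FmaSketch and its methods are hypothetical names,
// not part of TornadoVM or the MontecarloDynamic example.
final class FmaSketch {

    // Fused multiply-add: a * b + c is computed with a single rounding step.
    static float fused(float a, float b, float c) {
        return Math.fma(a, b, c);
    }

    // Separate instructions: the product is rounded to float first,
    // then the addition is rounded again.
    static float separate(float a, float b, float c) {
        float product = a * b;
        return product + c;
    }

    public static void main(String[] args) {
        float a = 1.5f, b = 2.0f, c = 0.25f;
        System.out.println("fused:    " + fused(a, b, c));
        System.out.println("separate: " + separate(a, b, c));
    }
}
```

The two forms can differ in the last bits of the result because the separate form rounds twice, so a kernel that only starts working after the swap may also produce slightly different numbers.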

Apart from that, regarding the other changes: a) OpenCL frame: this should not have any effect. b) Private memory allocation for arrays: this might cause a problem if we run out of resources. But IMO, we should get an error after the kernel launch.
