MonteCarloDynamic kernel failing on the Xilinx FPGA
Describe the bug
I noticed a problem with the Montecarlo kernel in the dynamic package, for all input sizes, when executing on the Xilinx KCU1500 FPGA. Compilation reports no errors, but the kernel never finishes and causes DMA failures at the driver level. The driver log looks like this:
[ 815.440478] xocl:engine_status_dump: SG engine 0-H2C1-MM status: 0x00000000:
[ 815.440480] xocl:engine_status_dump: SG engine 0-H2C0-MM status: 0x00000001: BUSY
[ 815.440483] xocl:transfer_abort: abort transfer 0x000000009584ae00, desc 11, engine desc queued 0.
[ 815.440487] xocl:transfer_abort: abort transfer 0x00000000d2360335, desc 1, engine desc queued 0.
[ 815.440505] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: DMA failed, Dumping SG Page Table
[ 815.440508] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: DMA failed, Dumping SG Page Table
[ 815.440516] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 0, 0xf3ce7c000
[ 815.440521] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 1, 0xf3d800000
[ 815.440526] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 2, 0xf3d400000
[ 815.440531] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 3, 0xf3f000000
[ 815.440536] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 4, 0xf7d000000
[ 815.440540] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 5, 0xf4f800000
[ 815.440545] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 6, 0xf54800000
[ 815.440550] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 7, 0xf60400000
[ 815.440554] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 8, 0xf61c00000
[ 815.440559] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 9, 0xf3b800000
[ 815.440568] xocl_mm_xdma mm_dma.v5.u.256: xdma_migrate_bo: 0, 0xf3821f000
This problem occurs only on the Xilinx KCU1500 FPGA. The Intel Nallatech Arria 10 FPGA is working both in emulation mode and the other two modes (Full Jit and AoT).
To narrow it down, I compared the current kernel with an earlier version (about two months old) that worked. I took the body of the old kernel and applied two changes introduced in the latest version: a) changed the frame number from 6 to 0; b) removed the private region parameter.
The modified kernel seems to be working. So, the main difference between the two kernels is shown in the figure (Left kernel is the old one that is working, Right kernel is the new one that causes the problem):
How To Reproduce
tornado -Ds0.t0.device=0:1 -Xmx20g -Xms20g --printKernel --debug uk.ac.manchester.tornado.examples.dynamic.MontecarloDynamic 65536 default 1
Note that device 0:1 is the xilinx_kcu1500_dynamic_5_0 CL_DEVICE_TYPE_ACCELERATOR
Computing system setup (please complete the following information):
- OS: Ubuntu 18.04.02 LTS
- OpenCL Version: 1.0
- TornadoVM commit id: ed243aa
Any ideas? I am not familiar with this change about the fma.
Issue Analytics
- State:
- Created 3 years ago
- Comments:6 (3 by maintainers)
Top GitHub Comments
Thanks, Thanos. You can report this issue to the Xilinx OpenCL runtime.
Thanks, @stratika. Do you think the issue is the FMA instruction? It has been supported since OpenCL 1.0: https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/fma.html
Can you replace the fma with separate instructions instead, just to double-check that it is the problem?
Apart from that, regarding the other changes: a) OpenCL frame: this should not have any effect. b) Private memory allocation for arrays: this might cause a problem if we run out of resources, but IMO we should then get an error after the kernel launch.