
Adding asynchronous execution to TaskSchedule


Currently the TaskSchedule API contains only blocking versions of the execute methods:

void execute();
void execute(GridTask gridTask);
void executeWithProfiler(Policy policy);
void executeWithProfilerSequential(Policy policy);
void executeWithProfilerSequentialGlobal(Policy policy);

All these methods block the calling Java thread until all computations are done. However, computations are typically offloaded to the GPU, so the CPU just keeps waiting in the meantime.

I think it should be both beneficial and possible to add asynchronous versions of the same methods with the following signatures:

CompletableFuture executeAsync();
CompletableFuture executeAsync(GridTask gridTask);
CompletableFuture executeWithProfilerAsync(Policy policy);
CompletableFuture executeWithProfilerSequentialAsync(Policy policy); // Not sure about "sequential async"
CompletableFuture executeWithProfilerSequentialGlobalAsync(Policy policy); // Not sure about "sequential async"
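
A minimal sketch of how calling code might look with the proposed methods. AsyncSchedule below is a hypothetical stand-in, not the TornadoVM TaskSchedule class. It only mimics the proposed executeAsync() signature (typed here as CompletableFuture<Void>, which is an assumption, since the proposal uses the raw type), and the device work is simulated on a pool thread rather than driven by a driver callback.

```java
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for a schedule exposing the proposed executeAsync().
// This is NOT the TornadoVM API; the "kernel" is simulated on a pool thread.
class AsyncSchedule {
    private final int[] data;

    AsyncSchedule(int[] data) {
        this.data = data;
    }

    // Returns immediately; the future completes once the simulated kernel finishes.
    CompletableFuture<Void> executeAsync() {
        return CompletableFuture.runAsync(() -> {
            for (int i = 0; i < data.length; i++) {
                data[i] *= 2; // simulated device computation
            }
        });
    }
}

class AsyncScheduleDemo {
    // Launches the schedule, leaves room for host work, then waits for completion.
    static int[] run(int[] input) {
        AsyncSchedule schedule = new AsyncSchedule(input);
        CompletableFuture<Void> done = schedule.executeAsync();
        // The calling thread is free to do unrelated work here.
        done.join(); // block only at the point where the result is needed
        return input;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(new int[] {1, 2, 3})));
    }
}
```

The point of the returned future is that the caller decides when (or whether) to block, and can chain follow-up work with thenRun or thenApply instead of waiting.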

Thoughts about implementation

Per my understanding, TaskSchedule delegates back to TornadoTaskSchedule.

This object waits on a driver-specific Event object. There are specific classes in each driver: CLEvent and PTXEvent.

OpenCL provides clSetEventCallback, CUDA has cudaLaunchHostFunc – so it’s possible to get async notifications from both OpenCL and PTX drivers.

So it should be possible to extend CLEvent and PTXEvent + PTXStream with some form of listeners, where a concrete listener inside TornadoTaskSchedule completes the CompletableFuture returned from the proposed TaskSchedule.executeAsync().

Thoughts?
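
The listener idea could be sketched roughly as follows. DeviceEvent, addListener and EventFutures are hypothetical names invented for illustration; they are not the existing CLEvent or PTXEvent classes. In a real driver, complete() would be invoked from the native completion callback (for example, one registered through clSetEventCallback); here it is simply called by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical event type standing in for a driver-specific event.
class DeviceEvent {
    private final List<Runnable> listeners = new ArrayList<>();
    private boolean completed;

    // Registers a listener; runs it right away if the event already completed.
    void addListener(Runnable listener) {
        synchronized (this) {
            if (!completed) {
                listeners.add(listener);
                return;
            }
        }
        listener.run();
    }

    // Assumed to be called once from the driver's completion callback.
    void complete() {
        List<Runnable> toRun;
        synchronized (this) {
            if (completed) {
                return;
            }
            completed = true;
            toRun = new ArrayList<>(listeners);
            listeners.clear();
        }
        toRun.forEach(Runnable::run); // notify outside the lock
    }
}

class EventFutures {
    // Bridges an event to a future that is completed by the event's listener.
    static CompletableFuture<Void> toFuture(DeviceEvent event) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        event.addListener(() -> future.complete(null));
        return future;
    }
}
```

With a bridge like this, no Java thread has to sit in a blocking wait: the future stays pending until the driver reports completion.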

Issue Analytics

State:
Created 3 years ago
Comments: 24 (24 by maintainers)

Top GitHub Comments

1 reaction

vsilaev commented, Jan 26, 2021

There are two main points: At the task-schedule level and the TornadoVM level

Yep, found it. There were many other issues that I had to fix when trying to add true out-of-order execution (even inside JNI code: blocking waits for read/write). Attached is a patch with the fixed runtime + OpenCL, and an API for executeAsync(...).

Currently with OpenCL all tests run ok with both -Dtornado.ooo-execution.enable=true (ooo) and -Dtornado.ooo-execution.enable=false (partly blocking) on NVIDIA OpenCL.

On Intel OpenCL everything is ok for -Dtornado.ooo-execution.enable=false, but with -Dtornado.ooo-execution.enable=true I get a segmentation fault for just about everything. I need someone who can debug the code and find the reason.

It would be great if someone could apply this patch to the current develop branch and run the tests (for both settings of tornado.ooo-execution.enable) on AMD or another device.

0001-Enable-full-Out-of-order-execution-Adding-executeAsy.zip

0 reactions

vsilaev commented, Feb 2, 2021

Status update.

Obviously, just keeping a reference to the buffer array so that it is not garbage-collected doesn’t help: its location in memory may still change during GC.
I rewrote OCLCommandQueue to use DirectByteBuffer for non-blocking calls and left the existing “array-copying” code only for blocking reads/writes.
Fixed issues with event processing in TornadoVM and OCLTornadoDevice
Fixed issues in OCLEvent with static buffer usage
Added OpenCL 1.1 compatible version of enqueue barrier / marker that works exactly the same as for OpenCL 1.2 (OCLCommandQueue and OCLDeviceContext).
Finally, added code for CompletableFuture TaskSchedule.executeAsync(...)
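
To illustrate the first two items: a heap array can be relocated by the garbage collector while a non-blocking native transfer is still in flight, whereas a direct buffer lives outside the Java heap and keeps a stable native address. The sketch below uses only java.nio; DirectStaging and stage are hypothetical names, and this is not the code from the patch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: stages a heap array in off-heap memory before an
// asynchronous native transfer would be issued.
class DirectStaging {
    // Copies the heap array into a direct buffer in the platform's byte order.
    static ByteBuffer stage(int[] hostArray) {
        ByteBuffer direct = ByteBuffer
                .allocateDirect(hostArray.length * Integer.BYTES)
                .order(ByteOrder.nativeOrder());
        direct.asIntBuffer().put(hostArray); // writes through an int view
        return direct;
    }

    public static void main(String[] args) {
        ByteBuffer staged = stage(new int[] {10, 20, 30});
        System.out.println("direct: " + staged.isDirect());
        System.out.println("element 1: " + staged.asIntBuffer().get(1));
    }
}
```

The cost is an extra copy into the staging buffer, which is presumably why the array-based path was kept for blocking reads and writes.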

Tested on both NVIDIA and AMD. Almost all of the tests – ~325 out of 331 – run ok in blocking, default (mixed) and non-blocking modes. The only exceptions are:

        Running test: testVectorChars            ................  [FAILED]
                \_[REASON] expected:<102> but was:<33126>

Sporadically happens on both AMD and NVIDIA in any mode (default, blocking, non-blocking).

  102 = 0x0066
33126 = 0x8166

I guess this is something related to handling 2-byte types. From the tests I saw that you implemented special handling for the single-byte type. Probably two-byte types should be addressed as well, because there is always garbage in one byte while the other byte is set correctly.
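
The byte-level comparison of the two values from the failed test can be checked with a few lines of Java. ByteDiff is a hypothetical helper written for this illustration only; it splits a 16-bit value into its high and low bytes.

```java
// Hypothetical helper for comparing the bytes of the expected and actual values.
class ByteDiff {
    // Upper 8 bits of a 16-bit value.
    static int highByte(int value) {
        return (value >>> 8) & 0xFF;
    }

    // Lower 8 bits of a 16-bit value.
    static int lowByte(int value) {
        return value & 0xFF;
    }

    public static void main(String[] args) {
        int expected = 102;   // 0x0066
        int actual = 33126;   // 0x8166
        System.out.printf("low bytes:  0x%02X vs 0x%02X%n",
                lowByte(expected), lowByte(actual));
        System.out.printf("high bytes: 0x%02X vs 0x%02X%n",
                highByte(expected), highByte(actual));
        System.out.printf("xor:        0x%04X%n", expected ^ actual);
    }
}
```

The low bytes match (0x66 in both) and only the high byte differs (0x00 expected, 0x81 observed), which is consistent with one of the two bytes being left uninitialized or overwritten.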

        Running test: testProfilerEnabled        ................  [FAILED]
                \_[REASON] null

Always happens on both AMD and NVIDIA in non-blocking mode. Blocking and default modes are ok.

        Running test: testComputePi              ................  [FAILED]
                \_[REASON] expected:<3.14> but was:<5.1518049240112305>

Always happens on AMD (any mode). Need to check on Tornado 0.8 – probably this is not my regression at all.

And the asynchronous invocation itself (i.e. Tornado.executeAsync(...)) works as expected.

Waiting for your approval of my previous PR, so I’ll share these results.
