
GPU actors slower than Serial execution

See original GitHub issue

Hi, I want to use Ray to parallelize a GPU job across 3 GPU actors, as in the script below.

However, the result does not look good: it is actually much slower than a serial, single-GPU execution.

import time

import numpy
import psutil
import ray

import face_alignment
from skimage import io


num_cpus = psutil.cpu_count(logical=False)
ray.init(num_cpus=num_cpus, num_gpus=3)

@ray.remote(num_gpus=1)
class GPUActor(object):
    def __init__(self):
        # Each actor loads its own face-alignment model on the GPU Ray assigns it.
        self.fa = face_alignment.FaceAlignment(
                face_alignment.LandmarksType._2D,
                flip_input=False, device='cuda')
        print("This actor is allowed to use GPUs {}.".format(ray.get_gpu_ids()))
        self.preds = 0

    def proc(self, img):
        # Accumulate the mean landmark coordinate of each processed image.
        self.preds += self.fa.get_landmarks(img)[0].mean()

    def get_preds(self):
        return self.preds


gpuactors = [GPUActor.remote() for _ in range(3)]

begin = time.time()
for i in range(20 * 3):
    image = io.imread('aflw-test.jpg')
    img_id = ray.put(numpy.array(image))
    # Round-robin the images across the three actors.
    gpuactors[i % 3].proc.remote(img_id)

# Actor tasks run in submission order, so get_preds() only returns after
# all proc() calls queued on that actor have finished.
res = ray.get([actor.get_preds.remote() for actor in gpuactors])
print(res)
end = time.time()
print('time', end - begin)

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7

Top GitHub Comments

1 reaction
Arjunsankarlal commented, Sep 30, 2020

I actually found the answer in my case, and it was a pretty simple fix! 😅 After I set torch.set_num_threads(num_cpus) in the init function, I was able to get the speed boost I was expecting.

Thanks for the wonderful library!
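For reference, here is a minimal sketch of how that fix could look when applied to the GPUActor from the script above; the torch import and the placement of the call before the model is constructed are assumptions, since the comment only states that torch.set_num_threads(num_cpus) was set in the init function.

import psutil
import ray
import torch

import face_alignment

num_cpus = psutil.cpu_count(logical=False)

@ray.remote(num_gpus=1)
class GPUActor(object):
    def __init__(self):
        # Limit the CPU threads PyTorch uses inside this actor
        # (the fix reported in the comment above).
        torch.set_num_threads(num_cpus)
        self.fa = face_alignment.FaceAlignment(
                face_alignment.LandmarksType._2D,
                flip_input=False, device='cuda')
        self.preds = 0

Whether each actor should get all physical cores or only a share of them (e.g. num_cpus divided by the number of actors) is a tuning choice the comment does not address.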

0 reactions
stale[bot] commented, Feb 23, 2021

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you’d still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray’s public Slack channel.

Thanks again for opening the issue!

Read more comments on GitHub >

Top Results From Across the Web

Why is this code ten times slower on the GPU than CPU?
The GPU code is ten times slower than the CPU equivalent because the GPU code exhibits a perfect storm of performance-wrecking characteristics.

Tips for first-time users — Ray 2.2.0
Surprisingly, not only did Ray not improve the execution time, but the Ray program is actually slower than the sequential program! What's going on?...

10x Faster Parallel Python Without Python Multiprocessing
In these benchmarks, Ray is 10–30x faster than serial Python, 5–25x faster than multiprocessing, and 5–15x faster than the faster of these ...

Ray Tutorial | A Quest After Perspectives
This script is too slow, and the computation is embarrassingly parallel. In this exercise, you will use Ray to execute the functions in ...

CUDA slower than serial implementation fill Operation on ...
And like others mentioned, use a larger image to highlight the difference between CPU and GPU. In theory, your code should be almost as...
