
Ray Serve performance


Search before asking

  • I searched the issues and found no similar issues.

Ray Component

Ray Serve

What happened + What you expected to happen

A Keras/TensorFlow model deployed on Ray Serve seemed to run slower than plain Python inference code.

Versions / Dependencies

  • Python: 3.8
  • Ray: 1.8
  • OS: Linux

Reproduction script

Ray Serve is slower than a basic Python script. I think the issue is with the request type; correct me if I am wrong. I deployed a model on Ray Serve.

I am sending the requests from a normal Python script as below; it takes around 4 minutes to process around 100 images.

import os
import numpy as np
import requests

for file in os.listdir(path):
    # image preprocessing (elided) -> img
    resp = requests.get("http://localhost:9001/predict", json={"array": img.tolist()})
    prediction = np.array(resp.json()["prediction"])

I also tried Ray Core to parallelize the preprocessing and request sending, but this too seemed slow:

import os
import numpy as np
import ray
import requests

@ray.remote
def send_request(filepath):
    # image preprocessing (elided) -> img
    return requests.get("http://localhost:9001/predict", json={"array": img.tolist()})

resp = [send_request.remote(filepath) for filepath in os.listdir(path)]
resp = ray.get(resp)

# check only a single image prediction
resp = resp[0].json()
prediction = np.array(resp["prediction"])

Using Ray, it took around 4 minutes to process 100 images, while the simple Python script below took around 50 seconds.

for file in os.listdir(path):
    # image preprocessing (elided) -> img
    prediction = predict_model.predict(img)

Is this because of the HTTP requests? If so, how can I change it?
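
Not part of the original report, but as a quick way to check whether per-request HTTP round trips are the bottleneck, the same requests could be sent concurrently instead of one at a time. A minimal sketch, assuming the same placeholder endpoint, path, and elided preprocessing/img as in the snippets above:

import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import requests

def predict_one(filepath):
    # image preprocessing (elided) -> img
    resp = requests.get("http://localhost:9001/predict", json={"array": img.tolist()})
    return np.array(resp.json()["prediction"])

# Overlap the HTTP round trips instead of waiting for each response in turn.
with ThreadPoolExecutor(max_workers=8) as pool:
    predictions = list(pool.map(predict_one, os.listdir(path)))

If the total time drops sharply with this change, the slowdown is dominated by request latency and payload serialization rather than by model inference itself.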

Anything else

Is there a possibility of using gRPC requests with Ray Serve?

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
jiaodong commented, Dec 1, 2021

What if I don’t have access to the path to read the image?

That’s where I would take advantage of shared memory in the Ray cluster. You can start off by saving the preprocessed images into the Ray cluster via obj_ref = ray.put(img). Then you can pass its binary representation binary = obj_ref.binary() around and reconstruct it with obj_ref = ray.ObjectRef(binary). Finally, you can fetch this remote object anywhere in the Ray cluster by simply calling img = ray.get(obj_ref) again.

I would recommend checking out the “Object resolution” section in our public whitepaper: https://docs.google.com/document/d/1lAy0Owi-vPz2jEqBSaHNQcy2IBSDEHyXNOQZlGuj93c/edit#heading=h.rtepjltxoeb7
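
A minimal sketch of the object-store round trip described in this comment, assuming a local Ray cluster and a NumPy array standing in for the preprocessed image (variable names are illustrative, not from the issue):

import numpy as np
import ray

ray.init()  # or ray.init(address="auto") to join an existing cluster

img = np.zeros((224, 224, 3), dtype=np.float32)  # stand-in for a preprocessed image

obj_ref = ray.put(img)                # store the image once in shared memory
binary = obj_ref.binary()             # bytes that can be passed around, e.g. in a request payload
restored_ref = ray.ObjectRef(binary)  # reconstruct the reference anywhere in the same cluster
restored = ray.get(restored_ref)      # resolves from the object store without re-uploading the array

assert np.array_equal(img, restored)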

0 reactions
stale[bot] commented, Sep 24, 2022

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you’d still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray’s public slack channel.

Thanks again for opening the issue!


