Inconsistent Model deploy & run performance
Description
Hi! I’m deploying a YOLOv5 object detector model in Java via DJL. My code for deploying is very similar to the code provided by this guide: https://docs.djl.ai/jupyter/load_pytorch_model.html.
I’m deploying the same model via the same code on two different machines, specced as follows:
- MacBook Pro 15" 2019: i9-9980HK, 32 GB RAM, AMD 560 graphics card, plus integrated Intel graphics.
- Fedora PC: i5-12600K (roughly 30% faster than the i9), 16 GB RAM, no graphics card and no integrated graphics.
Inference takes 25 ms on the MacBook Pro and 800 ms on the i5, roughly a 30x difference.
If I understand correctly, the AMD graphics card doesn’t support CUDA, so DJL doesn’t use it. I also confirmed that GPU usage doesn’t spike when I run my script, so both machines are running on CPU. I don’t understand how the faster i5 processor can be 30x slower. Could the i5 machine’s lack of Intel integrated graphics be the cause?
Thank you!
Expected Behavior
Model run time should be largely consistent
Error Message
n/a
How to Reproduce?
// Preprocessing: resize to the model's input size, then convert to a float tensor
this.pipeline.add(new Resize(OBJECT_DETECTION_FRAME_DIMENSION));
this.pipeline.add(new ToTensor());
// Translator maps raw YOLOv5 output to DetectedObjects using the class labels
this.translator = YoloV5Translator.builder()
        .setPipeline(this.pipeline)
        .optSynset(this.LABELS)
        .build();
this.criteria = Criteria.builder()
        .setTypes(Image.class, DetectedObjects.class)
        .optModelPath(Paths.get(MODEL_DIRECTORY))
        .optProgress(new ProgressBar())
        .optTranslator(this.translator)
        .build();
this.model = criteria.loadModel();
this.predictor = this.model.newPredictor();
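As an aside on measurement: single-shot timings of a predictor are easy to skew, because the first few calls pay JIT-compilation and native-engine initialization costs. A minimal, self-contained sketch of a warm-up-then-measure loop (the `LatencyProbe` class and the dummy workload in `main` are my own illustration; in real code the workload would wrap `predictor.predict(image)`):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LatencyProbe {
    /**
     * Runs the workload a few times to warm up (JIT, engine init),
     * then returns the median latency in milliseconds over `runs` samples.
     */
    public static double medianMillis(Runnable workload, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            workload.run(); // discard: first calls include one-time setup costs
        }
        List<Long> samples = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            workload.run();
            samples.add(System.nanoTime() - start);
        }
        Collections.sort(samples);
        return samples.get(samples.size() / 2) / 1_000_000.0; // median, ns -> ms
    }

    public static void main(String[] args) {
        // Placeholder workload; substitute predictor.predict(image) in real code
        double ms = medianMillis(() -> {
            double acc = 0;
            for (int i = 0; i < 100_000; i++) {
                acc += Math.sqrt(i);
            }
        }, 5, 20);
        System.out.println("median latency: " + ms + " ms");
    }
}
```

Using the median rather than the mean keeps a single slow outlier (GC pause, page fault) from distorting the comparison between the two machines.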
Steps to reproduce
n/a
What have you tried to solve it?
n/a
Environment Info
Issue Analytics
- Created 2 years ago
- Comments: 8 (4 by maintainers)
@davpapp For the image-processing side, you can use the DJL OpenCV extension; JPEG decoding with OpenCV is much faster than with Java ImageIO. See: https://github.com/deepjavalibrary/djl/tree/master/extensions/opencv
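For reference, wiring in that extension is a one-dependency change in Maven (coordinates as given in the linked extension README; the version element below is left as a placeholder, so check the latest DJL release). Per the README, once the artifact is on the classpath DJL picks up the OpenCV-backed image implementation automatically:

```xml
<!-- DJL OpenCV extension: faster image decoding than Java ImageIO -->
<dependency>
    <groupId>ai.djl.opencv</groupId>
    <artifactId>opencv</artifactId>
    <version><!-- use the latest DJL release --></version>
</dependency>
```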
You might want to run a few more iterations:
But it looks like your model’s latency is around 600 ms on CPU. PyTorch doesn’t use MKL by default, which might be why it’s slow on Linux; PyTorch on CUDA should be a lot faster. If you get this number with djl-bench, it’s most likely what you can get with the PyTorch engine. You can try enabling mkldnn, but I’m not sure whether it works for your model:
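If it helps, my understanding from the DJL PyTorch engine documentation is that mkldnn is toggled with the `ai.djl.pytorch.use_mkldnn` system property, which must be set before the engine loads its native library. A sketch (the `MkldnnToggle` class is scaffolding of my own, not part of the original thread):

```java
public class MkldnnToggle {
    public static void main(String[] args) {
        // Must be set before the PyTorch engine initializes its native code,
        // i.e. before the first Criteria.loadModel() / Engine lookup.
        System.setProperty("ai.djl.pytorch.use_mkldnn", "true");
        System.out.println("use_mkldnn = "
                + System.getProperty("ai.djl.pytorch.use_mkldnn"));
    }
}
```

The same property can be passed on the command line instead, e.g. `java -Dai.djl.pytorch.use_mkldnn=true ...`, which avoids any ordering concerns in application code.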