Inconsistent Model deploy & run performance

Description

Hi! I’m deploying a YOLOv5 object detector model in Java via DJL. My code for deploying is very similar to the code provided by this guide: https://docs.djl.ai/jupyter/load_pytorch_model.html.

I’m deploying the same model via the same code across two different machines. The machines are spec-ed as follows:

  1. Macbook Pro 15" 2019: i9 9980HK, 32GB RAM, AMD 560 Graphics Card. Has integrated Intel graphics.
  2. Fedora PC: i5 12600K (30% faster than the i9), 16GB RAM, no graphics card and no integrated graphics.

Running the model takes 25ms on the Macbook Pro and 800ms on the i5, roughly a 30x difference.

If I understand correctly, the AMD graphics card does not support CUDA, so it's not used by DJL. I also confirmed that GPU usage doesn't spike when I run my script, so both machines are running the model on the CPU. I don't understand how the faster i5 processor can be 30x slower. Is it possible that the lack of Intel integrated graphics is slowing down the i5?

Thank you!

Expected Behavior

Model run time should be largely consistent

Error Message

n/a

How to Reproduce?

this.pipeline.add(new Resize(OBJECT_DETECTION_FRAME_DIMENSION));
this.pipeline.add(new ToTensor());

this.translator = YoloV5Translator.builder()
        .setPipeline(this.pipeline)
        .optSynset(this.LABELS)
        .build();

this.criteria = Criteria.builder()
        .setTypes(Image.class, DetectedObjects.class)
        .optModelPath(Paths.get(MODEL_DIRECTORY))
        .optProgress(new ProgressBar())
        .optTranslator(this.translator)
        .build();

this.model = criteria.loadModel();
this.predictor = this.model.newPredictor();
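
A single timed call can be misleading on the JVM, because the first few invocations include JIT compilation and lazy native initialization. The sketch below is one way to collect warmed-up latency samples and report percentiles. It is not part of the original report: the class name `LatencyBench`, its helper methods, and the placeholder workload are hypothetical, and the real task would wrap `predictor.predict(image)` (which throws a checked exception, so it would need a try/catch inside the lambda).

```java
import java.util.Arrays;

public class LatencyBench {

    /** Runs the task `warmup` times untimed, then returns `runs` timed samples in nanoseconds. */
    static long[] measure(Runnable task, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            task.run(); // untimed: lets the JIT and lazy initialization settle
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    /** Nearest-rank percentile of the samples, for p in (0, 100]. */
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption); replace with a lambda that
        // calls the predictor on a preloaded image.
        Runnable task = () -> Math.sqrt(System.nanoTime());
        long[] samples = measure(task, 50, 500);
        System.out.printf("p50=%d ns, p99=%d ns%n",
                percentile(samples, 50), percentile(samples, 99));
    }
}
```

Running the same harness on both machines, with the image loaded once outside the timed loop, would show whether the gap comes from inference itself or from surrounding work such as image decoding.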

Steps to reproduce

n/a

What have you tried to solve it?

n/a

Environment Info

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
frankfliu commented, Jan 13, 2022

@davpapp For image processing performance, you can use the DJL OpenCV extension. JPEG decoding with OpenCV is much faster than with Java ImageIO. See: https://github.com/deepjavalibrary/djl/tree/master/extensions/opencv
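
To check whether decoding is a meaningful share of the total time before switching libraries, the ImageIO decode step can be timed on its own. The following is a rough sketch using only the JDK; the class `DecodeTiming`, its method names, and the synthetic test image are all assumptions made for illustration, and a real measurement should use the actual camera frames or files.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

public class DecodeTiming {

    /** Builds an in-memory JPEG of the given size (stand-in for a real frame). */
    static byte[] syntheticJpeg(int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                img.setRGB(x, y, (x * 31 + y * 17) & 0xFFFFFF);
            }
        }
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            if (!ImageIO.write(img, "jpg", out)) {
                throw new IOException("no JPEG writer available");
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decodes JPEG bytes with Java ImageIO. */
    static BufferedImage decode(byte[] jpeg) {
        try {
            return ImageIO.read(new ByteArrayInputStream(jpeg));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Average ImageIO decode time in milliseconds over `runs` decodes. */
    static double averageDecodeMillis(byte[] jpeg, int runs) {
        long total = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            decode(jpeg);
            total += System.nanoTime() - start;
        }
        return total / 1e6 / runs;
    }

    public static void main(String[] args) {
        byte[] jpeg = syntheticJpeg(1280, 720);
        averageDecodeMillis(jpeg, 20); // warm-up, result discarded
        System.out.printf("ImageIO decode: %.2f ms per frame%n", averageDecodeMillis(jpeg, 100));
    }
}
```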

1 reaction
frankfliu commented, Jan 12, 2022

You might want to run a few more iterations:

djl-bench -e PyTorch -p /home/ubuntu/models/pytorch/yolo5/yolo5.pt -s 1,3,224,224 -c 500

But it looks like your model's latency is around 600 ms on CPU. PyTorch doesn't use MKL by default, which might be one reason it's slow on Linux. PyTorch on CUDA should be a lot faster. If you get this number with djl-bench, it is most likely the best you can get with the PyTorch engine. You can try enabling mkldnn, but I'm not sure whether that works for your model:

        System.setProperty("ai.djl.pytorch.use_mkldnn", "true");
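
Where the property is set probably matters. The sketch below assumes the engine reads `ai.djl.pytorch.use_mkldnn` during its own initialization, which the thread does not confirm, so it sets the flag at the very start of `main`, before any DJL class is used. The class `MkldnnFlag` and the method `enableMkldnn` are hypothetical names for illustration.

```java
public class MkldnnFlag {

    // Property name taken from the comment above.
    static final String USE_MKLDNN = "ai.djl.pytorch.use_mkldnn";

    /** Sets the flag; intended to run before any DJL class is loaded. */
    static void enableMkldnn() {
        System.setProperty(USE_MKLDNN, "true");
    }

    public static void main(String[] args) {
        enableMkldnn();
        // ... build the Criteria, load the model and create the predictor here ...
        System.out.println(USE_MKLDNN + "=" + System.getProperty(USE_MKLDNN));
    }
}
```

The same property can also be passed on the command line with the standard JVM syntax, `-Dai.djl.pytorch.use_mkldnn=true`, which avoids any ordering concerns inside the application.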