Inconsistent Model deploy & run performance
Description
Hi! I’m deploying a YOLOv5 object detector model in Java via DJL. My code for deploying is very similar to the code provided by this guide: https://docs.djl.ai/jupyter/load_pytorch_model.html.
I’m deploying the same model via the same code on two different machines, specced as follows:
- MacBook Pro 15" 2019: i9-9980HK, 32 GB RAM, AMD 560 graphics card, plus integrated Intel graphics.
- Fedora PC: i5-12600K (roughly 30% faster than the i9), 16 GB RAM, no graphics card and no integrated graphics.
Inference takes 25 ms on the MacBook Pro and 800 ms on the i5, roughly a 30x difference.
If I understand correctly, the AMD graphics card doesn’t support CUDA, so DJL doesn’t use it. I also confirmed that GPU usage doesn’t spike when I run my script, so both machines are running on CPU. I don’t understand how the faster i5 processor can be 30x slower. Could the i5 machine’s lack of Intel integrated graphics be the cause?
Thank you!
Expected Behavior
Model run time should be largely consistent
Error Message
n/a
How to Reproduce?
// Preprocessing: resize to the model's input size, then convert to a float tensor
this.pipeline.add(new Resize(OBJECT_DETECTION_FRAME_DIMENSION));
this.pipeline.add(new ToTensor());
// Translator maps raw YOLOv5 output to DetectedObjects using the class labels
this.translator = YoloV5Translator.builder()
        .setPipeline(this.pipeline)
        .optSynset(this.LABELS)
        .build();
this.criteria = Criteria.builder()
        .setTypes(Image.class, DetectedObjects.class)
        .optModelPath(Paths.get(MODEL_DIRECTORY))
        .optProgress(new ProgressBar())
        .optTranslator(this.translator)
        .build();
this.model = criteria.loadModel();
this.predictor = this.model.newPredictor();
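As an aside on measurement: single-shot timings of a predictor are easy to skew, because the first few calls pay JIT-compilation and native-engine initialization costs. A minimal, self-contained sketch of a warm-up-then-measure loop (the `LatencyProbe` class and the dummy workload in `main` are my own illustration; in real code the workload would wrap `predictor.predict(image)`):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LatencyProbe {
    /**
     * Runs the workload a few times to warm up (JIT, engine init),
     * then returns the median latency in milliseconds over `runs` samples.
     */
    public static double medianMillis(Runnable workload, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            workload.run(); // discard: first calls include one-time setup costs
        }
        List<Long> samples = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            workload.run();
            samples.add(System.nanoTime() - start);
        }
        Collections.sort(samples);
        return samples.get(samples.size() / 2) / 1_000_000.0; // median, ns -> ms
    }

    public static void main(String[] args) {
        // Placeholder workload; substitute predictor.predict(image) in real code
        double ms = medianMillis(() -> {
            double acc = 0;
            for (int i = 0; i < 100_000; i++) {
                acc += Math.sqrt(i);
            }
        }, 5, 20);
        System.out.println("median latency: " + ms + " ms");
    }
}
```

Using the median rather than the mean keeps a single slow outlier (GC pause, page fault) from distorting the comparison between the two machines.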
Steps to reproduce
n/a
What have you tried to solve it?
n/a
Environment Info
Issue Analytics
- Created 2 years ago
- Comments: 8 (4 by maintainers)
@davpapp For the image-processing side, you can use the DJL OpenCV extension; JPEG decoding with OpenCV is much faster than with Java ImageIO. See: https://github.com/deepjavalibrary/djl/tree/master/extensions/opencv
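For reference, wiring in that extension is a one-dependency change in Maven (coordinates as given in the linked extension README; the version element below is left as a placeholder, so check the latest DJL release). Per the README, once the artifact is on the classpath DJL picks up the OpenCV-backed image implementation automatically:

```xml
<!-- DJL OpenCV extension: faster image decoding than Java ImageIO -->
<dependency>
    <groupId>ai.djl.opencv</groupId>
    <artifactId>opencv</artifactId>
    <version><!-- use the latest DJL release --></version>
</dependency>
```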
You might want to run a few more iterations:
But it looks like your model’s latency is around 600 ms on CPU. PyTorch doesn’t use MKL by default, which might be why it’s slow on Linux; PyTorch on CUDA should be a lot faster. If you get this number with djl-bench, it’s most likely what you can get with the PyTorch engine. You can try enabling mkldnn, but I’m not sure whether it works for your model:
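If it helps, my understanding from the DJL PyTorch engine documentation is that mkldnn is toggled with the `ai.djl.pytorch.use_mkldnn` system property, which must be set before the engine loads its native library. A sketch (the `MkldnnToggle` class is scaffolding of my own, not part of the original thread):

```java
public class MkldnnToggle {
    public static void main(String[] args) {
        // Must be set before the PyTorch engine initializes its native code,
        // i.e. before the first Criteria.loadModel() / Engine lookup.
        System.setProperty("ai.djl.pytorch.use_mkldnn", "true");
        System.out.println("use_mkldnn = "
                + System.getProperty("ai.djl.pytorch.use_mkldnn"));
    }
}
```

The same property can be passed on the command line instead, e.g. `java -Dai.djl.pytorch.use_mkldnn=true ...`, which avoids any ordering concerns in application code.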