Thread scalability is suboptimal
As reported in https://github.com/ogrisel/pygbm/issues/30#issuecomment-436184680, the scalability of pygbm is not as good as that of LightGBM.
Here are some results on a machine with the following CPUs:
Intel® Xeon® CPU E5-2650 v4 @ 2.20GHz: 2 sockets with 12 cores each, i.e. 48 hyperthreads in total.
1 thread (sequential)
NUMBA_NUM_THREADS=1 OMP_NUM_THREADS=1 python benchmarks/bench_higgs_boson.py --n-trees 100 --learning-rate 0.1 --n-leaf-nodes 255
| Model | Time | AUC | Speed up |
|---|---|---|---|
| LightGBM | 1045s | 0.8282 | 1x |
| pygbm | 1129s | 0.8192 | 1x |
8 threads
NUMBA_NUM_THREADS=8 OMP_NUM_THREADS=8 python benchmarks/bench_higgs_boson.py --n-trees 100 --learning-rate 0.1 --n-leaf-nodes 255
| Model | Time | AUC | Speed up |
|---|---|---|---|
| LightGBM | 160s | 0.8282 | 6.53x |
| pygbm | 356s | 0.8193 | 3.2x |
48 (hyper)threads
python benchmarks/bench_higgs_boson.py --n-trees 100 --learning-rate 0.1 --n-leaf-nodes 255
| Model | Time | AUC | Speed up |
|---|---|---|---|
| LightGBM | 91s | 0.8282 | 11.5x |
| pygbm | 130s | 0.8193 | 8.7x |
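The Speed up column is each library's single-thread time divided by its multi-thread time. The sketch below recomputes it from the reported wall-clock times and adds parallel efficiency (speedup divided by thread count); `speedup`, `efficiency` and `TIMINGS` are illustrative names, not part of the benchmark script.

```python
from typing import Dict

# Wall-clock times in seconds, copied from the three tables above.
TIMINGS: Dict[str, Dict[int, float]] = {
    "LightGBM": {1: 1045, 8: 160, 48: 91},
    "pygbm": {1: 1129, 8: 356, 48: 130},
}


def speedup(times: Dict[int, float], n_threads: int) -> float:
    """Speedup relative to the same library's 1-thread run: T(1) / T(p)."""
    return times[1] / times[n_threads]


def efficiency(times: Dict[int, float], n_threads: int) -> float:
    """Parallel efficiency: speedup divided by the number of threads."""
    return speedup(times, n_threads) / n_threads


if __name__ == "__main__":
    for model, times in TIMINGS.items():
        for p in sorted(times):
            print(
                f"{model:8s} {p:2d} threads: "
                f"speedup {speedup(times, p):5.2f}x, "
                f"efficiency {efficiency(times, p):6.1%}"
            )
```

Because the 48 threads are hyperthreads on 24 physical cores, dividing the 48-thread speedup by 24 instead of 48 gives a per-core efficiency of roughly 48% for LightGBM and 36% for pygbm.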
All of those pygbm runs used Numba 0.40 from Anaconda with the TBB threading backend (the fastest backend for pygbm).
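The three runs differ only in their environment variables. A minimal sweep sketch, assuming it is launched from the root of a pygbm checkout; `bench_env` and `run_sweep` are hypothetical helpers, and setting `NUMBA_THREADING_LAYER=tbb` explicitly is an assumption (the issue only says the TBB backend was used).

```python
import os
import subprocess
import sys
from typing import Dict, List, Optional

# Benchmark invocation from the issue; paths are relative to a pygbm checkout.
BENCH_ARGS = [
    "benchmarks/bench_higgs_boson.py",
    "--n-trees", "100",
    "--learning-rate", "0.1",
    "--n-leaf-nodes", "255",
]
THREAD_VARS = ("NUMBA_NUM_THREADS", "OMP_NUM_THREADS")


def bench_env(n_threads: Optional[int], layer: str = "tbb") -> Dict[str, str]:
    """Copy of the current environment configured for one benchmark run.

    n_threads=None removes the thread-count variables so Numba and OpenMP
    use their defaults (all hardware threads), like the 48-thread run.
    """
    env = dict(os.environ)
    for var in THREAD_VARS:
        if n_threads is None:
            env.pop(var, None)
        else:
            env[var] = str(n_threads)
    # Assumption: request the threading layer explicitly rather than
    # relying on whichever layer Numba selects by default.
    env["NUMBA_THREADING_LAYER"] = layer
    return env


def run_sweep(thread_counts: List[Optional[int]]) -> None:
    """Run the benchmark once per thread count, stopping on the first failure."""
    for n in thread_counts:
        label = "default" if n is None else str(n)
        print(f"--- threads: {label} ---", flush=True)
        subprocess.run([sys.executable, *BENCH_ARGS], env=bench_env(n), check=True)
```

Calling `run_sweep([1, 8, None])` would reproduce the 1-thread, 8-thread and all-threads runs in sequence. Building a fresh environment per run, rather than mutating `os.environ`, keeps one run's settings from leaking into the next.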
Issue created 5 years ago; 13 comments (8 by maintainers).

@Laurae2 I’m not sure what the “numba profiler” is; could you please clarify? Numba has a built-in parallel diagnostics tool that tracks the transforms made to its own IR of the Python source as it converts serial code to parallel code, but that is a compile-time diagnostic tool, not a performance profiler.
Further, Numba 0.41.0 JIT profiling works with Intel VTune: set the `NUMBA_ENABLE_PROFILING` environment variable to a non-zero value, and Numba will register the LLVM JIT event listener for Intel VTune.

@ogrisel Note that the number of threads LightGBM can use scales with the number of columns. The Higgs dataset has only 28 columns, not enough for 48 threads, so this benchmark underestimates LightGBM's scalability, which gives you a lower scaling target.
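The column argument can be illustrated with a toy model that assumes parallel work is split only across features (columns) and that every column costs the same time. This models the comment's claim; it is not LightGBM's actual scheduler, and `partition_features`, `busy_threads` and `ideal_speedup_bound` are hypothetical names.

```python
from typing import List

HIGGS_N_FEATURES = 28  # the HIGGS dataset has 28 input columns


def partition_features(n_features: int, n_threads: int) -> List[List[int]]:
    """Assign column indices to threads round-robin (toy model)."""
    buckets: List[List[int]] = [[] for _ in range(n_threads)]
    for feature in range(n_features):
        buckets[feature % n_threads].append(feature)
    return buckets


def busy_threads(n_features: int, n_threads: int) -> int:
    """Number of threads that receive at least one column."""
    return sum(1 for bucket in partition_features(n_features, n_threads) if bucket)


def ideal_speedup_bound(n_features: int, n_threads: int) -> float:
    """Best-case speedup if every column costs the same time.

    The slowest thread processes the largest bucket, so the run takes
    max(len(bucket)) column-units instead of n_features.
    """
    largest = max(len(b) for b in partition_features(n_features, n_threads))
    return n_features / largest


if __name__ == "__main__":
    for p in (1, 8, 24, 48):
        print(
            f"{p:2d} threads: {busy_threads(HIGGS_N_FEATURES, p):2d} busy, "
            f"speedup <= {ideal_speedup_bound(HIGGS_N_FEATURES, p):.1f}x"
        )
```

Under this model, 48 threads on the 28-column Higgs data leave 20 threads idle, and the speedup can never exceed 28x however many threads are added.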