In Boosting Assembler wrapping each estimator into a subroutine causes a performance degradation

I’ve recalled the real motivation for not wrapping every individual estimator in its own subroutine: generating many nested function calls degrades performance in Java. The observed difference reaches 4x for larger models (e.g. XGBoost with 1000 estimators). Here is the basic test I created (sorry about the Scala):
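To make the two code shapes concrete, here is a minimal hand-written sketch in Java. It is not actual m2cgen output: the class name `EnsembleShapes`, the method names, the thresholds, and the leaf values are all made up for illustration, and a real model would have hundreds of estimators with deeper trees rather than two single-split ones.

```java
// Illustrative only: a toy two-estimator ensemble, not code generated by m2cgen.
public class EnsembleShapes {

    // Shape A: every estimator is inlined into one method body.
    static double scoreInlined(double[] input) {
        double sum = 0.0;
        // estimator 0 (hypothetical split and leaf values)
        if (input[2] < 2.45) { sum += 0.5; } else { sum += -0.25; }
        // estimator 1 (hypothetical split and leaf values)
        if (input[3] < 1.75) { sum += 0.125; } else { sum += -0.5; }
        return sum;
    }

    // Shape B: each estimator is wrapped in its own subroutine.
    static double estimator0(double[] input) {
        return input[2] < 2.45 ? 0.5 : -0.25;
    }

    static double estimator1(double[] input) {
        return input[3] < 1.75 ? 0.125 : -0.5;
    }

    static double scoreWrapped(double[] input) {
        // One extra method call per estimator compared with shape A.
        return estimator0(input) + estimator1(input);
    }

    public static void main(String[] args) {
        double[] sample = {5.1, 3.5, 1.4, 0.2};
        // Both shapes compute the same value; only the call structure differs.
        System.out.println(scoreInlined(sample));
        System.out.println(scoreWrapped(sample));
    }
}
```

Both methods return the same score. The only difference is the number of method calls per prediction, which is the cost this issue is about once the ensemble grows to 1000 estimators.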

@ import com.github.m2cgen.ModelOld
import com.github.m2cgen.ModelOld

@ import com.github.m2cgen.ModelNew
import com.github.m2cgen.ModelNew

@ import scala.util.Random
import scala.util.Random

@ def nextRandomData(): Array[Double] = (0 until 4).map(_ => Random.nextDouble).toArray
defined function nextRandomData

@ def testScore: Unit = {
    val start = System.currentTimeMillis()
    (0 until 100000).foreach(_ => <ModelNew|ModelOld>.score(nextRandomData))
    println("Runtime: " + (System.currentTimeMillis() - start).toString)
  }

Results for ModelOld:

@ testScore
Runtime: 2973

For ModelNew:

@ testScore
Runtime: 10747
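The two runs above differ by roughly 3.6x (10747 ms vs. 2973 ms). For anyone who wants to repeat the measurement from plain Java, the timing loop could be sketched as follows. This is an assumption-laden sketch, not the harness used for the numbers above: `ScoreTimer`, `timeMillis`, and the stand-in model are hypothetical names, the model is assumed to expose a `double[] -> double[]` scoring function, and a discarded warm-up pass is added that the original Scala test does not have.

```java
import java.util.Random;
import java.util.function.Function;

public class ScoreTimer {

    // Calls `model` `iterations` times on random 4-feature inputs and
    // returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Function<double[], double[]> model, int iterations, long seed) {
        Random random = new Random(seed);
        double sink = 0.0; // consume the outputs so the calls are not optimized away
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            double[] input = new double[4];
            for (int j = 0; j < input.length; j++) {
                input[j] = random.nextDouble();
            }
            sink += model.apply(input)[0];
        }
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        if (Double.isNaN(sink)) {
            System.out.println("Unexpected NaN in model output");
        }
        return elapsedMillis;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in; replace with e.g. ModelOld::score or ModelNew::score
        // if those classes are on the classpath and have a matching signature.
        Function<double[], double[]> standIn = x -> new double[] {x[0] + x[1], x[2] + x[3]};

        timeMillis(standIn, 100_000, 42L); // warm-up pass, result discarded
        System.out.println("Runtime: " + timeMillis(standIn, 100_000, 42L));
    }
}
```

Using the same seed for both models keeps the inputs identical across runs, so any difference in the reported time comes from the scoring code rather than from the data.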

The test model was trained on the sklearn.datasets.load_iris() dataset. The classifier was created as follows:

model = XGBClassifier(n_estimators=1000)

In the attached archive I included the following:

  1. ModelNew.java - Java code generated from the most recent master.
  2. ModelOld.java - Java code generated with release 0.5.0.
  3. Models.jar - the jar containing both compiled sources.
  4. xgboost_model2 - the trained estimator in Pickle format.

CC: @StrikerRUS FYI

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

StrikerRUS commented, Mar 16, 2020

Can we close this?

izeigerman commented, Mar 16, 2020

Yes, absolutely. Thank you!
