In Boosting Assembler wrapping each estimator into a subroutine causes a performance degradation

I’ve recalled the real motivation behind not wrapping every individual estimator into its own subroutine: generating many nested function calls leads to a performance degradation in Java. The observed difference reaches 4x for larger models (e.g. XGBoost with 1000 estimators). Here is the basic test I created (sorry about the Scala):

@ import com.github.m2cgen.ModelOld
import com.github.m2cgen.ModelOld

@ import com.github.m2cgen.ModelNew
import com.github.m2cgen.ModelNew

@ import scala.util.Random
import scala.util.Random

@ def nextRandomData(): Array[Double] = (0 until 4).map(_ => Random.nextDouble).toArray
defined function nextRandomData

@ def testScore: Unit = {
    val start = System.currentTimeMillis()
    (0 until 100000).foreach(_ => <ModelNew|ModelOld>.score(nextRandomData))
    println("Runtime: " + (System.currentTimeMillis() - start).toString)
  }

Results for ModelOld:

@ testScore
Runtime: 2973

For ModelNew:

@ testScore
Runtime: 10747
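
To make the two code shapes concrete, here is a minimal, hypothetical sketch in Java. It is not the actual content of ModelOld.java or ModelNew.java: the class name, method names, thresholds, and leaf values are all invented for illustration, and only two toy "estimators" are shown instead of 1000. It only contrasts inlining every estimator into one method body with calling one subroutine per estimator.

```java
public class EstimatorShapes {

    // Hypothetical "flat" shape: every estimator is inlined into a single method body.
    public static double scoreFlat(double[] input) {
        double tree0 = input[2] < 2.45 ? 0.5 : -0.25;    // invented split and leaf values
        double tree1 = input[3] < 1.75 ? 0.125 : -0.375; // invented split and leaf values
        return tree0 + tree1;
    }

    // Hypothetical "wrapped" shape: each estimator lives in its own subroutine,
    // so scoring a sample costs one extra method call per estimator.
    public static double scoreWrapped(double[] input) {
        return estimator0(input) + estimator1(input);
    }

    private static double estimator0(double[] input) {
        return input[2] < 2.45 ? 0.5 : -0.25;
    }

    private static double estimator1(double[] input) {
        return input[3] < 1.75 ? 0.125 : -0.375;
    }

    public static void main(String[] args) {
        double[] sample = {5.1, 3.5, 1.4, 0.2};
        // Both shapes compute the same value; only the call structure differs.
        System.out.println(scoreFlat(sample));
        System.out.println(scoreWrapped(sample));
    }
}
```

With only two estimators the difference is negligible. The timings reported here concern a model with 1000 estimators, where the number of generated subroutines and calls grows accordingly.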

The test model was trained on the sklearn.datasets.load_iris() dataset. The classifier was created as follows:

model = XGBClassifier(n_estimators=1000)

In the attached archive I included the following:

ModelNew.java - Java code generated with the most recent master.
ModelOld.java - Java code generated with release 0.5.0.
Models.jar - the jar containing both compiled sources.
xgboost_model2 - the trained estimator in pickle format.

CC: @StrikerRUS FYI

Issue Analytics

State:
Created 4 years ago
Comments:6 (6 by maintainers)

Top GitHub Comments

StrikerRUS commented, Mar 16, 2020

Can we close this?

izeigerman commented, Mar 16, 2020

Yes, absolutely. Thank you!
