Missing value support brought perf regression

I’m working on a blog post for HB + TVM, and I found that the PR https://github.com/microsoft/hummingbird/pull/419/ introduced a big perf regression compared to the benchmark results I had two months ago.

For example, here is the GPU result on the year dataset, before and after https://github.com/microsoft/hummingbird/pull/419/. I’m seeing a similar regression on other datasets as well, and the PyTorch results are also worse.

Before

{'conversion_time': '26.918972336000024', 'prediction_time': '0.005081184775504806', 'peak_mem': '1863.08984375', 'is_same_output': True}
{
  "year": {
    "xgb": {
      "hb-tvm": {
        "conversion_time": "26.918972336000024",
        "is_same_output": true,
        "peak_mem": "1863.08984375",
        "prediction_time": "0.005081184775504806"
      },
      "peak_mem": 856.05078125,
      "prediction_time": 0.017587377367330452
    }
  }
}

After

{'conversion_time': '78.68978722800011', 'prediction_time': '0.006477392132670242', 'peak_mem': '1857.5703125', 'is_same_output': True}
{
  "year": {
    "xgb": {
      "hb-tvm": {
        "conversion_time": "78.68978722800011",
        "is_same_output": true,
        "peak_mem": "1857.5703125",
        "prediction_time": "0.006477392132670242"
      },
      "peak_mem": 861.5234375,
      "prediction_time": 0.017615050061188096
    }
  }
}

I don’t know what https://github.com/microsoft/hummingbird/pull/419/ did, but the graph is definitely bigger than before. Conversion time also increased by more than 2x, but that’s a TVM mystery that I’ll set aside for now.

Since the dataset doesn’t have any missing data, https://github.com/microsoft/hummingbird/pull/419 shouldn’t change anything, and this regression is not acceptable. Can we revert https://github.com/microsoft/hummingbird/pull/419 until the regression is fixed? Also, can we set up benchmark infra on CI so that things like this don’t happen in the future?
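
To illustrate the kind of CI check being asked for, here is a minimal sketch that compares two benchmark result dicts and flags any metric that got worse by more than a tolerance. The numbers are the Before/After values quoted above; the function name `find_regressions` and the 10% tolerance are assumptions made for this example, not part of Hummingbird or its benchmark scripts.

```python
# Benchmark results copied from the issue (the original output stores them as strings).
BEFORE = {
    "conversion_time": "26.918972336000024",
    "prediction_time": "0.005081184775504806",
    "peak_mem": "1863.08984375",
}
AFTER = {
    "conversion_time": "78.68978722800011",
    "prediction_time": "0.006477392132670242",
    "peak_mem": "1857.5703125",
}


def find_regressions(before, after, tolerance=0.10):
    """Return {metric: after/before ratio} for metrics that grew by more than `tolerance`.

    Hypothetical helper: lower is better for every metric compared here.
    """
    regressions = {}
    for name in ("conversion_time", "prediction_time", "peak_mem"):
        ratio = float(after[name]) / float(before[name])
        if ratio > 1.0 + tolerance:
            regressions[name] = ratio
    return regressions


if __name__ == "__main__":
    for name, ratio in find_regressions(BEFORE, AFTER).items():
        print(f"{name}: {ratio:.2f}x the baseline")
```

A CI job could fail the build whenever the returned dict is non-empty. On the numbers above, both `conversion_time` and `prediction_time` exceed the tolerance while `peak_mem` does not.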

@interesaaat @scnakandala

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 5

Top GitHub Comments

1 reaction
masahi commented, Feb 3, 2021

@ksaur yes, it fixed it, thanks for the quick action! @interesaaat

1 reaction
masahi commented, Feb 2, 2021

Yes, I bisected it and the Before number above is from https://github.com/microsoft/hummingbird/commit/94b584b87bd4ae4374f1c9828a418df201eca62f

Ideally we want a design where missing value support is factored into a single class, and turning it on or off can be done by swapping in or out that class.
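
A rough sketch of that idea, assuming a plain Python tree walk rather than Hummingbird's actual tensor-based implementation: the branching rule lives in a small strategy class, so missing value handling is enabled or disabled by passing a different object. All names here (`Node`, `NoMissingValues`, `WithMissingValues`, `predict`) are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    default_left: bool = True  # branch taken when the feature value is missing
    value: float = 0.0         # prediction stored at a leaf

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


class NoMissingValues:
    """Plain comparison; assumes the input never contains NaN."""

    def goes_left(self, node, x):
        return x[node.feature] < node.threshold


class WithMissingValues:
    """Routes NaN to the node's default branch, otherwise compares as usual."""

    def goes_left(self, node, x):
        v = x[node.feature]
        if math.isnan(v):
            return node.default_left
        return v < node.threshold


def predict(root, x, strategy):
    """Walk the tree, delegating every branching decision to `strategy`."""
    node = root
    while not node.is_leaf:
        node = node.left if strategy.goes_left(node, x) else node.right
    return node.value
```

With this split, data without missing values can use `NoMissingValues` and skip the extra NaN check entirely, which is the property the regression report asks for.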
