
Meta-issue: accelerate the slowest running tests


When running the current test suite on Azure, pytest reports the following 20 slowest tests:

  • 15.40s call ensemble/tests/test_voting.py::test_gridsearch #21422
  • 11.31s call tests/test_common.py::test_estimators[SequentialFeatureSelector(estimator=LogisticRegression(C=1))-check_estimator_sparse_data] #21515
  • 8.96s call svm/tests/test_svm.py::test_svc_ovr_tie_breaking[NuSVC] #21443
  • 7.40s call utils/tests/test_estimator_checks.py::test_check_estimator_clones #21498
  • 6.49s call ensemble/tests/test_bagging.py::test_classification #21476
  • 5.52s call ensemble/tests/test_common.py::test_ensemble_heterogeneous_estimators_behavior[stacking-classifier] #21562
  • 5.19s call ensemble/tests/test_common.py::test_ensemble_heterogeneous_estimators_behavior[stacking-regressor] #21562
  • 4.41s call linear_model/tests/test_quantile.py::test_asymmetric_error[0.2] #21546
  • 4.12s call ensemble/tests/test_gradient_boosting.py::test_gradient_boosting_early_stopping #21903
  • 4.12s call linear_model/tests/test_quantile.py::test_asymmetric_error[0.8] #21546
  • 3.91s call ensemble/tests/test_bagging.py::test_oob_score_removed_on_warm_start #21892
  • 3.86s call tests/test_common.py::test_estimators[RFECV(estimator=LogisticRegression(C=1))-check_estimator_sparse_data] #21515
  • 3.80s call linear_model/tests/test_quantile.py::test_asymmetric_error[0.5] #21546
  • 3.80s call experimental/tests/test_enable_successive_halving.py::test_imports_strategies cannot easily be optimized
  • 3.36s call ensemble/tests/test_gradient_boosting.py::test_regression_dataset[0.5-huber] #21984
  • 3.27s call feature_selection/tests/test_sequential.py::test_nan_support #21823
  • 3.06s call model_selection/tests/test_split.py::test_nested_cv #21551
  • 3.02s call feature_selection/tests/test_sequential.py::test_unsupervised_model_fit[4] https://github.com/scikit-learn/scikit-learn/pull/22045
  • 3.01s call decomposition/tests/test_kernel_pca.py::test_kernel_pca_solvers_equivalence[20] #21746
  • 2.97s call ensemble/tests/test_bagging.py::test_parallel_classification #21896

On another machine I found the following slow tests:

  • 30.13s call sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_models_cv_fit_for_all_backends[MultiTaskElasticNetCV-threading] #21918
  • 21.48s call sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_models_cv_fit_for_all_backends[MultiTaskLassoCV-threading] #21918
  • 9.89s call sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_models_cv_fit_for_all_backends[MultiTaskElasticNetCV-loky] #21918
  • 8.05s call sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_models_cv_fit_for_all_backends[MultiTaskLassoCV-loky] #21918

This list can probably be updated once the above have been dealt with.

Ideally, each of those tests should take less than 1s, and preferably less than 10ms where possible: scikit-learn has more than 20,000 tests, so we strive to make them run as fast as possible while still covering the interesting behaviors, so as to detect as many potential regressions as possible. We need to exercise judgement to strike a good balance between fast test execution (which benefits the contribution workflow and ease of maintenance) and sufficiently exhaustive coverage of nominal code paths and interesting edge cases.

The goal of this issue is to track the progress of writing individual PRs for each test (typically changing only one file at a time) to rewrite them with the following in mind:

  • read and understand the purpose of the original test, possibly referring to the scikit-learn documentation when necessary;
  • try to tweak the tests (smaller dataset, different hyperparameters, different number of iterations…) to make the test run faster while preserving its original purpose (see the sketch after this list);
  • if you think it’s not possible to improve the speed of a given slow test in this list after analysis, please explain why in a comment on this issue;
  • if acceleration is possible, open a PR with the updated test and link to this issue in the PR description by stating Towards #21407.
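
As a concrete illustration of the tweaking bullet above, here is a hypothetical sketch loosely modeled on the first test in the list (test_gridsearch in ensemble/tests/test_voting.py). The estimator names, grid values and score threshold below are made up for illustration and do not reproduce the actual scikit-learn test:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV


def test_gridsearch():
    # Keep the integration point under test (GridSearchCV driving a
    # VotingClassifier) but shrink everything orthogonal to it:
    # a small built-in dataset, tiny forests, a 2x2 grid and cv=2.
    X, y = load_iris(return_X_y=True)
    clf = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=200, random_state=0)),
            ("rf", RandomForestClassifier(n_estimators=5, random_state=0)),
        ]
    )
    params = {"voting": ["soft", "hard"], "rf__n_estimators": [2, 5]}
    grid = GridSearchCV(estimator=clf, param_grid=params, cv=2)
    grid.fit(X, y)
    assert grid.best_score_ > 0.8  # sanity check: the search still works

The idea is always the same: keep the code path the test was written to exercise, and shrink the dataset size, the ensemble sizes, the parameter grid and the number of CV folds.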

Before pushing commits in a PR, please run the tests locally with the following command line (for instance, for the first test in this list; the path is relative to the sklearn/ package directory):

pytest -v --durations=20 -k test_gridsearch ensemble/tests/test_voting.py
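
With --durations=20, pytest prints a summary like the following at the end of the run (abbreviated and purely illustrative; your numbers will differ):

============================= slowest 20 durations =============================
15.40s call     ensemble/tests/test_voting.py::test_gridsearch
0.01s setup    ensemble/tests/test_voting.py::test_gridsearch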

For parametrized tests, whose names contain [] and (), pytest will refuse to select them as-is with -k. Instead you can use several expressions to select a specific parametrized test. For instance, for the second test:

pytest -v --durations=20 sklearn/tests/test_common.py -k "test_estimators and SequentialFeatureSelector and LogisticRegression and check_estimator_sparse_data"

If this is the first time you contribute to scikit-learn, please have a look at the contributor’s guide first (in particular to learn how to build the main dev branch of scikit-learn from source and how to run the tests locally).

Note: in your PR, please report the test duration measured on your local machine before and after your changes.
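
One way to capture those numbers (a suggested invocation, not prescribed by the issue) is to select only the test you changed and ask pytest to report all durations; note that pytest hides durations below 0.005s unless run with -vv:

pytest -vv --durations=0 -k test_gridsearch ensemble/tests/test_voting.py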

Note 2: aim for low hanging fruits: some tests cannot be significantly accelerated without changing their core intention, while others can be accelerated by a factor of 100x while preserving it. If you cannot shave off at least 50% of a test’s original runtime, do not waste too much time on it and try another test instead.
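
If it is not obvious where a slow test spends its time, profiling it first can help you decide whether it is a low hanging fruit. One option (my suggestion, not something the issue prescribes) is the pytest-profiling plugin, which runs the selected tests under cProfile and prints the most expensive functions:

pip install pytest-profiling
pytest --profile -k test_gridsearch ensemble/tests/test_voting.py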


Top GitHub Comments

2 reactions
HideakiImamura commented, Dec 15, 2021

Thanks @norbusan. Then I think you should check the box above to avoid confusion for others. What do you think?

I’m going to work on another test acceleration: 3.36s call ensemble/tests/test_gradient_boosting.py::test_regression_dataset[0.5-huber]

1 reaction
HideakiImamura commented, Dec 15, 2021

Can I take this item?: 3.80s call experimental/tests/test_enable_successive_halving.py::test_imports_strategies
