Meta-issue: accelerate the slowest running tests
When running the current test suite on Azure CI, pytest reports the following 20 slowest tests:
- 15.40s call ensemble/tests/test_voting.py::test_gridsearch #21422
- 11.31s call tests/test_common.py::test_estimators[SequentialFeatureSelector(estimator=LogisticRegression(C=1))-check_estimator_sparse_data] #21515
- 8.96s call svm/tests/test_svm.py::test_svc_ovr_tie_breaking[NuSVC] #21443
- 7.40s call utils/tests/test_estimator_checks.py::test_check_estimator_clones #21498
- 6.49s call ensemble/tests/test_bagging.py::test_classification #21476
- 5.52s call ensemble/tests/test_common.py::test_ensemble_heterogeneous_estimators_behavior[stacking-classifier] #21562
- 5.19s call ensemble/tests/test_common.py::test_ensemble_heterogeneous_estimators_behavior[stacking-regressor] #21562
- 4.41s call linear_model/tests/test_quantile.py::test_asymmetric_error[0.2] #21546
- 4.12s call ensemble/tests/test_gradient_boosting.py::test_gradient_boosting_early_stopping #21903
- 4.12s call linear_model/tests/test_quantile.py::test_asymmetric_error[0.8] #21546
- 3.91s call ensemble/tests/test_bagging.py::test_oob_score_removed_on_warm_start #21892
- 3.86s call tests/test_common.py::test_estimators[RFECV(estimator=LogisticRegression(C=1))-check_estimator_sparse_data] #21515
- 3.80s call linear_model/tests/test_quantile.py::test_asymmetric_error[0.5] #21546
- 3.80s call experimental/tests/test_enable_successive_halving.py::test_imports_strategies (cannot easily be optimized)
- 3.36s call ensemble/tests/test_gradient_boosting.py::test_regression_dataset[0.5-huber] #21984
- 3.27s call feature_selection/tests/test_sequential.py::test_nan_support #21823
- 3.06s call model_selection/tests/test_split.py::test_nested_cv #21551
- 3.02s call feature_selection/tests/test_sequential.py::test_unsupervised_model_fit[4] https://github.com/scikit-learn/scikit-learn/pull/22045
- 3.01s call decomposition/tests/test_kernel_pca.py::test_kernel_pca_solvers_equivalence[20] #21746
- 2.97s call ensemble/tests/test_bagging.py::test_parallel_classification #21896
On another machine I found the following slow tests:
- 30.13s call sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_models_cv_fit_for_all_backends[MultiTaskElasticNetCV-threading] #21918
- 21.48s call sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_models_cv_fit_for_all_backends[MultiTaskLassoCV-threading] #21918
- 9.89s call sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_models_cv_fit_for_all_backends[MultiTaskElasticNetCV-loky] #21918
- 8.05s call sklearn/linear_model/tests/test_coordinate_descent.py::test_linear_models_cv_fit_for_all_backends[MultiTaskLassoCV-loky] #21918
This list can probably be updated once the above have been dealt with.
Ideally, each of these should take less than 1s, and preferably less than 10ms where possible: scikit-learn has more than 20,000 tests, so we strive to make them run as fast as possible while still exercising the interesting behaviors and detecting as many potential regressions as possible. We need to exercise judgement to strike a good balance between fast test execution (which benefits the contribution workflow and ease of maintenance) and exhaustive enough coverage of nominal code paths and interesting edge cases.
The goal of this issue is to track the progress of writing individual PRs for each test (typically changing only one file at a time) to rewrite them with the following in mind:
- read and understand the purpose of the original test, possibly referring to the scikit-learn documentation when necessary;
- try to tweak the tests (smaller dataset, different hyperparameters, different number of iterations…) to make the test run faster while preserving its original purpose (a minimal before/after sketch follows this list);
- if you think it’s not possible to improve the speed of a given slow test in this list after analysis, please explain why in a comment on this issue;
- if acceleration is possible, open a PR with the updated test and link to this issue in the PR description by stating "Towards #21407".
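To make the tweaking step concrete, here is a minimal before/after sketch. The test name, dataset sizes and hyperparameters are invented for illustration and are not taken from the scikit-learn test suite; the point is only that shrinking the data and the number of estimators can preserve the assertion (and thus the intent of the test) while cutting the runtime dramatically.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


# Before: a large dataset and many trees make the test slow.
def test_forest_training_accuracy_slow():
    X, y = make_classification(n_samples=5000, n_features=50, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    assert clf.score(X, y) > 0.9


# After: a much smaller problem exercises the same code path and keeps the
# same assertion, but runs orders of magnitude faster.
def test_forest_training_accuracy_fast():
    X, y = make_classification(n_samples=100, n_features=10, random_state=0)
    clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
    assert clf.score(X, y) > 0.9
```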
Before pushing commits in a PR, please run the tests locally with the following command-line (for instance for the first test of this list):
pytest -v --durations=20 -k test_gridsearch ensemble/tests/test_voting.py
For parametrized tests, whose names contain [] and (), pytest will refuse to select them as is with -k. Instead you can use several -k expressions to select a specific parametrized test. For instance, for the second test in the list:
pytest -v --durations=20 sklearn/tests/test_common.py -k "test_estimators and SequentialFeatureSelector and LogisticRegression and check_estimator_sparse_data"
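Alternatively (a side note, not part of the original instructions), pytest also accepts a full test node id, including the bracketed parameters, as long as it is quoted so the shell does not interpret the brackets. For instance:
pytest -v --durations=20 "sklearn/linear_model/tests/test_quantile.py::test_asymmetric_error[0.5]"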
If this is the first time you contribute to scikit-learn, please have a look at the contributor’s guide first (in particular to learn how to build the main dev branch of scikit-learn from source and how to run the tests locally).
Note: in your PR, please report the test duration measured on your local machine before and after your changes.
Note 2: aim for low-hanging fruit: some tests cannot be significantly accelerated without changing their core intention, while others can be accelerated by a factor of 100x while preserving it. If you cannot prune at least 50% of a test's original runtime, do not waste too much time on it and try another test instead.
Top GitHub Comments
Thanks @norbusan. Then I think you should check the box above to avoid confusion for others. What do you think?
I’m going to work on another test acceleration: 3.36s call ensemble/tests/test_gradient_boosting.py::test_regression_dataset[0.5-huber]
Can I take this item? 3.80s call experimental/tests/test_enable_successive_halving.py::test_imports_strategies