Generic benchmarking/profiling tool
We have not been proficient at documenting the estimated runtime or space complexity of our estimators and algorithms. Even if we documented asymptotic complexity functions, they would not give a realistic estimate for all parameter settings, etc., on a particular kind of data. Instead, we could assist users in estimating complexity functions empirically.
I would like to see a function something like the following:
def benchmark_estimator_cost(est, X, y=None, fit_params=None,
                             vary_n_samples=True, vary_n_features=False,
                             n_fits=100, time_budget=300, profile_memory=True):
    """Profiles the cost of fitting est on samples of different sizes.

    Parameters
    ----------
    est : estimator
    X : array-like
    y : array-like, optional
    fit_params : dict, optional
    vary_n_samples : bool, default=True
        Whether to benchmark for various random sample sizes.
    vary_n_features : bool, default=False
        Whether to benchmark for various random feature set sizes.
    n_fits : int, default=100
        Maximum number of fits to make while benchmarking.
    time_budget : int, default=300
        Maximum number of seconds to use overall. The current fit will be
        stopped if the budget is exceeded.
    profile_memory : bool, default=True
        Whether to include memory (or just time) profiling. Memory
        profiling will slow down fitting, and hence make fit_time
        estimates more approximate.

    Returns
    -------
    results : dict
        The following keys are each mapped to an array:

        n_samples
            The number of samples
        n_features
            The number of features
        fit_time
            In seconds
        peak_memory
            The memory used at peak of fitting, in KiB.
        model_memory
            The memory in use at the end of fitting, minus that at the
            beginning, in KiB.
    models : dict
        Keys 'peak_memory', 'model_memory' and 'fit_time' map to polynomial
        GP regressors whose inputs are n_samples and n_features and whose
        output is the corresponding target.
    errors : list of dicts
        Lists the parameter settings that resulted in exceptions.
    """
The proposed function would run fit successively for different values of n_samples (logarithmically spaced, perhaps guided by a Gaussian process) to estimate the fitting complexity function, within budget. I have not thought extensively about exactly what sampling strategy would be followed. If this is implemented for the library, we would consider it experimental and the algorithm subject to change for a little while.
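To make this concrete, here is a rough, illustrative sketch of the kind of loop such a helper could run internally: subsample at logarithmically spaced sizes, stop when the wall-clock budget is exhausted, and optionally track peak memory with tracemalloc. None of these names are an actual scikit-learn API; this is just one possible sampling strategy.

```python
import time
import tracemalloc

import numpy as np
from sklearn.base import clone


def _benchmark_fit_times(est, X, y=None, n_fits=10, time_budget=300,
                         profile_memory=True, random_state=0):
    """Illustrative only: time est.fit on logarithmically spaced subsamples.

    Assumes X (and y, if given) are NumPy arrays.
    """
    rng = np.random.RandomState(random_state)
    # Subsample sizes from 10 up to the full dataset, logarithmically spaced.
    sizes = np.unique(np.logspace(
        np.log10(10), np.log10(X.shape[0]), num=n_fits).astype(int))
    records = []
    start = time.perf_counter()
    for n in sizes:
        if time.perf_counter() - start > time_budget:
            break  # overall time budget exhausted
        idx = rng.choice(X.shape[0], size=n, replace=False)
        model = clone(est)
        if profile_memory:
            # tracemalloc adds overhead, so timed fits will run somewhat
            # slower, as the docstring above warns.
            tracemalloc.start()
        t0 = time.perf_counter()
        model.fit(X[idx], y[idx] if y is not None else None)
        fit_time = time.perf_counter() - t0
        peak_kib = None
        if profile_memory:
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            peak_kib = peak / 1024
        records.append({'n_samples': int(n), 'fit_time': fit_time,
                        'peak_memory': peak_kib})
    return records
```

The raw records could then be fed to a regressor (polynomial or Gaussian process, as proposed above) to extrapolate cost to unseen sizes.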
What do others think?
@jnothman #17026 is an implementation of a benchmarking tool for the sample datasets we use in the sklearn examples. It doesn't exactly cover the use case that was in mind for this profiling tool, which was intended to model how an estimator's performance changes as its hyperparameters change.
I'm not sure how well #17026 solves a user's need to estimate how well an algorithm will scale on their specific data. If it does, a tutorial would be beneficial!