Ensure that functions' docstrings pass numpydoc validation
Background / Objective
Docstrings in Python are string literals that occur as the first statement in a module, function, class, or method definition.
These are some of the characteristics of a docstring:
- Triple quotes are used to encompass the docstring text.
- There is no blank line before or after the docstring.
- The docstring is a phrase ending in a period.
- more details
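As a small illustration of these characteristics, here is a sketch using a hypothetical function `add` (it is not part of scikit-learn). The docstring is the first statement in the function body, is wrapped in triple quotes, has no blank line before or after it, and is a phrase ending in a period. Python stores it on the function as `__doc__`.

```python
def add(a, b):
    """Return the sum of a and b."""
    return a + b


# The docstring is available at runtime on the function object.
print(add.__doc__)
```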
numpydoc is one set of criteria to check for consistent documentation structure.
Validating docstrings in scikit-learn
To ensure consistent documentation structure in scikit-learn, we are using numpydoc validation. Currently, documentation tests are failing for various functions. As a temporary fix, we have suppressed the error messages in test_docstrings.py. Many of the functions in scikit-learn need to be updated to comply with numpydoc validation. In this issue, we provide step-by-step instructions on how contributors can test and update functions.
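For orientation, a docstring laid out in the numpydoc section style might look like the sketch below. The function `clip_values` is hypothetical and not part of scikit-learn; it only illustrates the usual sections (summary line, Parameters, Returns, See Also, Examples). It has not been run through the validator here, so treat it as an example of the layout rather than a guaranteed pass.

```python
def clip_values(values, lower=0.0, upper=1.0):
    """Clip each value to the closed interval [lower, upper].

    This hypothetical helper only illustrates the section layout
    that numpydoc-style docstrings follow.

    Parameters
    ----------
    values : list of float
        Input values to clip.

    lower : float, default=0.0
        Smallest value allowed in the output.

    upper : float, default=1.0
        Largest value allowed in the output.

    Returns
    -------
    clipped : list of float
        Copy of `values` with every element limited to [lower, upper].

    See Also
    --------
    min : Return the smallest item of an iterable.
    max : Return the largest item of an iterable.

    Examples
    --------
    >>> clip_values([-0.5, 0.25, 3.0])
    [0.0, 0.25, 1.0]
    """
    # Raise values below `lower`, then cap values above `upper`.
    return [min(max(v, lower), upper) for v in values]
```

The failing test usually names the check that was violated (for example a missing section or a summary that does not end in a period), so the fix is typically a matter of bringing the docstring closer to this layout.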
Note
If you run into “YD01: No Yields section found”, the cause could be the description of the cv parameter. Update
- An iterable yielding (train, test) splits as arrays of indices.
to:
- An iterable that generates (train, test) splits as arrays of indices.
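A sketch of how the reworded description could sit inside a docstring is shown below. The function `count_splits` is hypothetical. The note above implies that the word "yielding" in the description is what the check reacts to; that mechanism is an assumption here, so the final line only confirms that the reworded docstring no longer contains the substring "yield".

```python
def count_splits(cv):
    """Count the (train, test) splits produced by `cv`.

    Parameters
    ----------
    cv : iterable
        An iterable that generates (train, test) splits as arrays of indices.

    Returns
    -------
    n_splits : int
        Number of splits produced by `cv`.
    """
    # Consume the iterable once and count its items.
    return sum(1 for _ in cv)


# Assumption: the check is sensitive to the word "yield" in the docstring.
print("yield" in count_splits.__doc__)  # prints False
```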
Steps
- Make sure you have the development dependencies and documentation dependencies installed.
- Pick a function from the list below and leave a comment saying you are going to work on it. This way we can keep track of what everyone is working on.
2.1 Make sure you’ve created a separate branch from main before editing files for your new contribution. Refer to our contributing guidelines for more information.
- Remove the function from the list at: https://github.com/scikit-learn/scikit-learn/blob/670133dbc42ebd9f79552984316bc2fcfd208e2e/sklearn/tests/test_docstrings.py#L14
- Let’s say you picked sklearn._config.config_context. Run numpydoc validation as follows:
pytest sklearn/tests/test_docstrings.py -k sklearn._config.config_context
- If you see failing tests, please fix them by following the recommendations provided by the failing tests.
- If you see all the tests pass, you do not need to make any additional changes.
- Commit your changes.
- Open a Pull Request with the opening message “Addresses #21350”. Note that each item should be submitted in a separate Pull Request.
- Include the function name in the title of the pull request. For example: “DOC Ensures that config_context passes numpydoc validation”.
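The naming conventions in the steps above can be sketched as a few small helpers. The helper names (`pytest_command`, `pull_request_title`, `pull_request_message`) are hypothetical and exist only for this illustration; the test file path, the `-k` filter, the title pattern, and the issue number are taken from the steps themselves.

```python
import shlex

# Path of the docstring test module, as given in the steps above.
TEST_FILE = "sklearn/tests/test_docstrings.py"


def pytest_command(function_path):
    """Return the pytest arguments that select one function's docstring test."""
    return ["pytest", TEST_FILE, "-k", function_path]


def pull_request_title(function_path):
    """Return a PR title that includes the short function name."""
    short_name = function_path.rsplit(".", 1)[-1]
    return f"DOC Ensures that {short_name} passes numpydoc validation"


def pull_request_message(issue_number=21350):
    """Return the opening message that links the PR to the tracking issue."""
    return f"Addresses #{issue_number}"


name = "sklearn._config.config_context"
print(shlex.join(pytest_command(name)))
print(pull_request_title(name))
print(pull_request_message())
```

The printed command is meant to be run from the root of a scikit-learn checkout after the function has been removed from the ignore list in test_docstrings.py.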
Note: once you have issued 3 such PRs, feel free to move on to more complex pull requests that involve more thinking, and leave these fixes to first-time contributors so they can learn the GitHub contribution workflow 😃
Functions to Update
- sklearn._config.config_context #21426
- sklearn._config.get_config #21656
- sklearn.base.clone #21557
- sklearn.cluster._affinity_propagation.affinity_propagation #21778
- sklearn.cluster._agglomerative.linkage_tree #21424
- sklearn.cluster._kmeans.k_means #21423
- sklearn.cluster._kmeans.kmeans_plusplus #22200
- sklearn.cluster._mean_shift.estimate_bandwidth #21940
- sklearn.cluster._mean_shift.get_bin_seeds #22018
- sklearn.cluster._mean_shift.mean_shift #22019
- sklearn.cluster._optics.cluster_optics_dbscan
- sklearn.cluster._optics.cluster_optics_xi #22202
- sklearn.cluster._optics.compute_optics_graph #22024 #22205
- sklearn.cluster._spectral.spectral_clustering #22025
- sklearn.compose._column_transformer.make_column_transformer #22183
- sklearn.covariance._empirical_covariance.empirical_covariance #21439
- sklearn.covariance._empirical_covariance.log_likelihood #21438
- sklearn.covariance._graph_lasso.graphical_lasso #22326
- sklearn.covariance._robust_covariance.fast_mcd #22331
- sklearn.covariance._shrunk_covariance.ledoit_wolf #22496 #22798 #22748
- sklearn.covariance._shrunk_covariance.ledoit_wolf_shrinkage #22798 #22748
- sklearn.covariance._shrunk_covariance.shrunk_covariance #22798 #22260
- sklearn.datasets._base.get_data_home #22259
- sklearn.datasets._base.load_boston #22247
- sklearn.datasets._base.load_breast_cancer #22346
- sklearn.datasets._base.load_diabetes #21526
- sklearn.datasets._base.load_digits #22392
- sklearn.datasets._base.load_files #21727
- sklearn.datasets._base.load_iris #21760
- sklearn.datasets._base.load_linnerud #22484
- sklearn.datasets._base.load_sample_image #22805
- sklearn.datasets._base.load_wine #22469
- sklearn.datasets._california_housing.fetch_california_housing #22882
- sklearn.datasets._covtype.fetch_covtype #22918
- sklearn.datasets._kddcup99.fetch_kddcup99 #23929
- sklearn.datasets._lfw.fetch_lfw_pairs #23655
- sklearn.datasets._lfw.fetch_lfw_people #24161
- sklearn.datasets._olivetti_faces.fetch_olivetti_faces #22480
- sklearn.datasets._openml.fetch_openml #22483
- sklearn.datasets._rcv1.fetch_rcv1 #22225
- sklearn.datasets._samples_generator.make_biclusters #22790
- sklearn.datasets._samples_generator.make_blobs #22342
- sklearn.datasets._samples_generator.make_checkerboard #22390
- sklearn.datasets._samples_generator.make_classification #22797
- sklearn.datasets._samples_generator.make_gaussian_quantiles #23996
- sklearn.datasets._samples_generator.make_hastie_10_2 #22333
- sklearn.datasets._samples_generator.make_multilabel_classification #22784 #22782
- sklearn.datasets._samples_generator.make_regression #22784
- sklearn.datasets._samples_generator.make_sparse_coded_signal #22817
- sklearn.datasets._samples_generator.make_sparse_spd_matrix #22332
- sklearn.datasets._samples_generator.make_spd_matrix #23974
- sklearn.datasets._species_distributions.fetch_species_distributions #24162
- sklearn.datasets._svmlight_format_io.dump_svmlight_file #23166
- sklearn.datasets._svmlight_format_io.load_svmlight_file #24163 #24164
- sklearn.datasets._svmlight_format_io.load_svmlight_files #24164
- sklearn.datasets._twenty_newsgroups.fetch_20newsgroups #22329
- sklearn.decomposition._dict_learning.dict_learning #24316 #24289 #22793
- sklearn.decomposition._dict_learning.dict_learning_online #24289
- sklearn.decomposition._dict_learning.sparse_encode #22793
- sklearn.decomposition._fastica.fastica #23094
- sklearn.decomposition._nmf.non_negative_factorization #24235
- sklearn.externals._packaging.version.parse #24447 #24567 #24461 #24320 #22817 #22793 #22332
- sklearn.feature_extraction.image.extract_patches_2d #23926
- sklearn.feature_extraction.image.grid_to_graph #23052
- sklearn.feature_extraction.image.img_to_graph #23398
- sklearn.feature_extraction.text.strip_accents_ascii #23250
- sklearn.feature_extraction.text.strip_accents_unicode #24232
- sklearn.feature_extraction.text.strip_tags #23248
- sklearn.feature_selection._univariate_selection.chi2 #23945 #23943 #23467
- sklearn.feature_selection._univariate_selection.f_oneway
- sklearn.feature_selection._univariate_selection.r_regression #22785
- sklearn.inspection._partial_dependence.partial_dependence #24325 #24174
- sklearn.inspection._plot.partial_dependence.plot_partial_dependence #24325
- sklearn.isotonic.isotonic_regression #22475
- sklearn.linear_model._least_angle.lars_path #24319 #22500
- sklearn.linear_model._least_angle.lars_path_gram #24319
- sklearn.linear_model._omp.orthogonal_mp #24329 #22501
- sklearn.linear_model._omp.orthogonal_mp_gram #24329
- sklearn.linear_model._ridge.ridge_regression #22788
- sklearn.manifold._locally_linear.locally_linear_embedding #24330
- sklearn.manifold._t_sne.trustworthiness #24333
- sklearn.metrics._classification.accuracy_score #24259 #21478 #21441
- sklearn.metrics._classification.balanced_accuracy_score #21478
- sklearn.metrics._classification.brier_score_loss #23914
- sklearn.metrics._classification.classification_report #22803
- sklearn.metrics._classification.cohen_kappa_score #23915
- sklearn.metrics._classification.confusion_matrix #22842 #21496
- sklearn.metrics._classification.f1_score #22358
- sklearn.metrics._classification.fbeta_score #23486
- sklearn.metrics._classification.hamming_loss #21449
- sklearn.metrics._classification.hinge_loss #23387
- sklearn.metrics._classification.jaccard_score #23910
- sklearn.metrics._classification.log_loss #23657
- sklearn.metrics._classification.precision_recall_fscore_support #22472
- sklearn.metrics._classification.precision_score #23504 #22712 #21479
- sklearn.metrics._classification.recall_score #21495
- sklearn.metrics._classification.zero_one_loss #21450
- sklearn.metrics._plot.confusion_matrix.plot_confusion_matrix #22842
- sklearn.metrics._plot.det_curve.plot_det_curve #24334
- sklearn.metrics._plot.precision_recall_curve.plot_precision_recall_curve #24403
- sklearn.metrics._plot.roc_curve.plot_roc_curve #21547
- sklearn.metrics._ranking.auc #23505 #23433
- sklearn.metrics._ranking.average_precision_score #23504 #22712
- sklearn.metrics._ranking.coverage_error #24322
- sklearn.metrics._ranking.dcg_score #24351 #22400
- sklearn.metrics._ranking.label_ranking_average_precision_score #23504
- sklearn.metrics._ranking.label_ranking_loss #22781
- sklearn.metrics._ranking.ndcg_score #22400
- sklearn.metrics._ranking.precision_recall_curve #24403 #22514
- sklearn.metrics._ranking.roc_auc_score #23505
- sklearn.metrics._ranking.roc_curve #24351 #21547
- sklearn.metrics._ranking.top_k_accuracy_score #24259
- sklearn.metrics._regression.max_error #21420
- sklearn.metrics._regression.mean_absolute_error #21714
- sklearn.metrics._regression.mean_pinball_loss #24336
- sklearn.metrics._scorer.make_scorer #22367
- sklearn.metrics.cluster._bicluster.consensus_score #24343
- sklearn.metrics.cluster._supervised.adjusted_mutual_info_score #24344
- sklearn.metrics.cluster._supervised.adjusted_rand_score #24345
- sklearn.metrics.cluster._supervised.completeness_score #23016
- sklearn.metrics.cluster._supervised.entropy #24352
- sklearn.metrics.cluster._supervised.fowlkes_mallows_score #24352
- sklearn.metrics.cluster._supervised.homogeneity_completeness_v_measure #23942
- sklearn.metrics.cluster._supervised.homogeneity_score #23006
- sklearn.metrics.cluster._supervised.mutual_info_score #24344 #24093 #24091
- sklearn.metrics.cluster._supervised.normalized_mutual_info_score #24093
- sklearn.metrics.cluster._supervised.pair_confusion_matrix #24094
- sklearn.metrics.cluster._supervised.rand_score #24345 #24096
- sklearn.metrics.cluster._supervised.v_measure_score #24097
- sklearn.metrics.cluster._unsupervised.davies_bouldin_score #21850
- sklearn.metrics.cluster._unsupervised.silhouette_samples #21851
- sklearn.metrics.cluster._unsupervised.silhouette_score #21852
- sklearn.metrics.pairwise.additive_chi2_kernel #23943
- sklearn.metrics.pairwise.check_paired_arrays #23944
- sklearn.metrics.pairwise.check_pairwise_arrays #23519
- sklearn.metrics.pairwise.chi2_kernel #23945 #23943
- sklearn.metrics.pairwise.cosine_distances #23946 #22141
- sklearn.metrics.pairwise.cosine_similarity #23947
- sklearn.metrics.pairwise.distance_metrics #23949
- sklearn.metrics.pairwise.euclidean_distances #22783 #22140 #21429
- sklearn.metrics.pairwise.haversine_distances #23044
- sklearn.metrics.pairwise.kernel_metrics #23950
- sklearn.metrics.pairwise.laplacian_kernel #23005
- sklearn.metrics.pairwise.linear_kernel #21470
- sklearn.metrics.pairwise.manhattan_distances #23900 #22139
- sklearn.metrics.pairwise.nan_euclidean_distances #22140
- sklearn.metrics.pairwise.paired_cosine_distances #22141
- sklearn.metrics.pairwise.paired_distances #22380
- sklearn.metrics.pairwise.paired_euclidean_distances #22783
- sklearn.metrics.pairwise.paired_manhattan_distances #23900
- sklearn.metrics.pairwise.pairwise_distances_argmin #23951 #23952
- sklearn.metrics.pairwise.pairwise_distances_argmin_min #23952
- sklearn.metrics.pairwise.pairwise_distances_chunked #24527
- sklearn.metrics.pairwise.pairwise_kernels
- sklearn.metrics.pairwise.polynomial_kernel #23953
- sklearn.metrics.pairwise.rbf_kernel #23954
- sklearn.metrics.pairwise.sigmoid_kernel #23955
- sklearn.model_selection._split.check_cv #22778
- sklearn.model_selection._split.train_test_split #21435
- sklearn.model_selection._validation.cross_val_predict #21433
- sklearn.model_selection._validation.cross_val_score #21464
- sklearn.model_selection._validation.cross_validate #23145
- sklearn.model_selection._validation.learning_curve #23911
- sklearn.model_selection._validation.permutation_test_score #23912
- sklearn.model_selection._validation.validation_curve #23913
- sklearn.neighbors._graph.kneighbors_graph #22459
- sklearn.neighbors._graph.radius_neighbors_graph #22462
- sklearn.pipeline.make_union #23909
- sklearn.preprocessing._data.binarize #24002 #22801
- sklearn.preprocessing._data.maxabs_scale #24359
- sklearn.preprocessing._data.normalize #24093 #23188 #22795
- sklearn.preprocessing._data.power_transform #22802
- sklearn.preprocessing._data.quantile_transform #22780
- sklearn.preprocessing._data.robust_scale #23908
- sklearn.preprocessing._data.scale #24362 #24359 #23908
- sklearn.preprocessing._label.label_binarize #24002
- sklearn.random_projection.johnson_lindenstrauss_min_dim #24003
- sklearn.svm._bounds.l1_min_c #24134
- sklearn.tree._export.plot_tree
- sklearn.utils.axis0_safe_slice #24561
- sklearn.utils.check_pandas_support #21566
- sklearn.utils.extmath.cartesian #21513
- sklearn.utils.extmath.density #24516
- sklearn.utils.extmath.fast_logdet #24605
- sklearn.utils.extmath.randomized_range_finder #22069
- sklearn.utils.extmath.randomized_svd #24607
- sklearn.utils.extmath.safe_sparse_dot #24567
- sklearn.utils.extmath.squared_norm #24360
- sklearn.utils.extmath.stable_cumsum #24348
- sklearn.utils.extmath.svd_flip #24581
- sklearn.utils.extmath.weighted_mode #24571
- sklearn.utils.fixes.delayed
- sklearn.utils.fixes.linspace #24582
- sklearn.utils.fixes.threadpool_info
- sklearn.utils.fixes.threadpool_limits
- sklearn.utils.gen_batches #24609
- sklearn.utils.gen_even_slices #24608
- sklearn.utils.get_chunk_n_rows #22539
- sklearn.utils.graph.graph_shortest_path
- sklearn.utils.graph.single_source_shortest_path_length #24474
- sklearn.utils.is_scalar_nan #24562
- sklearn.utils.metaestimators.available_if #24586
- sklearn.utils.metaestimators.if_delegate_has_method #24633
- sklearn.utils.multiclass.check_classification_targets #22793
- sklearn.utils.multiclass.class_distribution #24452
- sklearn.utils.multiclass.type_of_target #24463
- sklearn.utils.multiclass.unique_labels #24476
- sklearn.utils.resample #23916
- sklearn.utils.safe_mask #24425
- sklearn.utils.safe_sqr #24437
- sklearn.utils.shuffle #24367
- sklearn.utils.sparsefuncs.count_nonzero #24447
- sklearn.utils.sparsefuncs.csc_median_axis_0 #24461
- sklearn.utils.sparsefuncs.incr_mean_variance_axis #24477
- sklearn.utils.sparsefuncs.inplace_swap_column #23476
- sklearn.utils.sparsefuncs.inplace_swap_row #24518 #24513 #24178
- sklearn.utils.sparsefuncs.inplace_swap_row_csc #24513
- sklearn.utils.sparsefuncs.inplace_swap_row_csr #24518
- sklearn.utils.sparsefuncs.mean_variance_axis #24477 #24177
- sklearn.utils.sparsefuncs.min_max_axis #22839
- sklearn.utils.tosequence #22494
- sklearn.utils.validation.as_float_array #21502
- sklearn.utils.validation.assert_all_finite #22470
- sklearn.utils.validation.check_is_fitted #24454
- sklearn.utils.validation.check_memory #23039
- sklearn.utils.validation.check_random_state #23320 #22787
- sklearn.utils.validation.column_or_1d #21591
- sklearn.utils.validation.has_fit_parameter #21590
- sklearn.utils.validation.indexable #21431
Issue Analytics
- Created 2 years ago
- Comments: 214 (206 by maintainers)
Top GitHub Comments
All functions now pass numpydoc validation. Thanks to everyone who contributed to this long-standing issue! We can close this issue and consider the numpydoc arc over 😃
Nice. Thank you to everyone that contributed.