Standardize array type annotations in docstrings
Here are a few conventions floating around the repo with minor differences in how arrays are documented:
From https://github.com/pystatgen/sgkit/blob/master/sgkit/stats/association.py#L26:
Parameters
----------
XL : (M, N) array-like
    "Loop" covariates for which N separate regressions will be run
XC : (M, P) array-like
    "Core" covariates included in the regressions for each loop
    covariate. All P core covariates are used in each of the N
    loop covariate regressions.
Y : (M, O)
    Continuous outcomes
From https://github.com/pystatgen/sgkit/blob/master/sgkit/stats/regenie.py#L610:
Parameters
----------
G : (M, N) ArrayLike
    Genotype data array, `M` samples by `N` variants.
X : (M, C) ArrayLike
    Covariate array, `M` samples by `C` covariates.
Y : (M, O) ArrayLike
    Outcome array, `M` samples by `O` outcomes.
contigs : (N,) ArrayLike
From https://github.com/pystatgen/sgkit/blob/1a5885734020cc52847a3fdd327a35f4426aeb73/sgkit/api.py:
Parameters
----------
variant_contig_names : list of str
    The contig names.
variant_contig : array_like, int
    The (index of the) contig for each variant.
variant_position : array_like, int
    The reference position of the variant.
There are some situations where a data type for the array and shape information is relevant and other times that it’s not. I propose we standardize on `array_name : [shape_info] array_like[, dtype]`. As an example, `x : (M, N) array_like, int`.
When the “array_like” type needs to be more specific, e.g. a Dask array, I think we can still use “array_like” in the docstring and let a more specific type hint indicate the true bounds on inputs.
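To make the proposed format concrete, here is a minimal sketch of a checker for parameter lines of the form `array_name : [shape_info] array_like[, dtype]`. The regular expression and the `parse_param_line` helper are hypothetical and not part of sgkit or numpydoc; they only illustrate which of the variants quoted in this issue would conform.

```python
import re

# Hypothetical pattern for the proposed convention (an assumption, not an
# existing sgkit or numpydoc rule):
#   array_name : [shape_info] array_like[, dtype]
# shape_info may list alternatives, e.g. "(M,) or (M, N)".
PARAM_LINE = re.compile(
    r"^(?P<name>\w+) : "
    r"(?:(?P<shape>\([^)]*\)(?: or \([^)]*\))*) )?"
    r"array_like"
    r"(?:, (?P<dtype>\w+))?$"
)


def parse_param_line(line):
    """Return (name, shape, dtype) for a conforming line, else None.

    shape and dtype are None when the optional parts are absent.
    """
    match = PARAM_LINE.match(line.strip())
    if match is None:
        return None
    return match.group("name"), match.group("shape"), match.group("dtype")


if __name__ == "__main__":
    for line in [
        "x : (M, N) array_like, int",
        "variant_contig : array_like, int",
        "b : (M,) or (M, N) array_like",
        "G : (M, N) ArrayLike",
        "XL : (M, N) array-like",
        "Y : (M, O)",
    ]:
        print(f"{line!r}: {parse_param_line(line)}")
```

Under this sketch, the `array-like` and `ArrayLike` spellings and the bare `(M, O)` shape are all rejected, which is the kind of inconsistency the proposal is meant to remove.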
For reference, here is a random Dask docstring snippet:
Parameters
----------
a : (M, M) array_like
    A triangular matrix
b : (M,) or (M, N) array_like
Top GitHub Comments
A few comments on some of these:
- `variant_contig_names`: the “list of str” part is redundant so could be dropped.
- `variant_contig`: if the type was `ArrayLike`, then we would only have to indicate that the element type is `int` in the description. Perhaps this could just be in parentheses: “(element type: `int`) The index of the contig for each variant.” Or it could go at the end of the description.
- `variant_id`: if the type was `Optional[ArrayLike]`, then this would render better.

I think it should be, I’ll have a look.
I think we should be as explicit as we can in the type hints, e.g. by specifying the exact set of expected types for parameters and return types. I see there is already an `ArrayLike` type object defined here: https://github.com/pystatgen/sgkit/blob/cd764bfc1f715dd69df6161a5f8043095b1e6586/sgkit/typing.py#L7
We should use these instead of `Any` and create new types in there if required; shape or data type information can go in the description, as @eric-czech suggested.
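As a rough illustration of that split, the sketch below puts the type in the hint and the shape and element type in the description. The `ArrayLike` alias here is only a stand-in built from the standard library (the real alias in `sgkit/typing.py` may be defined differently), and `variants_per_contig` is a hypothetical function, not part of sgkit.

```python
from typing import Any, List, Sequence, get_type_hints

# Stand-in alias for illustration only (an assumption); the project's own
# ArrayLike is defined in sgkit/typing.py and may differ.
ArrayLike = Sequence[Any]


def variants_per_contig(variant_contig: ArrayLike, n_contigs: int) -> List[int]:
    """Count the variants assigned to each contig.

    Parameters
    ----------
    variant_contig
        (shape: (N,), element type: int) The index of the contig for
        each variant.
    n_contigs
        The number of contigs.

    Returns
    -------
    Number of variants on each contig, indexed by contig.
    """
    counts = [0] * n_contigs
    for contig in variant_contig:
        counts[contig] += 1
    return counts


if __name__ == "__main__":
    # The hint carries the type; the docstring carries shape and dtype.
    print(get_type_hints(variants_per_contig)["variant_contig"])
    print(variants_per_contig([0, 0, 1, 2, 2, 2], n_contigs=3))
```

With this layout a documentation tool that reads type hints can render the parameter types, so the docstring does not need to repeat them.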