Rename counts (and counts64) to "sizes" and get the axis right
(It’s important that it’s plural: sizes!)
This is a follow-on from https://github.com/scikit-hep/awkward-1.0/pull/115#issuecomment-586623849.
Given an array like
[[[  2,   3,   5,   7,  11],
  [ 13,  17,  19,  23,  29],
  [ 31,  37,  41,  43,  47]],
 [[ 53,  59,  61,  67,  71],
  [ 73,  79,  83,  89,  97],
  [101, 103, 107, 109, 113]]]
- array.sizes(axis=0) should return [3, 3] (as a NumpyArray)
- array.sizes(axis=1) should return [[5, 5, 5], [5, 5, 5]] (as a RegularArray/ListArray/ListOffsetArray of NumpyArray)
- array.sizes(axis=2) should be illegal.
I guess negative axis should count from the deepest possible, so axis=-1 → axis=1 and axis=-2 → axis=0. Whatever this does, flatten should do the same, so this issue might supersede #51.
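The rectilinear case above can be sketched in pure Python on nested lists. This is only an illustration of the proposed semantics, not Awkward Array code: the helper names depth and sizes are hypothetical, and the negative-axis rule follows the guess stated above (count from the deepest legal axis).

```python
def depth(node):
    """Count nested list levels by following the first element at each level."""
    levels = 0
    while isinstance(node, list):
        levels += 1
        node = node[0] if node else None
    return levels


def sizes(node, axis):
    """Lengths of the lists one level below `axis` (the definition proposed here)."""
    if axis < 0:
        # Assumed rule: count from the deepest legal axis.
        # For a depth-3 array, -1 -> 1 and -2 -> 0.
        axis += depth(node) - 1
        if axis < 0:
            raise ValueError("axis is out of range for this array")
    if not isinstance(node, list):
        raise ValueError("axis is deeper than the array")
    if axis == 0:
        if not all(isinstance(item, list) for item in node):
            raise ValueError("axis is deeper than the array")
        return [len(item) for item in node]
    return [sizes(item, axis - 1) for item in node]


array = [[[2, 3, 5, 7, 11], [13, 17, 19, 23, 29], [31, 37, 41, 43, 47]],
         [[53, 59, 61, 67, 71], [73, 79, 83, 89, 97], [101, 103, 107, 109, 113]]]

print(sizes(array, 0))   # [3, 3]
print(sizes(array, 1))   # [[5, 5, 5], [5, 5, 5]]
print(sizes(array, -1))  # same as axis=1
```

With this sketch, sizes(array, 2) raises ValueError during the descent, because the items at that level are numbers rather than lists.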
Let’s consider some examples with branching, to figure out what they should do. An array like
[[{"x": [], "y": [[]]},
  {"x": [1], "y": [[], [1]]},
  {"x": [2, 2], "y": [[], [1], [2, 2]]}],
 [],
 [{"x": [3, 3, 3], "y": [[], [1], [2, 2], [3, 3, 3]]},
  {"x": [4, 4, 4, 4], "y": [[], [1], [2, 2], [3, 3, 3], [4, 4, 4, 4]]}]]
has depth 3 along "x" and depth 4 along "y".
- array.sizes(axis=0) should return [3, 0, 2].
- array.sizes(axis=1) should return [[{"x": 0, "y": 1}, {"x": 1, "y": 2}, {"x": 2, "y": 3}], [], [{"x": 3, "y": 4}, {"x": 4, "y": 5}]]
- array.sizes(axis=2) should be illegal because it fails for "x". An exception occurs during the recursive descent; we don’t have to check for it before descending.
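A pure-Python sketch of the non-negative cases with records, using dicts as stand-ins for records. The function names are hypothetical and the sketch assumes that records do not consume an axis level. Unlike a typed array, it discovers the illegal axis only when it meets an actual number, which matches the "fail during the descent" behaviour described above.

```python
def _length(item):
    """Length of a list, applied field by field if the item is a record (dict)."""
    if isinstance(item, dict):
        return {name: _length(value) for name, value in item.items()}
    if isinstance(item, list):
        return len(item)
    raise ValueError("axis is deeper than the data along some branch")


def sizes(node, axis):
    """Sketch of the proposed sizes for axis >= 0 on nested lists and dicts."""
    if isinstance(node, dict):
        # Assumption: a record passes the same axis on to each of its fields.
        return {name: sizes(value, axis) for name, value in node.items()}
    if not isinstance(node, list):
        raise ValueError("axis is deeper than the data along some branch")
    if axis == 0:
        return [_length(item) for item in node]
    return [sizes(item, axis - 1) for item in node]


array = [[{"x": [], "y": [[]]},
          {"x": [1], "y": [[], [1]]},
          {"x": [2, 2], "y": [[], [1], [2, 2]]}],
         [],
         [{"x": [3, 3, 3], "y": [[], [1], [2, 2], [3, 3, 3]]},
          {"x": [4, 4, 4, 4], "y": [[], [1], [2, 2], [3, 3, 3], [4, 4, 4, 4]]}]]

print(sizes(array, 0))  # [3, 0, 2]
print(sizes(array, 1))
```

Calling sizes(array, 2) raises ValueError as soon as the descent reaches a number inside "x"; no check is made before descending.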
Now for the negative axis values:
- array.sizes(axis=-1) should return [[{"x": 0, "y": [0]}, {"x": 1, "y": [0, 1]}, {"x": 2, "y": [0, 1, 2]}], [], [{"x": 3, "y": [0, 1, 2, 3]}, {"x": 4, "y": [0, 1, 2, 3, 4]}]] because it represents the deepest axis along each branch.
- array.sizes(axis=-2) should be illegal because it wants to be a non-record along "x" and a record along "y".
So, negative axis values aren’t just synonyms for non-negative axis values, as they are for rectilinear data. The reducer operations (PR #115) have a similar meaning, though reducers can be applied one level deeper than sizes and flatten can, so it’s a little different.
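One way to see why axis=-2 is illegal is to resolve the negative axis separately for each branch and check whether the branches agree on which side of the record the result lands. The sketch below is a hypothetical helper, not part of any library. It assumes the branch depths (3 for "x", 4 for "y") and the number of lists enclosing the records (2) are already known, and it uses the original definition in which axis=a measures lists one level below a.

```python
def resolve_axis_per_branch(axis, branch_depths, lists_above_record):
    """Turn a negative axis into one non-negative axis per record field.

    branch_depths: total list depth along each field, e.g. {"x": 3, "y": 4}.
    lists_above_record: how many lists enclose the records (2 in the example).
    """
    resolved = {}
    for name, depth in branch_depths.items():
        positive = depth - 1 + axis  # deepest legal axis is depth - 2
        if positive < 0:
            raise ValueError(f"axis={axis} is out of range along {name!r}")
        resolved[name] = positive

    # Under the original definition, axis=a measures lists at nesting level
    # a + 2, so the result sits inside the record only if those lists are
    # deeper than the lists that contain the records.
    inside_record = {a + 2 > lists_above_record for a in resolved.values()}
    if len(inside_record) > 1:
        raise ValueError(
            f"axis={axis} is a non-record along some branches and a record along others"
        )
    return resolved


print(resolve_axis_per_branch(-1, {"x": 3, "y": 4}, 2))  # {'x': 1, 'y': 2}
```

For axis=-2 the same call resolves "x" to 0 (above the records) and "y" to 1 (inside them), so it raises ValueError.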
I’m not sure whether @ianna or I will do this. We might need to figure it out together.
@nsmith-: you might have opinions about the usefulness of negative axis having a different meaning than positive axis minus depth.
Issue Analytics
- Created 4 years ago
- Comments: 5 (4 by maintainers)

Some changes in definition, implemented in #152:

(1) I shifted the meaning of axis by one, so that it’s always describing lengths of the specified axis, rather than one deeper. So in my original example,
- axis=0 now returns 2, the scalar length of the array;
- axis=1 now returns [3, 3] (what the old axis=0 would have returned);
- axis=2 now returns [[5, 5, 5], [5, 5, 5]] (what the old axis=1 would have returned).

axis values that don’t correspond to an axis in the array (this one has only three: 0, 1, and 2) are illegal, although axis=0 is qualitatively different from the rest. (That’s a recurring theme in these operations.)

Major change number (2) is that I’ve changed the name again, from “sizes” to “num”. This will allow physicists to write very readable code, like
and
There is no collision in the NumPy namespace with a function named “num”, and it’s far from any of NumPy’s existing concepts (unlike “size”).
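The shifted definition from this comment can be sketched in pure Python as follows. This is a stand-in on nested lists, not the implementation from #152; the function below is a hypothetical illustration that only handles non-negative axis values. The selection at the end is my own example of the kind of code the name allows, not one of the examples omitted above.

```python
def num(node, axis):
    """Lengths of the lists at `axis` itself (the shifted definition)."""
    if axis < 0:
        raise ValueError("negative axis is not handled in this sketch")
    if not isinstance(node, list):
        raise ValueError("axis does not correspond to an axis in the array")
    if axis == 0:
        # axis=0 is qualitatively different: it yields a scalar, not a list.
        return len(node)
    return [num(item, axis - 1) for item in node]


array = [[[2, 3, 5, 7, 11], [13, 17, 19, 23, 29], [31, 37, 41, 43, 47]],
         [[53, 59, 61, 67, 71], [73, 79, 83, 89, 97], [101, 103, 107, 109, 113]]]

print(num(array, 0))  # 2
print(num(array, 1))  # [3, 3]
print(num(array, 2))  # [[5, 5, 5], [5, 5, 5]]

# Hypothetical usage: keep only the events that contain at least two items.
events = [[1.0, 2.0], [], [3.0]]
selected = [event for event, n in zip(events, num(events, 1)) if n >= 2]
print(selected)  # [[1.0, 2.0]]
```

Here num(array, 3) raises ValueError, since the array has only axes 0, 1, and 2.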
Oh, I remember that. It was implemented as described above, then I extended it to an extra depth to return a one-dimensional NumpyArray as in counts(axis) from study/flatten.py.

Either lengths or sizes is less confusing than counts. Intuitively, I’d say length is constant while size isn’t, but it’s from Java/C++ world.