Rename counts (and counts64) to "sizes" and get the axis right

(It’s important that it’s plural: sizes!)

This is a follow-on from https://github.com/scikit-hep/awkward-1.0/pull/115#issuecomment-586623849.

Given an array like

[[[  2,   3,   5,   7,  11],
  [ 13,  17,  19,  23,  29],
  [ 31,  37,  41,  43,  47]],
 [[ 53,  59,  61,  67,  71],
  [ 73,  79,  83,  89,  97],
  [101, 103, 107, 109, 113]]]

array.sizes(axis=0) should return [3, 3] (as a NumpyArray)
array.sizes(axis=1) should return [[5, 5, 5], [5, 5, 5]] (as a RegularArray/ListArray/ListOffsetArray of NumpyArray)
array.sizes(axis=2) should be illegal.
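
The three cases above can be sketched on plain Python lists. This is only an illustration of the proposed semantics, not Awkward Array code: the free function `sizes` is a hypothetical stand-in for the proposed method, and a `ValueError` stands in for "illegal".

```python
def sizes(array, axis):
    """Proposed semantics (axis >= 0 only): lengths of the lists one level below `axis`."""
    if not isinstance(array, list):
        raise ValueError("axis exceeds the depth of the array")
    if axis == 0:
        for item in array:
            if not isinstance(item, list):
                # The items are numbers, so there are no lists left to measure.
                raise ValueError("axis exceeds the depth of the array")
        return [len(item) for item in array]
    # Descend one level and ask the same question with a smaller axis.
    return [sizes(item, axis - 1) for item in array]


array = [[[2, 3, 5, 7, 11], [13, 17, 19, 23, 29], [31, 37, 41, 43, 47]],
         [[53, 59, 61, 67, 71], [73, 79, 83, 89, 97], [101, 103, 107, 109, 113]]]

print(sizes(array, 0))
print(sizes(array, 1))
```

Calling `sizes(array, 2)` on this sketch raises `ValueError`, because the recursion reaches the numbers themselves.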

I guess negative axis should count from the deepest possible, so axis=-1 → axis=1 and axis=-2 → axis=0. Whatever this does, flatten should do the same, so this issue might supersede #51.

Let’s consider some examples with branching, to figure out what they should do. An array like

[[{"x": [], "y": [[]]},
  {"x": [1], "y": [[], [1]]},
  {"x": [2, 2], "y": [[], [1], [2, 2]]}],
 [],
 [{"x": [3, 3, 3], "y": [[], [1], [2, 2], [3, 3, 3]]},
  {"x": [4, 4, 4, 4], "y": [[], [1], [2, 2], [3, 3, 3], [4, 4, 4, 4]]}]]

has depth 3 along "x" and depth 4 along "y".

array.sizes(axis=0) should return [3, 0, 2].
array.sizes(axis=1) should return [[{"x": 0, "y": 1}, {"x": 1, "y": 2}, {"x": 2, "y": 3}], [], [{"x": 3, "y": 4}, {"x": 4, "y": 5}]]
array.sizes(axis=2) should be illegal because it fails for "x". An exception occurs during the recursive descent; we don’t have to check for it before descending.
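
A rough sketch of the same recursion with records, using Python dicts as stand-ins for records. The names `sizes` and `length` are hypothetical helpers, and a `ValueError` stands in for "illegal". Because plain Python lists carry no type information, an empty list never triggers the error here; a typed array would fail regardless of content.

```python
def length(item):
    """Length of a list, taken field by field when `item` is a record (dict)."""
    if isinstance(item, dict):
        return {name: length(value) for name, value in item.items()}
    if isinstance(item, list):
        return len(item)
    raise ValueError("axis is deeper than this branch allows")


def sizes(node, axis):
    """Proposed semantics for axis >= 0; passing through a record does not consume an axis."""
    if isinstance(node, dict):
        return {name: sizes(value, axis) for name, value in node.items()}
    if not isinstance(node, list):
        raise ValueError("axis is deeper than this branch allows")
    if axis == 0:
        return [length(item) for item in node]
    return [sizes(item, axis - 1) for item in node]


array = [[{"x": [], "y": [[]]},
          {"x": [1], "y": [[], [1]]},
          {"x": [2, 2], "y": [[], [1], [2, 2]]}],
         [],
         [{"x": [3, 3, 3], "y": [[], [1], [2, 2], [3, 3, 3]]},
          {"x": [4, 4, 4, 4], "y": [[], [1], [2, 2], [3, 3, 3], [4, 4, 4, 4]]}]]

print(sizes(array, 0))
print(sizes(array, 1))
```

With `axis=2`, the error is raised during the descent into "x" (at the record whose "x" is `[1]`), with no check made beforehand.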

Now for the negative axis values:

array.sizes(axis=-1) should return [[{"x": 0, "y": [0]}, {"x": 1, "y": [0, 1]}, {"x": 2, "y": [0, 1, 2]}], [], [{"x": 3, "y": [0, 1, 2, 3]}, {"x": 4, "y": [0, 1, 2, 3, 4]}]] because it represents the deepest axis along each branch.
array.sizes(axis=-2) should be illegal because it wants to be a non-record along "x" and a record along "y".
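
One way to read these two cases is that a negative axis is resolved separately along each branch. The sketch below assumes the rule `depth + axis - 1`, which is inferred from the rectilinear example (depth 3, axis=-1 → axis=1) rather than stated anywhere as a formula; `resolve_axis` is a hypothetical helper.

```python
def resolve_axis(axis, depth):
    """Map a negative axis to a non-negative one for a branch of the given depth."""
    return axis if axis >= 0 else depth + axis - 1


# Depths along each branch, as stated for the example array.
depths = {"x": 3, "y": 4}

for axis in (-1, -2):
    resolved = {name: resolve_axis(axis, depth) for name, depth in depths.items()}
    print(axis, resolved)
```

For axis=-2 the branches resolve to 0 along "x" and 1 along "y". The axis=0 result contains no records while the axis=1 result does, so the two branches cannot be combined into one output.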

So, negative axis values aren’t just synonyms for non-negative axis values, as they are for rectilinear data. The reducer operations (PR #115) have a similar meaning, though reducers can be applied one level deeper than sizes and flatten can, so it’s a little different.

I’m not sure whether @ianna or I will do this. We might need to figure it out together.

@nsmith-: you might have opinions about the usefulness of negative axis having a different meaning than positive axis minus depth.

Top GitHub Comments

jpivarski commented, Mar 7, 2020

Some changes in definition, implemented in #152: (1) I shifted the meaning of axis by one, so that it’s always describing lengths of the specified axis, rather than one deeper. So in my original example,

[[[  2,   3,   5,   7,  11],
  [ 13,  17,  19,  23,  29],
  [ 31,  37,  41,  43,  47]],
 [[ 53,  59,  61,  67,  71],
  [ 73,  79,  83,  89,  97],
  [101, 103, 107, 109, 113]]]

axis=0 now returns 2, the scalar length of the array;
axis=1 now returns [3, 3] (what the old axis=0 would have returned)
axis=2 now returns [[5, 5, 5], [5, 5, 5]] (what the old axis=1 would have returned).
No axis value that corresponds to an actual axis of the array (this one has only three: 0, 1, and 2) is illegal, although axis=0 is qualitatively different from the rest. (That’s a recurring theme in these operations.)
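
The shifted definition can be sketched on plain Python lists as follows. This is a pure-Python stand-in for illustration, not the implementation from #152; the function is named `num` here only to match the final name, and a `ValueError` stands in for an out-of-range axis.

```python
def num(array, axis=1):
    """Shifted semantics: lengths at the specified axis itself (a scalar at axis=0)."""
    if not isinstance(array, list):
        raise ValueError("axis exceeds the depth of the array")
    if axis == 0:
        return len(array)
    return [num(item, axis - 1) for item in array]


array = [[[2, 3, 5, 7, 11], [13, 17, 19, 23, 29], [31, 37, 41, 43, 47]],
         [[53, 59, 61, 67, 71], [73, 79, 83, 89, 97], [101, 103, 107, 109, 113]]]

print(num(array, 0))
print(num(array, 1))
print(num(array, 2))
```

Only axis values beyond the array's depth (here axis=3 and above) raise an error in this sketch.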

Major change number (2) is that I’ve changed the name again, from “sizes” to “num”. This will allow physicists to write very readable code, like

ak.num(muons)     # implicit axis=1

and

muons[ak.num(muons) >= 2]

There is no collision in the NumPy namespace with a function named “num”, and it’s far from any of NumPy’s existing concepts (unlike “size”).
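
To show what the selection `muons[ak.num(muons) >= 2]` expresses, here is a plain-Python analogue. The `muons` values are made up for illustration, and the list comprehensions only mimic the behaviour; they are not how Awkward Array evaluates the expression.

```python
# Hypothetical events, each holding a variable number of muon pT values.
muons = [[10.5, 22.1], [], [31.0], [45.2, 12.3, 8.8]]

# Analogue of ak.num(muons) with the implicit axis=1: one count per event.
counts = [len(event) for event in muons]

# Analogue of muons[ak.num(muons) >= 2]: keep events with at least two muons.
selected = [event for event, n in zip(muons, counts) if n >= 2]

print(counts)
print(selected)
```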

ianna commented, Feb 17, 2020

Oh, I remember that. It was implemented as described above, then I extended it by an extra depth to return a one-dimensional NumpyArray, as in counts(axis) from study/flatten.py.

Either lengths or sizes is less confusing than counts.

Intuitively, I’d say a length is constant while a size isn’t, but that intuition comes from the Java/C++ world.