Unintuitive slicing behaviour when slicing with Arrays (#370)
Not sure if this is a bug or a feature.
Slicing by an array of indices would be very handy but currently fails (or is unreliable).
Minimal working example (mostly based on the README.md):
```python
import awkward1 as ak  # 0.2.27
import numpy as np

array = ak.Array([
    [{"x": 1.1, "y": [1]}],
    [{"x": 2.2, "y": [11, 12]}],
    [{"x": 3.3, "y": [21, 22, 23]}],
    # [],  # cannot slice this by index
    [{"x": 3.3, "y": [31, 32, 33]}],
    [{"x": 4.4, "y": [41, 42, 43, 44]}],
    [{"x": 5.5, "y": [51, 52, 53, 54, 55]}]
])

# slicing should work with Python objects or NumPy arrays,
# but singletons seem to produce more reliable results;
# strangely, singletons sometimes do not convert a 1-D NumPy array
# idx = np.array([0, 0, 1, 1, 2, 2])  # [:, np.newaxis]
startIndices = ak.singletons([[0], [0], [1], [1], [2], [2]])

# slice each `y` in `array` from start to end, resp. [0], [0:1], [1:2], [1], [2:], [2:3]
# endIndices = ak.singletons([[0], [1], [2], [1], [None], [3]])

assert len(array) == len(startIndices)

# this works
array['y', ..., 1:]

# while this fails with ValueError: in ListArray64 attempting to get 1, index out of range
# but shouldn't it return the same?
array['y', ..., 1]

# (as a consequence) this also fails
array['y', ..., startIndices]
```
Maybe I am missing something here.
Eventually it would be nice to achieve a slice from `startIndices` to `endIndices` without creating boolean arrays of the entire length or writing a Numba for loop. The boolean-mask route does not work either:

```python
mask = np.array([[True], [True, True], [False, True, True], [False, True, False], [False, False, True, True], [False, False, True, False]])
array['y', mask]
```

This fails with:

```
ValueError: arrays used as an index must be a (native-endian) integer or boolean
```
Issue analytics: created 3 years ago; 7 comments (7 by maintainers).

That’s where you’re getting into trickiness-squared. The different node types in a layout have different properties: `NumpyArray` represents rectilinear data, like NumPy, which has no need of `starts` and `stops`. `ListArray` is a fully general jaggedness implementation, and `ListOffsetArray` is the common case in which `starts = offsets[:-1]` and `stops = offsets[1:]` for some non-decreasing `offsets` array with length N+1 (N is the length of the logical array). The documentation of these layout classes provides more information, but keep in mind that everything under `layout` is semi-internal. (It doesn’t start with an underscore because it’s public API, but it’s for framework developers, not data analysts.)

Indexing can be legitimately confusing (also true of NumPy). I’ll break this down to what I think you’re asking.
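As a rough model of the `ListOffsetArray`/`ListArray` relationship described above, here is a plain NumPy sketch using the `y` lists from the question. It is not Awkward code: `content`, `offsets`, and the `materialize` helper are hypothetical names chosen for illustration, and the real layout classes carry more machinery than this.

```python
import numpy as np

# Flattened content of the inner "y" lists from the question, stored contiguously
content = np.array([1, 11, 12, 21, 22, 23, 31, 32, 33,
                    41, 42, 43, 44, 51, 52, 53, 54, 55])

# ListOffsetArray-style description: N + 1 non-decreasing offsets for N lists
offsets = np.array([0, 1, 3, 6, 9, 13, 18])

# The equivalent ListArray-style description
starts = offsets[:-1]
stops = offsets[1:]


def materialize(starts, stops, content):
    """Hypothetical helper: turn (starts, stops, content) into Python lists."""
    return [content[start:stop].tolist() for start, stop in zip(starts, stops)]


lists = materialize(starts, stops, content)
print(lists)  # [[1], [11, 12], [21, 22, 23], [31, 32, 33], [41, 42, 43, 44], [51, 52, 53, 54, 55]]

# starts/stops are independent, so a ListArray can also describe lists that are
# out of order (here reversed) while sharing the same content buffer
reversed_lists = materialize(starts[::-1], stops[::-1], content)
```

Because `starts` and `stops` are separate arrays, a `ListArray` can point at non-contiguous or reordered ranges of the same buffer, which a single `offsets` array cannot express; that generality is what makes it the "fully general" case.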
First, the array of records can always be projected onto `"y"`; I tried a number of combinations and didn’t see any trouble with that. For simplicity of discussion here, instead of talking about the full array of records, we could talk about just the nested lists of integers under `"y"`.
You can certainly slice those nested lists with `startIndices`, because each of the elements of the slice has length 1, just like `array` (and `array.y`), and each integer value is less than the length of the corresponding nested list; that’s why it works. `ak.singletons` has nothing to do with it: it’s used to convert `None` values into empty lists and everything else into length-1 lists, which you already have.

Recent versions of NumPy provide a hint about why the `mask` didn’t work: building it with `np.array` raises a deprecation warning about ragged nested sequences. This is a NumPy array of `dtype=object`, which soon won’t be created automatically. Constructing the mask as an Awkward array is the first step, but it also needs the length-1 structure of `startIndices` to fit into the second axis. Once they line up, we’re ready to go!
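The per-list rules discussed above can be modeled in plain Python. This is a sketch, not Awkward’s implementation: `jagged_take`, `singletons`, and `jagged_mask` are hypothetical helpers working on ordinary nested lists one level deep (the `y` lists with the one-record outer level stripped away), so the extra length-1 axis that the real `array` has is deliberately left out.

```python
# Hypothetical helpers: a pure-Python model of one level of jagged indexing.
# They operate on plain nested lists, not on Awkward arrays.

def jagged_take(lists, index):
    """For each i, collect lists[i][j] for every integer j in index[i]."""
    if len(lists) != len(index):
        raise ValueError("outer lengths differ")
    result = []
    for sub, idx in zip(lists, index):
        picked = []
        for j in idx:
            # Each integer must be a valid position within its own list
            if not -len(sub) <= j < len(sub):
                raise IndexError(f"index {j} out of range for a list of length {len(sub)}")
            picked.append(sub[j])
        result.append(picked)
    return result


def singletons(values):
    """Mimic the described ak.singletons behaviour: None -> [], anything else -> [value]."""
    return [[] if v is None else [v] for v in values]


def jagged_mask(lists, mask):
    """Keep lists[i][j] where mask[i][j] is True; each mask[i] must match len(lists[i])."""
    if len(lists) != len(mask):
        raise ValueError("outer lengths differ")
    result = []
    for sub, m in zip(lists, mask):
        if len(sub) != len(m):
            raise ValueError(f"mask of length {len(m)} does not fit a list of length {len(sub)}")
        result.append([x for x, keep in zip(sub, m) if keep])
    return result


# The inner "y" lists from the question, without the one-record outer level
y = [[1], [11, 12], [21, 22, 23], [31, 32, 33], [41, 42, 43, 44], [51, 52, 53, 54, 55]]
starts = [[0], [0], [1], [1], [2], [2]]

print(jagged_take(y, starts))    # [[1], [11], [22], [32], [43], [53]]
print(singletons([0, None, 2]))  # [[0], [], [2]]

# Taking index 1 from every list fails on the first one, [1]
try:
    jagged_take(y, [[1]] * len(y))
except IndexError as err:
    print("IndexError:", err)

# The boolean mask from the question, as nested Python lists
question_mask = [[True], [True, True], [False, True, True], [False, True, False],
                 [False, False, True, True], [False, False, True, False]]
try:
    jagged_mask(y, question_mask)
except ValueError as err:
    print("ValueError:", err)
```

The last check points at one more thing a real jagged boolean mask would trip over: in the question’s mask, the final row has four booleans for the five-element list `[51, 52, 53, 54, 55]`, so even with the right nesting the lengths would not line up.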
About making a slice option that can be different for each list (e.g. slice list 1 with 0:0, list 2 with 1:2, list 3 with 0:2): that’s an interesting idea, and something that becomes useful in the context of ragged arrays that you wouldn’t have with rectilinear arrays.
Right now, that sort of thing can be done by opening up the `ak.Array` structure and manipulating its memory `layout`, in particular the `starts` and `stops` (or `offsets`) arrays of its list nodes. Slicing with a different `start[i]` and `stop[i]` at each `i` is a matter of adding and subtracting the right numbers from these `starts` and `stops`. Be careful if you modify these NumPy arrays in place: they are views of the Awkward layout and will change the Awkward array in place (one of the few ways Awkward arrays are mutable).

And that’s probably how a variable `starts:stops` slice would be implemented. But if indexing is tricky, this is tricky-squared: it’s pretty easy to make an array that’s internally inconsistent (check with `ak.is_valid` and `ak.validity_error`).
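The "add to `starts`" arithmetic can be sketched with plain NumPy arrays standing in for a list node’s `starts`/`stops`. This is a hypothetical model, not Awkward’s API: `variable_slice` is an illustrative helper, it assumes non-negative bounds with `None` meaning "to the end", and it uses the question’s `startIndices` and commented-out `endIndices` as the per-list bounds.

```python
import numpy as np

# Flattened "y" content and offsets for the question's data
content = np.array([1, 11, 12, 21, 22, 23, 31, 32, 33,
                    41, 42, 43, 44, 51, 52, 53, 54, 55])
offsets = np.array([0, 1, 3, 6, 9, 13, 18])
starts, stops = offsets[:-1], offsets[1:]


def variable_slice(starts, stops, start_i, stop_i):
    """Hypothetical helper: new (starts, stops) so list i becomes list_i[start_i[i]:stop_i[i]].

    Assumes non-negative start_i/stop_i; None in stop_i means "to the end of
    that list". Bounds are clipped like Python slices. Fresh arrays are
    returned, so the input starts/stops are never modified in place.
    """
    lengths = stops - starts
    begin = np.asarray(start_i, dtype=np.int64)
    end = np.array([n if s is None else s for s, n in zip(stop_i, lengths)], dtype=np.int64)
    lo = np.minimum(begin, lengths)  # a start past the end gives an empty list
    hi = np.clip(end, lo, lengths)   # keep lo <= hi <= length
    return starts + lo, starts + hi


# Bounds from the question's startIndices and endIndices: 0:0, 0:1, 1:2, 1:1, 2:None, 2:3
start_i = [0, 0, 1, 1, 2, 2]
stop_i = [0, 1, 2, 1, None, 3]

new_starts, new_stops = variable_slice(starts, stops, start_i, stop_i)
result = [content[a:b].tolist() for a, b in zip(new_starts, new_stops)]
print(result)  # [[], [11], [22], [], [43, 44], [53]]
```

Returning new arrays rather than editing `starts`/`stops` in place sidesteps the mutation hazard mentioned above; with a real layout, any hand-built result would still need checking with `ak.is_valid`.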