Unintuitive slicing behaviour when slicing with Arrays


Not sure if bug or feature.

Slicing by an array of indices would be very handy but currently fails (or is unreliable).

Minimal working example (mostly based on the README.md):

import numpy as np      # needed for the np.array calls below
import awkward1 as ak   # 0.2.27
array = ak.Array([
    [{"x": 1.1, "y": [1]}],
    [{"x": 2.2, "y": [11, 12]}],
    [{"x": 3.3, "y": [21, 22, 23]}],
    #[], # cannot slice this by index
    [{"x": 3.3, "y": [31, 32, 33]}],
    [{"x": 4.4, "y": [41, 42, 43, 44]}],
    [{"x": 5.5, "y": [51, 52, 53, 54, 55]}]
])
# slicing should work by python objects or numpy
# but singleton seems to produce more reliable results
# strangely singletons sometimes do not convert 1-D numpy
# idx = np.array([0, 0, 1, 1, 2, 2])#[:, np.newaxis]
startIndices = ak.singletons([[0], [0], [1], [1], [2], [2]])

# slice each `y` in `array` from start to end resp. [0], [0:1], [1:2], [1], [2:], [2:3]
# endIndices = ak.singletons([[0], [1], [2], [1], [None], [3]])

assert array.shape[0] == startIndices.shape[0]

# this works
array['y', ... , 1:]
# while this fails with ValueError: in ListArray64 attempting to get 1, index out of range
# but should return the same?
array['y', ... , 1]
# (as a consequence) this also fails
array['y', ... , startIndices]

Maybe I am missing something here. Eventually it would be nice to slice from startIndices to endIndices without creating boolean arrays of the entire length or writing a Numba for loop.

mask = np.array([[True], [True, True], [False, True, True], [False, True, False], [False, False, True, True], [False, False, True, False]])
array['y', mask]

Fails with ValueError: arrays used as an index must be a (native-endian) integer or boolean

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

jpivarski commented, Aug 5, 2020
starts = np.asarray(original.content.starts)

Throws an error with the same array: AttributeError: 'awkward1._ext.NumpyArray' object has no attribute 'starts'

That’s where you’re getting into trickiness-squared. The different node types in a layout have different properties: NumpyArray represents rectilinear data, like NumPy, which has no need of starts and stops. ListArray is a fully general jaggedness implementation and ListOffsetArray is the common case in which starts = offsets[:-1] and stops = offsets[1:] for some monotonically increasing offsets array with length N+1 (N is the length of the logical array). These links provide more information about the layout classes, but keep in mind that everything under layout is semi-internal. (It doesn’t start with an underscore because it’s public API, but it’s for framework developers, not data analysts.)
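The relationship between offsets, starts, and stops can be sketched with plain Python lists. This is only a model of the idea described above, not Awkward code: `content`, `offsets`, and `lists` are illustrative names, and the numbers are the first three `y` lists from the example.

```python
# Pure-Python model of a ListOffsetArray; no Awkward calls are used.
content = [1, 11, 12, 21, 22, 23]   # flat data, playing the role of a NumpyArray node
offsets = [0, 1, 3, 6]              # monotonically increasing, length N + 1 for N = 3 lists

starts = offsets[:-1]               # where each list begins in content
stops = offsets[1:]                 # where each list ends (exclusive)

# List i is the half-open range content[starts[i]:stops[i]].
lists = [content[a:b] for a, b in zip(starts, stops)]
print(starts, stops)                # [0, 1, 3] [1, 3, 6]
print(lists)                        # [[1], [11, 12], [21, 22, 23]]
```

A flat node has no such ranges to expose, which is consistent with the AttributeError quoted above.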

jpivarski commented, Aug 5, 2020

Indexing can be legitimately confusing (also true of NumPy). I’ll break this down to what I think you’re asking.

First, the array of records can always be projected onto "y". I tried a number of combinations and didn’t see any trouble with that. For simplicity of discussion here, instead of talking about

array = ak.Array([
    [{"x": 1.1, "y": [1]}],
    [{"x": 2.2, "y": [11, 12]}],
    [{"x": 3.3, "y": [21, 22, 23]}],
    #[], # cannot slice this by index   (if empty, you'll just have to pass an empty list in the slice)
    [{"x": 3.3, "y": [31, 32, 33]}],
    [{"x": 4.4, "y": [41, 42, 43, 44]}],
    [{"x": 5.5, "y": [51, 52, 53, 54, 55]}]
])

which has type

6 * var * {"x": float64, "y": var * int64}

we could talk about

array["y"]   # or array.y

which is

[[[1]], [[11, 12]], [[21, 22, 23]], [[31, 32, 33]], [[41, 42, 43, 44]], [[51, 52, 53, 54, 55]]]

with type

6 * var * var * int64

You can certainly do

>>> array["y", [[0], [0], [1], [1], [2], [2]]]
<Array [[[[1]]], [[[1, ... [[[21, 22, 23]]]] type='6 * 1 * var * var * int64'>

because each element of the slice has length 1, just like each element of array (and array.y), and each integer value is less than the length of the corresponding nested list:

>>> array.y[0, 0]
<Array [1] type='1 * int64'>                    # has an element 0
>>> array.y[1, 0]
<Array [11, 12] type='2 * int64'>               # has an element 0
>>> array.y[2, 0]
<Array [21, 22, 23] type='3 * int64'>           # has an element 1
>>> array.y[3, 0]
<Array [31, 32, 33] type='3 * int64'>           # has an element 1
>>> array.y[4, 0]
<Array [41, 42, 43, 44] type='4 * int64'>       # has an element 2
>>> array.y[5, 0]
<Array [51, 52, 53, 54, 55] type='5 * int64'>   # has an element 2

and so that’s why it works. ak.singletons has nothing to do with it: it’s used to convert None values into empty lists and everything else into length-1 lists, which you already have.
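The bounds argument can be illustrated with plain Python lists, which is roughly why `1:` succeeds in the question while a bare `1` does not. This is a sketch of the reasoning only; `ys` and `pick` are hypothetical names, and none of this calls Awkward.

```python
# Plain Python lists standing in for the first three innermost "y" lists.
ys = [[1], [11, 12], [21, 22, 23]]

# A slice is clipped to the list length, so it can never be out of range.
sliced = [y[1:] for y in ys]
print(sliced)            # [[], [12], [22, 23]]

def pick(lists, i):
    """Return element i of every list (hypothetical helper); raises IndexError if any list is too short."""
    return [y[i] for y in lists]

# An integer index must exist in every list; the first list has no element 1.
try:
    pick(ys, 1)
    failed = False
except IndexError:
    failed = True
print(failed)            # True

# One index per list is fine as long as each index fits its own list.
per_list = [y[i] for y, i in zip(ys, [0, 0, 1])]
print(per_list)          # [1, 11, 22]
```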

Recent versions of NumPy provide a hint about why the mask didn’t work:

>>> mask = np.array([
...     [True],
...     [True, True],
...     [False, True, True],
...     [False, True, False],
...     [False, False, True, True],
...     [False, False, True, False]])

raises the warning

<stdin>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a
list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to
do this, you must specify 'dtype=object' when creating the ndarray

This is a NumPy array of dtype=object, which soon won’t be created automatically. Constructing the mask as an Awkward array is the first step:

>>> mask = ak.Array([
...     [True],
...     [True, True],
...     [False, True, True],
...     [False, True, False],
...     [False, False, True, True],
...     [False, False, True, False]])
>>> mask
<Array [[True], [True, ... False, True, False]] type='6 * var * bool'>

but it also needs the length-1 structure of startIndices to fit into the second axis:

>>> mask = ak.Array([
...     [[True]],
...     [[True, True]],
...     [[False, True, True]],
...     [[False, True, False]],
...     [[False, False, True, True]],
...     [[False, False, True, False]]])
>>> mask
<Array [[[True]], ... False, True, False]]] type='6 * var * var * bool'>
>>> array.y
<Array [[[1]], [[11, ... [51, 52, 53, 54, 55]]] type='6 * var * var * int64'>
>>> ak.num(mask, axis=2)
<Array [[1], [2], [3], [3], [4], [4]] type='6 * var * int64'>
>>> ak.num(array.y, axis=2)
<Array [[1], [2], [3], [3], [4], [5]] type='6 * var * int64'>

Okay; they line up: now we’re ready to go!

>>> array.y[mask]
<Array [[[1]], [[11, 12], ... 43, 44]], [[53]]] type='6 * var * var * int64'>
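As a cross-check, the same selection can be modelled with nested Python lists. This is a sketch rather than Awkward's implementation, and `num_axis2` is a hypothetical stand-in for the length comparison done with ak.num above. One assumption to flag: the last mask row in the thread has four entries while the last `y` list has five, so a trailing False is appended here to make the lengths match.

```python
data = [[[1]], [[11, 12]], [[21, 22, 23]], [[31, 32, 33]],
        [[41, 42, 43, 44]], [[51, 52, 53, 54, 55]]]
mask = [[[True]], [[True, True]], [[False, True, True]], [[False, True, False]],
        [[False, False, True, True]],
        [[False, False, True, False, False]]]   # trailing False added (assumption)

def num_axis2(nested):
    """Lengths of the innermost lists (hypothetical helper modelling the axis=2 count)."""
    return [[len(inner) for inner in outer] for outer in nested]

# The mask must have exactly the same nested lengths as the data.
if num_axis2(mask) != num_axis2(data):
    raise ValueError("mask and data do not line up")

# Keep each value whose matching mask entry is True.
selected = [
    [[v for v, keep in zip(vals, flags) if keep]
     for vals, flags in zip(d_outer, m_outer)]
    for d_outer, m_outer in zip(data, mask)
]
print(selected)
# [[[1]], [[11, 12]], [[22, 23]], [[32]], [[43, 44]], [[53]]]
```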

About making a slice option that can be different at each level (e.g. slice list 1 with 0:0, list 2 with 1:2, list 3 with 0:2), that’s an interesting idea, something that becomes useful in the context of ragged arrays that you wouldn’t have with rectilinear arrays.

Right now, that sort of thing can be done by opening up the ak.Array structure and manipulating its memory layout:

>>> original = array.y.layout
>>> original
<ListOffsetArray64>
    <offsets><Index64 i="[0 1 2 3 4 5 6]" offset="0" length="7" at="0x55f65db71150"/></offsets>
    <content><ListOffsetArray64>
        <offsets><Index64 i="[0 1 3 6 9 13 18]" offset="0" length="7" at="0x55f65db75170"/></offsets>
        <content><NumpyArray format="l" shape="18" data="1 11 12 21 22 ... 51 52 53 54 55" at="0x55f65d654e60"/></content>
    </ListOffsetArray64></content>
</ListOffsetArray64>
>>> starts = np.asarray(original.content.starts)
>>> stops  = np.asarray(original.content.stops)
>>> starts, stops
(array([ 0,  1,  3,  6,  9, 13], dtype=int64),
 array([ 1,  3,  6,  9, 13, 18], dtype=int64))

Slicing with a different start[i] and stop[i] at each i is a matter of adding and subtracting the right number from these starts and stops. Be careful if you modify these NumPy arrays in place: they are views of the Awkward layout and will change the Awkward array in-place (one of the few ways Awkward arrays are mutable).

>>> starts = starts + [0, 0, 1, 1, 2, 2]
>>> stops  = stops  - [0, 0, 1, 1, 2, 2]
>>> starts, stops
(array([ 0,  1,  4,  7, 11, 15], dtype=int64), array([ 1,  3,  5,  8, 11, 16], dtype=int64))
>>> modified = ak.layout.ListOffsetArray64(
...     original.offsets,
...     ak.layout.ListArray64(
...         ak.layout.Index64(starts),
...         ak.layout.Index64(stops),
...         original.content.content))
>>> modified
<ListOffsetArray64>
    <offsets><Index64 i="[0 1 2 3 4 5 6]" offset="0" length="7" at="0x55f65db71150"/></offsets>
    <content><ListArray64>
        <starts><Index64 i="[0 1 4 7 11 15]" offset="0" length="6" at="0x55f65db704d0"/></starts>
        <stops><Index64 i="[1 3 5 8 11 16]" offset="0" length="6" at="0x55f65db5b0b0"/></stops>
        <content><NumpyArray format="l" shape="18" data="1 11 12 21 22 ... 51 52 53 54 55" at="0x55f65d654e60"/></content>
    </ListArray64></content>
</ListOffsetArray64>
>>> ak.Array(modified)
<Array [[[1]], [[11, 12]], ... [[]], [[53]]] type='6 * var * var * int64'>
>>> ak.Array(modified).tolist()
[[[1]], [[11, 12]], [[22]], [[32]], [[]], [[53]]]
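The arithmetic in this session can be replayed with plain Python lists to see where each result comes from. This is a model of the layout, not the ak.layout API; `trim` is an illustrative name for the per-list amount removed from both ends.

```python
# Flat content and offsets copied from the inner ListOffsetArray64 shown above.
content = [1, 11, 12, 21, 22, 23, 31, 32, 33,
           41, 42, 43, 44, 51, 52, 53, 54, 55]
offsets = [0, 1, 3, 6, 9, 13, 18]
trim = [0, 0, 1, 1, 2, 2]           # elements to drop from each end of each list

# New lists are built here, so the original offsets are left untouched.
starts = [a + t for a, t in zip(offsets[:-1], trim)]
stops = [b - t for b, t in zip(offsets[1:], trim)]
print(starts)                       # [0, 1, 4, 7, 11, 15]
print(stops)                        # [1, 3, 5, 8, 11, 16]

# Each trimmed list is still just a half-open range of the shared content.
result = [content[a:b] for a, b in zip(starts, stops)]
print(result)                       # [[1], [11, 12], [22], [32], [], [53]]
```

The content is never copied or reordered; only the two index lists change.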

And that’s probably how a variable starts:stops would be implemented. But if the indexing is tricky, this is tricky-squared. It’s pretty easy to make an array that’s internally inconsistent (check with ak.is_valid and ak.validity_error).
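To give a feel for what "internally inconsistent" can mean, here is a small pure-Python checker for a starts/stops pair. It is a hypothetical helper (`check_starts_stops` is not an Awkward function), and the conditions it tests are assumptions about what a consistent pair plausibly needs; it does not reproduce the library's own validity rules.

```python
def check_starts_stops(starts, stops, content_length):
    """Return None if the ranges look consistent, else a short message (hypothetical helper)."""
    if len(starts) != len(stops):
        return "starts and stops have different lengths"
    for i, (a, b) in enumerate(zip(starts, stops)):
        if a > b:
            return f"list {i}: start {a} > stop {b}"
        if a < 0 or b > content_length:
            return f"list {i}: range {a}:{b} is outside the content"
    return None

# The trimmed ranges from the session above pass the check.
ok = check_starts_stops([0, 1, 4, 7, 11, 15], [1, 3, 5, 8, 11, 16], 18)
print(ok)        # None

# Trimming one element from each end of a length-1 list crosses the bounds.
bad = check_starts_stops([0 + 1], [1 - 1], 18)
print(bad)       # list 0: start 1 > stop 0
```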
