Unintuitive results working with nullable data (from parquet)
I’ve opened some trees with uproot4 and written the arrays in the Parquet format using ak.to_parquet (in some quick tests that I ran, reading from Parquet files seems much faster than reading TTrees for doubly jagged arrays). When the Parquet file is read back, I understand from the docs that the type is not exactly the same because the fields are nullable. This appears to lead to some unintuitive results (for the same data, arrays is loaded from the Parquet file, while arrays_original is read directly via uproot):
In [1]: arrays_original["data.fJetPt"].layout
Out[1]: <NumpyArray format="f" shape="25109" data="22.3846 21.4953 20.3251 26.6239 23.0994 ... 21.6158 21.8578 22.133 26.5842 20.8287" at="0x7fafb6180000"/>
In [2]: arrays_original["data.fJetPt"] > 0
Out[2]: <Array [True, True, True, ... True, True, True] type='25109 * bool'>
In [3]: arrays["data.fJetPt"].layout
Out[3]:
<BitMaskedArray valid_when="true" length="25109" lsb_order="true">
<mask><IndexU8 i="[255 255 255 255 255 ... 255 255 255 255 31]" offset="0" length="3139" at="0x0001114b0000"/></mask>
<content><NumpyArray format="f" shape="25109" data="22.3846 21.4953 20.3251 26.6239 23.0994 ... 21.6158 21.8578 22.133 26.5842 20.8287" at="0x0001140be600"/></content>
</BitMaskedArray>
In [4]: arrays["data.fJetPt"] > 0
Out[4]: <Array [None, None, None, ... None, None, None] type='25109 * ?bool'>
The output from [4] is not so intuitive from my perspective. My expectation was that it would evaluate to a mask, and if there were somehow missing values (which isn’t the case here), then either leave them as None, or return False (because the condition couldn’t be evaluated). I think I can sort of follow what’s happening: the array is wrapped in a BitMaskedArray (hence the nullable), so asking for bool > 0 is interpreted as a comparison that can’t be made, and it returns None. I also see that I can work with the arrays as normal by ak.fill_none(arrays["data.fJetPt"], 0), which then makes the types non-nullable.
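To check the claim that nothing is actually missing here, the bitmask shown in Out[3] can be decoded by hand. The following is a minimal pure-Python sketch, not Awkward's own implementation: `unpack_validity` is a hypothetical helper, and the byte values are reconstructed from the printed layout on the assumption that every elided byte is 255.

```python
def unpack_validity(mask_bytes, length, valid_when=True, lsb_order=True):
    """Expand a packed bitmask into one boolean per element (True = valid)."""
    valid = []
    for i in range(length):
        byte = mask_bytes[i // 8]
        # With lsb_order, element i lives in bit (i % 8), counted from the
        # least significant bit of its byte.
        shift = i % 8 if lsb_order else 7 - (i % 8)
        bit = (byte >> shift) & 1
        valid.append(bool(bit) == valid_when)
    return valid


length = 25109
n_bytes = (length + 7) // 8  # 3139 bytes, matching the printed mask length

# Assumption: all bytes hidden behind "..." in the printout are 255.
mask_bytes = [255] * (n_bytes - 1) + [31]

valid = unpack_validity(mask_bytes, length)
n_missing = valid.count(False)
print(n_missing)
```

The last byte is 31 (binary 00011111) because only 5 of its 8 bits correspond to real elements (25109 = 3138 * 8 + 5). Under the assumption above, every element is flagged valid, so a comparison that returns None everywhere is not explained by the mask itself.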
I’m filing this as a bug report because it seems confusing, but it could alternatively be interpreted as a documentation request to add a note to the Arrow conversion page that users will likely want to apply fill_none (not ideal if you actually mean to have None, but I’m not sure what to do in that case). Otherwise, it seems that many of the ak functions don’t work (for example, ak.to_numpy also doesn’t work), which can be rather confusing.
Thanks!
Issue Analytics
- Created 3 years ago
- Comments:8 (8 by maintainers)

It’s merged. Enjoy!
(I find that if I don’t deal with these things right away, other things intervene and I end up never getting back to them. If it’s a bug, I don’t want it to go unfixed.)