Unpredictable behaviour with lists in columns
What was the underlying reasoning behind the current way lists in columns are handled? For example:
columnOf(1, null, listOf(1,2,3))
// untitled
// 0 [1]
// 1 [ ]
// 2 [1, 2, 3]
This turns the column into a value column of type List<Int> and wraps every value in a list, except null, which is replaced by an empty list.
At the same time:
columnOf(1, null, listOf(1,2,3), mapOf(1 to 2))
// untitled
// 0 1
// 1 null
// 2 [1, 2, 3]
// 3 {1=2}
This becomes a value column of type Any?, with every value kept as it is.
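To make the contrast concrete, here is a small plain-Kotlin model of the behaviour described above. It is a hypothetical reconstruction inferred only from the two outputs shown, not DataFrame's actual inference code. The function name listifyLikeReported is made up, and the rule that any non-List collection (such as a Map) disables listification is an assumption.

```kotlin
// Hypothetical model of the behaviour reported in this issue, inferred from the two
// example outputs only; it is NOT DataFrame's actual type-inference code.
fun listifyLikeReported(values: List<Any?>): List<Any?> {
    val hasList = values.any { it is List<*> }
    // Assumption: any non-List collection (e.g. a Map) disables listification.
    val hasOtherCollection = values.any { it is Map<*, *> || (it is Collection<*> && it !is List<*>) }
    if (!hasList || hasOtherCollection) {
        // Values are kept as they are; the column ends up with type Any?.
        return values
    }
    // Otherwise every value is forced into a list, giving a List<...> column type.
    return values.map { v ->
        when (v) {
            null -> emptyList<Any?>() // null is erased into an empty list
            is List<*> -> v           // lists are kept as they are
            else -> listOf(v)         // scalars are wrapped into a single-element list
        }
    }
}
```

With this model, listifyLikeReported(listOf(1, null, listOf(1, 2, 3))) returns [[1], [], [1, 2, 3]], while adding mapOf(1 to 2) to the input returns the values unchanged. The sketch only illustrates the core complaint: how one value is stored depends on which other values happen to be present.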
To me, the first behaviour shouldn't happen. We shouldn't alter the user's input data so much that nulls are erased, list depth changes, and values change depending on other values in the input. It also gives me a headache when generating types from OpenAPI and trying to catch these cases: a column of arrays of objects becomes a frame column, but if there is also an array of primitives, suddenly everything becomes a list of primitives and the column becomes a value column, unless there is another collection in there… you see? XD
So, unless there is an important reason for this behaviour, I propose removing it, since to me it is too unpredictable.
That's a bug that is probably rooted in the unification of column type inference with the pivot + groupBy logic. If a dataframe has duplicate pairs of values in key columns a and b, pivot { a }.groupBy { b } may produce columns with mixed scalar and list values, which are currently silently converted into lists in order to give the column a usable type instead of Any. But this certainly shouldn't be done in initialization operations such as columnOf, dataFrameOf or read. So you are absolutely right, this behaviour should be fixed.
Okay, I think I fixed it 😃. If createColumn gets a suggested type of List<Something>, it will convert the values if necessary, but guessValueType won't suggest it anymore (unless you specify listifyValues = true). Pivot provides a suggested type, so that case still works.
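A minimal plain-Kotlin sketch of the distinction made in this thread: a toy stand-in for pivot { a }.groupBy { b } produces one cell per (b, a) pair, which is a list when the pair is duplicated and a scalar otherwise. Conversion to lists then happens only when a list type is suggested explicitly, or when guessing is called with listifyValues = true. All names here (Row, pivotCells, listify, guessValues, createValues, suggestListType) are hypothetical and do not reproduce the real signatures of createColumn or guessValueType; only the listifyValues flag name comes from the comment above.

```kotlin
// Hypothetical sketch; names and signatures are illustrative, not DataFrame's API.

// Toy stand-in for pivot { a }.groupBy { b }: one cell per b value for a given a value.
data class Row(val a: String, val b: String, val value: Int)

fun pivotCells(rows: List<Row>, aKey: String): List<Any?> =
    rows.groupBy { it.b }.values.map { group ->
        val hits = group.filter { it.a == aKey }.map { it.value }
        when (hits.size) {
            0 -> null     // no row for this (b, a) pair
            1 -> hits[0]  // unique pair: a scalar cell
            else -> hits  // duplicated pair: a list cell, so the column mixes scalars and lists
        }
    }

// Wraps every value into a list: null -> [], list -> itself, scalar -> [scalar].
fun listify(values: List<Any?>): List<List<*>> = values.map { v ->
    when (v) {
        null -> emptyList<Any?>()
        is List<*> -> v
        else -> listOf(v)
    }
}

// Guessing alone no longer listifies mixed input unless the caller opts in.
fun guessValues(values: List<Any?>, listifyValues: Boolean = false): List<Any?> =
    if (listifyValues && values.any { it is List<*> }) listify(values) else values

// A caller that suggests a List type explicitly (as pivot does) gets converted values.
fun createValues(values: List<Any?>, suggestListType: Boolean): List<Any?> =
    if (suggestListType) listify(values) else values
```

With rows (x, 1, 10), (x, 1, 11) and (x, 2, 20), pivotCells(rows, "x") yields [[10, 11], 20]; createValues(cells, suggestListType = true) turns that into [[10, 11], [20]], whereas guessValues(listOf(1, null, listOf(1, 2, 3))) leaves the input untouched. Keeping the conversion on the "suggested type" path means only callers that know they want lists, like pivot, pay for it.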