[API RFC] Support `IColumn.if_else` for Boolean Column
For `cond.if_else(a, b) -> IColumn`, the output is defined as:
out_i = a_i if cond_i is True
out_i = b_i otherwise
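As a rough illustration of the definition above, the element-wise semantics can be sketched in plain Python over lists. The helper name `if_else_reference` is hypothetical and is not part of torcharrow; null handling is deliberately left out.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def if_else_reference(cond: Sequence[bool], a: Sequence[T], b: Sequence[T]) -> List[T]:
    """Hypothetical reference semantics: out_i = a_i if cond_i is True, else b_i."""
    if not (len(cond) == len(a) == len(b)):
        raise ValueError("cond, a and b must have the same length")
    # Pick from a where the condition holds, otherwise from b.
    return [a_i if c_i else b_i for c_i, a_i, b_i in zip(cond, a, b)]


result = if_else_reference([True, False, True, True], [1, 2, 3, 4], [10, 20, 30, 40])
print(result)
```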
API Demo
>>> import torcharrow as ta
>>> cond = ta.Column([True, False, True, True])
>>> a = ta.Column([1, 2, 3, 4])
>>> b = ta.Column([10, 20, 30, 40])
>>> cond.if_else(a, b)
0 1
1 20
2 3
3 4
dtype: int64, length: 4, null_count: 0
Current API
It’s currently called `ite` in Boolean Column: https://github.com/facebookresearch/torcharrow/blob/0dbe14d399b766a2dfd596e35cb7843e7514ad59/torcharrow/icolumn.py#L848-L853
API in other frameworks
PyArrow
`pyarrow.compute` has `if_else`. Note that instead of being a member function of `BooleanArray`, it’s a standalone function in the `pyarrow.compute` package that takes 3 arguments (`cond`, `left`, `right`).
Pandas
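Doesn’t seem to have an equivalent method on a Boolean Series? The closest is `Series.where(cond, other)`, which is called on the values rather than on the condition.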
NumPy/PyTorch/TensorFlow
NumPy provides `np.where`: https://numpy.org/doc/stable/reference/generated/numpy.where.html
PyTorch provides `torch.where`: https://pytorch.org/docs/stable/generated/torch.where.html
TensorFlow provides `tf.where`: https://www.tensorflow.org/api_docs/python/tf/where
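For comparison, here is a small sketch of the same selection with `np.where`, which takes the condition as its first positional argument. It reuses the values from the API demo. Only NumPy is shown; `torch.where` and `tf.where` follow the same three-argument shape.

```python
import numpy as np

cond = np.array([True, False, True, True])
a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

# Standalone, three-argument form: take a where cond is True, otherwise b.
out = np.where(cond, a, b)
print(out.tolist())
```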
R
`ifelse` as a standalone function: https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/ifelse
But in general, R seems to prefer standalone functions over a fluent-style API.
Discussions
Shall we make `ta.if_else` a standalone function in the `torcharrow` package that takes 3 arguments (rather than a member method of `BooleanColumn`)? This is similar to Arrow Compute, PyTorch and TensorFlow.
- Standalone function (`if_else(cond, x, y)`):
  ta.if_else(a < 1, a, 1)
  ta.if_else(
      tf.contains([1, 2, 3], a),
      a, b
  )
- Member method (`cond.if_else(x, y)`):
  (a < 1).if_else(a, 1)
  tf.contains([1, 2, 3], a).if_else(a, b)
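The two call shapes being compared can be sketched side by side in plain Python. Everything here is hypothetical and only for illustration: `if_else`, `_broadcast` and `BoolColumn` are stand-ins, not torcharrow's actual classes or functions, and scalar broadcasting is an assumption made so that `if_else(cond, a, 1)` works.

```python
from typing import Any, List, Sequence


def _broadcast(value: Any, length: int) -> List[Any]:
    """Repeat a scalar to the given length; pass sequences through as lists."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value] * length


def if_else(cond: Sequence[bool], x: Any, y: Any) -> List[Any]:
    """Hypothetical standalone form: if_else(cond, x, y)."""
    xs = _broadcast(x, len(cond))
    ys = _broadcast(y, len(cond))
    return [x_i if c_i else y_i for c_i, x_i, y_i in zip(cond, xs, ys)]


class BoolColumn(list):
    """Hypothetical stand-in for a Boolean column (not torcharrow's class)."""

    def if_else(self, x: Any, y: Any) -> List[Any]:
        # Member form: cond.if_else(x, y) delegates to the standalone function.
        return if_else(self, x, y)


a = [0, 1, 2]
cond = BoolColumn(v < 1 for v in a)  # stands in for the column expression a < 1

standalone = if_else(cond, a, 1)
member = cond.if_else(a, 1)
print(standalone, member)
```

In this sketch the member method is a thin wrapper, so both spellings could coexist while sharing one implementation; which one to expose is the readability question raised above.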
Top GitHub Comments
I see. Yeah, looks like readability here is more or less a matter of taste :p I do agree that we should change it to the 3-argument form to follow the convention, but I can share what I see when reading `(a < 1).if_else(a, 1)` and `ta.if_else`:
The parentheses actually separate the condition and make it stand out more clearly when I read this expression. (Aren’t parentheses widely used for conditions in many programming languages?)
The way the 2-argument form separates the condition from the candidate values also helps the condition stand out clearly in my eyes (or brain) when the condition expression is long.
How the `if_else` symbol breaks the condition expression and the candidate expressions into 2 parts: `<condition>.if_else(<then-value>, <else-value>)`
e.g.
Resolved by https://github.com/facebookresearch/torcharrow/pull/56