.ne fails with an ambiguous message when comparing a list of columns that contains the column name 'dtype'
Code Sample, a copy-pastable example if possible
import pandas as pd
df = pd.DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'], [0, 1, 5, 'bb'], [0, 1, 5, 'bb'], [0, 1, 5, 'bb'], ['cc', 4, 4, 4]], columns=['a', 'b', 'c', 'dtype'])
df.loc[:, ['a', 'dtype']].ne(df.loc[:, ['a', 'dtype']])
In [10]: df
Out[10]:
    a  b  c dtype
0   0  1  2    aa
1   0  1  2    aa
2   0  1  5    bb
3   0  1  5    bb
4   0  1  5    bb
5  cc  4  4     4
Problem description
Instead of the expected output, I receive:
In [8]: df.loc[:, ['a','dtype']].ne(df.loc[:, ['a', 'dtype']])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-8-a0710f18822f> in <module>()
----> 1 df.loc[:, ['a','dtype']].ne(df.loc[:, ['a', 'dtype']])
/home/blistein/env/aan/local/lib/python2.7/site-packages/pandas/core/ops.pyc in f(self, other, axis, level)
1588 self, other = self.align(other, 'outer',
1589 level=level, copy=False)
-> 1590 return self._compare_frame(other, na_op, str_rep)
1591
1592 elif isinstance(other, ABCSeries):
/home/blistein/env/aan/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _compare_frame(self, other, func, str_rep)
4790 return {col: func(a[col], b[col]) for col in a.columns}
4791
-> 4792 new_data = expressions.evaluate(_compare, str_rep, self, other)
4793 return self._constructor(data=new_data, index=self.index,
4794 columns=self.columns, copy=False)
/home/blistein/env/aan/local/lib/python2.7/site-packages/pandas/core/computation/expressions.pyc in evaluate(op, op_str, a, b, use_numexpr, **eval_kwargs)
201 """
202
--> 203 use_numexpr = use_numexpr and _bool_arith_check(op_str, a, b)
204 if use_numexpr:
205 return _evaluate(op, op_str, a, b, **eval_kwargs)
/home/blistein/env/aan/local/lib/python2.7/site-packages/pandas/core/computation/expressions.pyc in _bool_arith_check(op_str, a, b, not_allowed, unsupported)
173 unsupported = {'+': '|', '*': '&', '-': '^'}
174
--> 175 if _has_bool_dtype(a) and _has_bool_dtype(b):
176 if op_str in unsupported:
177 warnings.warn("evaluating in Python space because the {op!r} "
/home/blistein/env/aan/local/lib/python2.7/site-packages/pandas/core/generic.pyc in __nonzero__(self)
1574 raise ValueError("The truth value of a {0} is ambiguous. "
1575 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1576 .format(self.__class__.__name__))
1577
1578 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Expected Output
       a  dtype
0  False  False
1  False  False
2  False  False
3  False  False
4  False  False
5  False  False
Alternative: a descriptive Error Message, telling me I can’t use ‘dtype’ as column name.
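Until the comparison works directly, one possible workaround is to rename the offending column before calling .ne and restore the label afterwards. This is only a sketch: the temporary name 'dtype_' is an arbitrary placeholder, and it assumes, as the traceback suggests, that the failure is triggered solely by the column label and not by the data.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2, 'aa'], [0, 1, 2, 'aa'], [0, 1, 5, 'bb'],
     [0, 1, 5, 'bb'], [0, 1, 5, 'bb'], ['cc', 4, 4, 4]],
    columns=['a', 'b', 'c', 'dtype'],
)

subset = df.loc[:, ['a', 'dtype']]

# Temporarily rename the column so that no label collides with 'dtype'.
# 'dtype_' is a hypothetical placeholder; any unused label would do.
renamed = subset.rename(columns={'dtype': 'dtype_'})

# Compare the renamed frame with itself, then restore the original label.
result = renamed.ne(renamed).rename(columns={'dtype_': 'dtype'})
print(result)
```

Because the frame is compared with itself, every cell of the result should be False, matching the expected output above.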
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-29-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.utf8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.23.4
pytest: 2.8.7
pip: 18.0
setuptools: 40.0.0
Cython: None
numpy: 1.11.0
scipy: 0.17.0
pyarrow: None
xarray: None
IPython: 5.5.0
sphinx: None
patsy: 0.4.1
dateutil: 2.7.3
pytz: 2014.10
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.3
feather: None
matplotlib: 1.5.1
openpyxl: 2.3.0
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.5.0
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: 0.7.2.None
psycopg2: 2.6.1 (dt dec mx pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Issue Analytics
- Created 5 years ago
- Comments: 9 (8 by maintainers)
Top GitHub Comments
Maybe pandas/tests/test_expressions.py
@TomAugspurger Hi, I fixed the _has_bool_dtype function, but I don’t know which test directory is the right one. This is my first attempt at contributing to open source. Could you tell me which directory is most appropriate? Thank you.
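For context on that fix: the traceback ends inside _has_bool_dtype, so a plausible mechanism is that the check reads an attribute named dtype, which on a DataFrame falls back to column lookup. The sketch below illustrates that idea only. naive_has_bool_dtype is a hypothetical stand-in written for this example, not the pandas source, and the two-row frame is made up for brevity.

```python
import pandas as pd


def naive_has_bool_dtype(x):
    # Hypothetical duck-typed check, NOT the pandas implementation.
    # It assumes that anything exposing `.dtype` reports a real dtype there.
    try:
        return x.dtype == bool
    except AttributeError:
        return False


df = pd.DataFrame({'a': [0, 'cc'], 'dtype': ['aa', 4]})

# A DataFrame has no `.dtype` attribute of its own, so attribute access
# falls back to the column labelled 'dtype' and returns a Series.
column = df.dtype
print(type(column).__name__)

# The check therefore yields an element-wise Series instead of a bool,
# and evaluating it in an `if` raises the ValueError seen in the traceback.
message = None
try:
    if naive_has_bool_dtype(df):
        pass
except ValueError as exc:
    message = str(exc)
    print(message)
```

If this is indeed the cause, a check that inspects the frame's dtypes instead of relying on attribute access would avoid the collision with column labels.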