`unique()` has wrong return type

See original GitHub issue

cc https://github.com/databricks/koalas/issues/233

Not sure why https://github.com/databricks/koalas/pull/249 decided to return a Series instead of a NumPy array?

>>> import findspark
>>> findspark.init()
>>> import databricks.koalas as ks
>>> import pandas as pd

>>> findspark.__version__
'1.3.0'
>>> pd.__version__
'0.24.2'
>>> ks.__version__
'0.12.0'

>>> pdf = pd.DataFrame({'a': [1, 2]})
>>> kdf = ks.DataFrame(pdf)
... (spark output removed) ...
>>> kdf['a'].unique()
0    1
1    2
Name: a, dtype: int64
>>> pdf['a'].unique()
array([1, 2])
>>> list(pdf['a'].unique())
[1, 2]
>>> list(kdf['a'].unique())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/databricks/koalas/series.py", line 2407, in __getitem__
    return Series(self._scol.__getitem__(key), anchor=self._kdf, index=self._index_map)
  File "/usr/local/lib/python3.6/dist-packages/databricks/koalas/series.py", line 273, in __init__
    data=data, index=index, dtype=dtype, name=name, copy=copy, fastpath=fastpath)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/series.py", line 198, in __init__
    elif isinstance(data, (ABCSeries, ABCSparseSeries)):
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/dtypes/generic.py", line 9, in _check
    return getattr(inst, attr, '_typ') in comp
  File "/opt/spark/python/pyspark/sql/column.py", line 682, in __nonzero__
    raise ValueError("Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.

>>> a = []
>>> a.extend(kdf['a'].unique())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/databricks/koalas/series.py", line 2407, in __getitem__
    return Series(self._scol.__getitem__(key), anchor=self._kdf, index=self._index_map)
  File "/usr/local/lib/python3.6/dist-packages/databricks/koalas/series.py", line 273, in __init__
    data=data, index=index, dtype=dtype, name=name, copy=copy, fastpath=fastpath)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/series.py", line 198, in __init__
    elif isinstance(data, (ABCSeries, ABCSparseSeries)):
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/dtypes/generic.py", line 9, in _check
    return getattr(inst, attr, '_typ') in comp
  File "/opt/spark/python/pyspark/sql/column.py", line 682, in __nonzero__
    raise ValueError("Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.

>>> import numpy as np
>>> np.__version__
'1.15.2'
>>> np.array(pdf['a'].unique())
array([1, 2])
>>> np.array(kdf['a'].unique())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/databricks/koalas/series.py", line 2407, in __getitem__
    return Series(self._scol.__getitem__(key), anchor=self._kdf, index=self._index_map)
  File "/usr/local/lib/python3.6/dist-packages/databricks/koalas/series.py", line 273, in __init__
    data=data, index=index, dtype=dtype, name=name, copy=copy, fastpath=fastpath)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/series.py", line 198, in __init__
    elif isinstance(data, (ABCSeries, ABCSparseSeries)):
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/dtypes/generic.py", line 9, in _check
    return getattr(inst, attr, '_typ') in comp
  File "/opt/spark/python/pyspark/sql/column.py", line 682, in __nonzero__
    raise ValueError("Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
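
A note on the mechanism: all three tracebacks enter `Series.__getitem__` directly from the call typed at `<stdin>`. That is what Python does when an object defines `__getitem__` but no `__iter__`: it falls back to the legacy sequence protocol and calls `__getitem__` with 0, 1, 2, ... until an `IndexError` is raised. The sketch below shows that fallback with a hypothetical `LegacySeries` class. It is not Koalas code, and the idea that Koalas 0.12.0 `Series` had no `__iter__` is an inference from the tracebacks, not something confirmed in this issue.

```python
class LegacySeries:
    """Hypothetical stand-in: defines __getitem__ but no __iter__."""

    def __init__(self, values):
        self._values = values
        self.keys_seen = []

    def __getitem__(self, key):
        # Record each key so we can see what iteration passes in.
        self.keys_seen.append(key)
        # A list raises IndexError past the end, which ends the iteration.
        return self._values[key]


s = LegacySeries([1, 2])
result = list(s)      # no __iter__, so Python calls s[0], s[1], s[2]
print(result)         # [1, 2]
print(s.keys_seen)    # [0, 1, 2]
```

In the Koalas tracebacks the integer key appears to be forwarded to the Spark column (`self._scol.__getitem__(key)`) rather than used to index a row, so the failure surfaces later as the `Cannot convert column into bool` error instead of a clear "not iterable" message.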

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 11 (7 by maintainers)

Top GitHub Comments

rxin commented, Aug 6, 2019 (1 reaction)

One addition: we should at the very least improve the error message, so people find out about `to_numpy` without reading all the docs.

rxin commented, Sep 27, 2019 (0 reactions)

Thanks for the comment @smalory.

@ueshin can we fix the error message?

