`unique()` has wrong return type

cc https://github.com/databricks/koalas/issues/233

Not sure why https://github.com/databricks/koalas/pull/249 decided to return a Series instead of a NumPy array, which is what pandas returns?

>>> import findspark
>>> findspark.init()
>>> import databricks.koalas as ks
>>> import pandas as pd

>>> findspark.__version__
'1.3.0'
>>> pd.__version__
'0.24.2'
>>> ks.__version__
'0.12.0'

>>> pdf = pd.DataFrame({'a': [1, 2]})
>>> kdf = ks.DataFrame(pdf)
... (spark output removed) ...
>>> kdf['a'].unique()
0    1
1    2
Name: a, dtype: int64
>>> pdf['a'].unique()
array([1, 2])
>>> list(pdf['a'].unique())
[1, 2]
>>> list(kdf['a'].unique())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/databricks/koalas/series.py", line 2407, in __getitem__
    return Series(self._scol.__getitem__(key), anchor=self._kdf, index=self._index_map)
  File "/usr/local/lib/python3.6/dist-packages/databricks/koalas/series.py", line 273, in __init__
    data=data, index=index, dtype=dtype, name=name, copy=copy, fastpath=fastpath)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/series.py", line 198, in __init__
    elif isinstance(data, (ABCSeries, ABCSparseSeries)):
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/dtypes/generic.py", line 9, in _check
    return getattr(inst, attr, '_typ') in comp
  File "/opt/spark/python/pyspark/sql/column.py", line 682, in __nonzero__
    raise ValueError("Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.

>>> a = []
>>> a.extend(kdf['a'].unique())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/databricks/koalas/series.py", line 2407, in __getitem__
    return Series(self._scol.__getitem__(key), anchor=self._kdf, index=self._index_map)
  File "/usr/local/lib/python3.6/dist-packages/databricks/koalas/series.py", line 273, in __init__
    data=data, index=index, dtype=dtype, name=name, copy=copy, fastpath=fastpath)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/series.py", line 198, in __init__
    elif isinstance(data, (ABCSeries, ABCSparseSeries)):
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/dtypes/generic.py", line 9, in _check
    return getattr(inst, attr, '_typ') in comp
  File "/opt/spark/python/pyspark/sql/column.py", line 682, in __nonzero__
    raise ValueError("Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.

>>> import numpy as np
>>> np.__version__
'1.15.2'
>>> np.array(pdf['a'].unique())
array([1, 2])
>>> np.array(kdf['a'].unique())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/databricks/koalas/series.py", line 2407, in __getitem__
    return Series(self._scol.__getitem__(key), anchor=self._kdf, index=self._index_map)
  File "/usr/local/lib/python3.6/dist-packages/databricks/koalas/series.py", line 273, in __init__
    data=data, index=index, dtype=dtype, name=name, copy=copy, fastpath=fastpath)
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/series.py", line 198, in __init__
    elif isinstance(data, (ABCSeries, ABCSparseSeries)):
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/dtypes/generic.py", line 9, in _check
    return getattr(inst, attr, '_typ') in comp
  File "/opt/spark/python/pyspark/sql/column.py", line 682, in __nonzero__
    raise ValueError("Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.
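
All three tracebacks above enter `__getitem__` directly from the `<stdin>` frame. That would be consistent with Python falling back to the legacy sequence protocol (calling `obj[0]`, `obj[1]`, ... when an object defines `__getitem__` but no `__iter__`), though this is a reading of the traceback and not checked against the koalas source. A minimal pure-Python sketch of that fallback, where `IndexOnly` and `RaisesOnIntKey` are hypothetical stand-ins and not koalas classes:

```python
class IndexOnly:
    """Hypothetical stand-in: defines __getitem__ but not __iter__."""

    def __init__(self, values):
        self._values = list(values)
        self.keys_seen = []  # every key Python passes to __getitem__

    def __getitem__(self, key):
        self.keys_seen.append(key)
        # IndexError past the end is what tells Python to stop iterating.
        return self._values[key]


class RaisesOnIntKey(IndexOnly):
    """Hypothetical stand-in for a __getitem__ that cannot handle integer keys."""

    def __getitem__(self, key):
        self.keys_seen.append(key)
        raise ValueError("cannot handle key %r" % (key,))


obj = IndexOnly([1, 2])
result = list(obj)  # no __iter__, so Python calls obj[0], obj[1], obj[2]

bad = RaisesOnIntKey([1, 2])
try:
    list(bad)  # the very first obj[0] call raises, so list() never completes
    caught = None
except ValueError as exc:
    caught = str(exc)

print(result)         # values collected through __getitem__
print(obj.keys_seen)  # integer keys tried by the fallback protocol
print(caught)         # error surfaced by list() on the second object
```

If that reading is right, code that expects the pandas behaviour could convert explicitly before iterating, for example with `kdf['a'].unique().to_pandas()`, assuming the distinct values fit in driver memory.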