API: support "unique=True" in MultiIndex.get_level_values()
Code Sample, a copy-pastable example if possible
I often find myself doing
In [2]: df = pd.Series(index=pd.MultiIndex.from_product([['A', 'B'], ['a', 'b']]))
In [3]: df.index.get_level_values(0).unique()
Out[3]: Index(['A', 'B'], dtype='object')
Problem description
The above is very inefficient, because first a Series is built which includes a copy of the entire level (possibly using way more memory than the index itself), and only then duplicates are stripped. Other people on SO have faced the same problem, and this is also blocking a fix I wrote for #17845.
I’m pushing a simple PR in seconds.
Expected Output
Same as above, but in an efficient way.
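A minimal sketch of what "in an efficient way" could look like: deduplicate the integer codes of the level first, then look up only the values that are actually used. This is an illustration, not the code from the PR. The helper name `unique_level_values` is hypothetical, and the sketch assumes a recent pandas where the attribute is called `codes` (releases from the era of this report, such as 0.21, call it `labels`). Unlike the snippet above, it skips missing values.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b']])


def unique_level_values(index, level=0):
    """Hypothetical helper: unique values of one level (given by position),
    in order of first appearance."""
    # Assumption: recent pandas, where the integer positions live in `codes`.
    codes = index.codes[level]
    # Deduplicate the integer codes instead of the expanded level values.
    used = pd.unique(codes)
    # A code of -1 marks a missing value; this sketch simply skips those.
    used = used[used >= 0]
    return index.levels[level].take(used)


print(unique_level_values(idx, 0))

# Recent pandas versions also accept a `level` argument on `unique`.
print(idx.unique(level=0))
```

Both calls should give the same two labels as the snippet above, without first building an object as long as the index.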
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8
pandas: 0.21.0rc1+19.gb15d92d14
pytest: 3.0.6
pip: 9.0.1
setuptools: None
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 5.1.0.dev
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1
Issue Analytics
- Created 6 years ago
- Comments: 7 (7 by maintainers)
Top GitHub Comments
I agree it would be nice to have a clean way to get those unique values, but IMO it does not belong in get_level_values. That method returns the actual values of the Index level, with a length equal to the length of the Index, and IMO we should stick to that contract. Having such a keyword would completely alter the return type of this method. (I do not directly have a good idea for an alternative, though.)
.get_level_values(level, used=False), though I am not sure I like this either.
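To make the contract mentioned above concrete, here is a small sketch of how get_level_values differs from the levels attribute. It assumes a pandas version that has remove_unused_levels(). Reading the used keyword as referring to values that no row points to is my interpretation; the comment does not spell it out.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b']])
sub = idx[:2]  # keeps only the two rows under 'A'

# get_level_values returns one entry per row, so its length matches the index.
values = sub.get_level_values(0)
print(len(values) == len(sub))

# `levels` is not tied to the rows: 'B' is still listed although no row uses it.
print(list(sub.levels[0]))

# remove_unused_levels() drops the entries that no row refers to.
print(list(sub.remove_unused_levels().levels[0]))
```

A unique keyword would break the first property, since the result would no longer line up with the rows of the index.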