API: support "unique=True" in MultiIndex.get_level_values()


Code Sample, a copy-pastable example if possible

I often find myself doing:

In [1]: import pandas as pd

In [2]: df = pd.Series(index=pd.MultiIndex.from_product([['A', 'B'], ['a', 'b']]))

In [3]: df.index.get_level_values(0).unique()
Out[3]: Index(['A', 'B'], dtype='object')

Problem description

The above is very inefficient. get_level_values() first builds an Index as long as the whole MultiIndex, materializing one label per row (possibly using far more memory than the MultiIndex itself), and only then are the duplicates stripped. Other people on Stack Overflow have faced the same problem, and this is also blocking a fix I wrote for #17845.
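To make the memory cost concrete, here is a small sketch that compares the full-length Index returned by `get_level_values` with the distinct labels the MultiIndex actually stores in `levels`. The sizes are arbitrary and chosen only for illustration. The sketch assumes pandas >= 0.24 for `MultiIndex.codes`; older versions, such as the 0.21 build in this report, call it `labels`.

```python
import pandas as pd

# 1,000 x 1,000 = 1,000,000 rows, but only 1,000 distinct labels per level
mi = pd.MultiIndex.from_product([range(1000), range(1000)])

# get_level_values materializes one label per row: a full-length Index
level0 = mi.get_level_values(0)

# The MultiIndex itself stores each distinct label once, plus compact integer codes
print(len(level0))          # 1000000
print(len(mi.levels[0]))    # 1000
print(level0.nbytes, mi.codes[0].nbytes)
```

Calling `.unique()` on `level0` only shrinks the result after the full-length copy has already been built.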

I’m pushing a simple PR in seconds.

Expected Output

Same result as above, but computed efficiently, without first materializing the full level.
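One way to get the same result without materializing the full level is to deduplicate the level's integer codes first and only then look up the labels. The helper below is hypothetical (`unique_level_values` is not a pandas function). It assumes pandas >= 0.24 for `MultiIndex.codes` and an integer level position rather than a level name. Unlike `get_level_values(...).unique()`, it skips missing values.

```python
import pandas as pd


def unique_level_values(mi: pd.MultiIndex, level: int = 0) -> pd.Index:
    """Hypothetical helper: unique labels of one level, in order of first appearance.

    Deduplicates the level's integer codes instead of building a full-length Index.
    Assumes an integer level position and pandas >= 0.24 (MultiIndex.codes).
    """
    codes = mi.codes[level]
    # pd.unique keeps order of first appearance, like Index.unique()
    used = pd.unique(codes)
    # -1 marks a missing value; drop it so take() does not read it as "last label"
    used = used[used >= 0]
    return mi.levels[level].take(used)


mi = pd.MultiIndex.from_arrays([['B', 'A', 'B'], [1, 2, 3]])
print(unique_level_values(mi, 0).tolist())  # ['B', 'A']
```

Only the codes array (one small integer per row) is scanned, and labels are looked up for the distinct codes alone. Later pandas versions also accept a `level` argument in `Index.unique`, so `mi.unique(level=0)` returns the same labels for this example.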

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-3-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8

pandas: 0.21.0rc1+19.gb15d92d14
pytest: 3.0.6
pip: 9.0.1
setuptools: None
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 5.1.0.dev
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: None
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
jorisvandenbossche commented, Oct 16, 2017

I agree it would be nice to have a clean way to get those unique values, but IMO it does not belong in get_level_values. That method returns the actual values of the Index level, with a length equal to the length of the Index, and IMO we should stick to that contract. Having such a keyword would completely alter the return type of this method.
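The contract described above can be checked directly. For every level, `get_level_values` returns exactly one label per row of the MultiIndex, aligned with it by position, so a `unique=True` keyword would return something of a different length. Here is a small sketch using the example index from the issue:

```python
import pandas as pd

mi = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b']])

for i in range(mi.nlevels):
    values = mi.get_level_values(i)
    # Contract: one value per row, aligned by position with the MultiIndex
    assert len(values) == len(mi)
    assert all(values[k] == mi[k][i] for k in range(len(mi)))
    # Deduplicating breaks that alignment (4 rows -> 2 labels here)
    print(i, len(values), len(values.unique()))
# 0 4 2
# 1 4 2
```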

(I don't immediately have a good idea for an alternative, though.)

0 reactions
jreback commented, Oct 16, 2017

.get_level_values(level, used=False), though I am not sure I like this either.
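`used` is only floated here as a possible keyword; it is not part of pandas. The closest existing tool is `MultiIndex.remove_unused_levels()`, which touches on the same distinction: a MultiIndex's `levels` can still list labels that no row uses, for example after slicing. Here is a minimal sketch:

```python
import pandas as pd

mi = pd.MultiIndex.from_product([['A', 'B'], ['a', 'b']])
sub = mi[:2]  # keeps only ('A', 'a') and ('A', 'b')

# Slicing keeps the original levels, so 'B' is still listed although no row uses it
print(sub.levels[0].tolist())                         # ['A', 'B']

# remove_unused_levels() drops labels that no code refers to
print(sub.remove_unused_levels().levels[0].tolist())  # ['A']
```

The cleaned `levels` keep the level's own (typically sorted) order rather than order of first appearance, and `remove_unused_levels()` builds a new MultiIndex. It is therefore not a drop-in replacement for `get_level_values(...).unique()`.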


Top Results From Across the Web

pandas.MultiIndex.get_level_values
Values is a level of this MultiIndex converted to a single Index (or subclass thereof). Notes. If the level contains missing values, the...
getting unique index's value from multiindex - Stack Overflow
Use get_level_values first, then unique and last convert to list : L = df.index.get_level_values('date').unique().tolist() print (L[:10]) ...
databricks.koalas.MultiIndex.unique - Read the Docs
Return unique values in the index. Be aware the order of unique values might be different than pandas.Index.unique. Parameters. level: int or str, optional, ...