Crosstab Not Working with Duplicate Column Labels

Code Sample, a copy-pastable example if possible

1. Example with 2 lambdas applied on same column

import pandas as pd

df = pd.DataFrame([[0, 1], [0, 1], [1, 0]], columns=['a', 'b'])
df
   a  b
0  0  1
1  0  1
2  1  0

pd.crosstab(df['a'].apply(lambda x: x), df['a'].apply(lambda x: x + 1))
a  1  2
a
1  2  0
2  0  1

2. Example by manually renaming a column

df = pd.DataFrame([[0, 1], [0, 1], [1, 0]], columns=['a', 'b'])
df
   a  b
0  0  1
1  0  1
2  1  0

pd.crosstab(df['a'], df['b'])
b  0  1
a
0  0  2
1  1  0

s = df['b']
s.name = 'a'
pd.crosstab(df['a'], s)
a  0  1
a
0  1  0
1  0  2

In both cases the output is not what I expected: crosstab tabulates one Series against itself instead of against the other Series.

Problem description

I ran into this when calling crosstab on the same column transformed by two different lambdas. The result is a crosstab of the second Series against itself. It seems that crosstab is confused when the two Series share the same name.

The same thing happens when the second column is manually renamed to match the first one before calling crosstab. See the examples above.
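
Since the collision appears to come from the shared name, one possible workaround is to give the two inputs distinct names before calling crosstab. The following is a minimal sketch under that assumption, not a confirmed fix; the names a_rows and a_cols are made up for illustration.

```python
import pandas as pd

# Two Series that share the name 'a', as in the report.
rows = pd.Series([0, 0, 1, 1], name="a")
cols = pd.Series([0, 1, 0, 1], name="a")

# Workaround sketch: rename each input so the names no longer collide.
# "a_rows" and "a_cols" are hypothetical labels, not part of the issue.
table = pd.crosstab(rows.rename("a_rows"), cols.rename("a_cols"))
print(table)
```

With distinct names, every cell of the table should hold a count of 1, which is the cross-tabulation of the two different Series rather than of one Series with itself.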

Expected Output

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 69 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.22.0
pytest: 3.3.2
pip: 10.0.0
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

2 reactions
TomAugspurger commented, Aug 28, 2018

To be clear, this has nothing to do with lambdas, right? In the following, Out[26] is the expected output, but with the name a substituted for b?

In [25]: pd.crosstab(pd.Series([0, 0, 1, 1], name='a'), pd.Series([0, 1, 0, 1], name='a'))
Out[25]:
a  0  1
a
0  2  0
1  0  2

In [26]: pd.crosstab(pd.Series([0, 0, 1, 1], name='a'), pd.Series([0, 1, 0, 1], name='b'))
Out[26]:
b  0  1
a
0  1  1
1  1  1

1 reaction
cuchoi commented, Jun 7, 2019

I made a PR that raises an error when trying to use duplicated column names. Does it make sense to you?
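
The idea of raising an error on duplicate names can be sketched as a small wrapper around crosstab. This is only an illustration and not the code from that PR; crosstab_strict is a hypothetical helper, and it only handles the simple case of one Series for the rows and one for the columns.

```python
import pandas as pd


def crosstab_strict(index, columns):
    """Hypothetical wrapper: refuse two Series that share a name."""
    if index.name is not None and index.name == columns.name:
        raise ValueError(
            f"index and columns are both named {index.name!r}; "
            "rename one of them before calling crosstab"
        )
    return pd.crosstab(index, columns)


left = pd.Series([0, 0, 1, 1], name="a")

# Distinct names pass straight through to pd.crosstab.
ok = crosstab_strict(left, pd.Series([0, 1, 0, 1], name="b"))

# Duplicate names are rejected instead of being tabulated.
try:
    crosstab_strict(left, pd.Series([0, 1, 0, 1], name="a"))
    raised = False
except ValueError:
    raised = True
```

Failing loudly makes the name collision visible at the call site, at the cost of rejecting input that the caller may have considered valid.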
