Pandas DataFrame groupby().size() giving 'ValueError: Length of passed values is 65, index implies 0'
Code Sample, a copy-pastable example if possible
from os import path
import pandas as pd
import numpy as np
input_file = path.join(r'C:\DUMP', 'Process Log 2 Week_2.txt')
tdf = pd.read_csv(input_file, low_memory=False)
# ValueError is raised by this statement -->
tdf_gsdf = tdf.groupby(tdf.columns.tolist()).size()
Problem description
The above code raises 'ValueError: Length of passed values is 65, index implies 0'. I'm trying to identify unique/duplicate rows by grouping by all of the columns in the DataFrame.
(Attached the text file here). Process Log 2 Week_2.txt
I'm new to Python, pandas, and this community, and I'm just trying to automate a few tasks in my project. I think this might be related to issue #21624, but I'm not sure how to link it.
Expected Output
The output should give the distinct rows and their corresponding counts from the DataFrame.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None python: 3.6.6.final.0 python-bits: 64 OS: Windows OS-release: 7 machine: AMD64 processor: Intel64 Family 6 Model 63 Stepping 2, GenuineIntel byteorder: little LC_ALL: None LANG: None LOCALE: None.None
pandas: 0.23.4 pytest: 3.8.0 pip: 10.0.1 setuptools: 40.4.3 Cython: 0.28.5 numpy: 1.15.1 scipy: 1.1.0 pyarrow: None xarray: None IPython: 6.5.0 sphinx: 1.8.1 patsy: 0.5.0 dateutil: 2.7.3 pytz: 2018.5 blosc: None bottleneck: 1.2.1 tables: 3.4.4 numexpr: 2.6.8 feather: None matplotlib: 2.2.3 openpyxl: 2.5.8 xlrd: 1.1.0 xlwt: 1.3.0 xlsxwriter: 1.1.1 lxml: 4.2.5 bs4: 4.6.3 html5lib: 1.0.1 sqlalchemy: 1.2.11 pymysql: None psycopg2: None jinja2: 2.10 s3fs: None fastparquet: None pandas_gbq: None pandas_datareader: None
Issue Analytics
- State:
- Created 5 years ago
- Comments:5 (4 by maintainers)
Top GitHub Comments
The problem is the NA entries in your dataset. Each row in your dataset has at least one NA somewhere. groupby does not know how to group NA keys, so it drops every row whose key contains an NA, which leaves an empty result (length 0).
See http://pandas.pydata.org/pandas-docs/stable/missing_data.html#na-values-in-groupby and http://pandas.pydata.org/pandas-docs/stable/groupby.html#na-and-nat-group-handling
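One possible workaround, sketched here on a made-up two-column frame rather than the attached log file, is to replace the missing values with a placeholder before grouping, so that rows containing NA still form groups. The column names, the toy data, and the "<missing>" sentinel are assumptions for illustration only; pick a sentinel that cannot occur in your real data. Newer pandas releases (1.1 and later) also accept groupby(..., dropna=False), which is not available in 0.23.4.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: every row has at least one NaN,
# mirroring the situation described in the comment above.
df = pd.DataFrame({
    "a": ["x", "x", np.nan],
    "b": [np.nan, np.nan, "y"],
})

# Assumed sentinel; it must not collide with any real value in the data.
sentinel = "<missing>"

# Fill NaN so that groupby keeps these rows, then count each distinct row.
counts = df.fillna(sentinel).groupby(list(df.columns)).size()
print(counts)

# If only a duplicate flag is needed, duplicated() treats NaN values as equal,
# so no filling is required. keep=False marks every member of a duplicate set.
dup_mask = df.duplicated(keep=False)
print(dup_mask)
```

The counts Series is indexed by the distinct row values, so counts.reset_index(name="count") would turn it back into a DataFrame with one row per distinct input row.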
Look here, I get this error. Please help me.