QST: Inconsistent behaviour when read_csv() checks the number of fields per row
I am posting this as a “Question” because I am quite new to pandas, so I may be missing something, and the “problem” I describe may be by design for good reasons.
I am using pandas 1.2.4 with Python 3.9.4 on 64-bit Windows 10.
As a user, I would expect pandas to check the number of fields per row when importing a CSV file. In my opinion, it does not do so in all cases.
Example 1
Here is a CSV file without a header, but with the names= argument set to three column names. So pandas should know how many fields/columns to expect in the CSV file. The second row contains 4 fields instead of 3.
import pandas
import io
csv_without_header = io.StringIO(
    'A;B;C\n'
    'D;E;X;Y\n'   # 4 fields instead of 3
    'F;G;H'
)
df = pandas.read_csv(csv_without_header, encoding='utf-8', sep=';',
                     header=None,
                     names=['First', 'Second', 'Third'])
Pandas imports this without warnings or errors. The fourth field in the second row is simply ignored.
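To see which rows are affected, you can count the fields per row yourself before parsing. This is a minimal sketch using only the standard `csv` module (not a pandas feature), assuming `;` as the separator and no quoted fields that span lines:

```python
import csv
import io

names = ['First', 'Second', 'Third']
data = 'A;B;C\nD;E;X;Y\nF;G;H'

# Count the fields in each row with the standard csv module (no pandas involved).
counts = [len(row) for row in csv.reader(io.StringIO(data), delimiter=';')]
print(counts)  # [3, 4, 3]

# 1-based numbers of the rows that are wider than the names= list; in the
# report above, the surplus fields of these rows were silently ignored.
too_wide = [i for i, n in enumerate(counts, start=1) if n > len(names)]
print(too_wide)  # [2]
```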
Example 2
I added a header line with three fields to the CSV file. Again, pandas should know how many fields/columns to expect, and again the second data row contains 4 fields instead of 3.
csv_with_header = io.StringIO(
    'First;Second;Third\n'
    'A;B;C\n'
    'D;E;X;Y\n'   # 4 fields instead of 3
    'F;G;H'
)
df = pandas.read_csv(csv_with_header, encoding='utf-8', sep=';')
Here an error occurs, as I expect:
pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 4
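The line number in this message counts the header line, which is why the four-field row `D;E;X;Y` is reported as line 3. A rough sketch of the same check with the standard `csv` module, taking the expected width from the header row; `first_overlong_row` is a hypothetical helper, and it assumes a non-empty input without quoted newlines:

```python
import csv
import io


def first_overlong_row(text, sep):
    """Return an error string for the first row wider than the header, else None.

    Hypothetical helper, not part of pandas; the wording only imitates the
    ParserError above. Rows with too few fields are not reported, which
    matches the behaviour described in Example 3.
    """
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    expected = len(next(reader))  # the header row defines the expected width
    for row in reader:
        if len(row) > expected:
            # reader.line_num counts the header line too
            return f'Expected {expected} fields in line {reader.line_num}, saw {len(row)}'
    return None


data = 'First;Second;Third\nA;B;C\nD;E;X;Y\nF;G;H'
print(first_overlong_row(data, ';'))  # Expected 3 fields in line 3, saw 4
```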
Example 3
There are fewer than 3 fields in the second data row. Again there is no warning or error; the missing field is filled with NaN. It does not matter whether the number of (expected) fields is given by a header line in the CSV file or by the names= argument.
csv_with_header = io.StringIO(
    'First;Second;Third\n'
    'A;B;C\n'
    'D;Y\n'   # only 2 fields
    'F;G;H'
)
csv_without_header = io.StringIO(
    'A;B;C\n'
    'D;Y\n'   # only 2 fields
    'F;G;H'
)
df_a = pandas.read_csv(csv_with_header, encoding='utf-8', sep=';')
df_b = pandas.read_csv(csv_without_header, encoding='utf-8', sep=';',
                       names=['First', 'Second', 'Third'])
What I want is to import CSV files and be informed if any row has more or fewer fields than expected.
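One workaround is to validate the row widths yourself before parsing. The sketch below is an illustration, not a pandas feature: `find_field_count_problems` is a hypothetical helper built on the standard `csv` module, and it assumes that when `names` is given the file has no header row, and that no quoted field contains a newline.

```python
import csv
import io


def find_field_count_problems(text, sep, names=None):
    """Report every row whose field count differs from the expected width.

    Hypothetical helper (not part of pandas). The expected width is len(names)
    when names is given (file assumed to have no header); otherwise it is
    taken from the first row, i.e. the header.
    Returns a list of (line_number, field_count, problem) tuples.
    """
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    expected = len(names) if names is not None else None
    problems = []
    for row in reader:
        if not row:
            continue  # blank line; read_csv skips these by default
        if expected is None:
            expected = len(row)  # header row defines the width
            continue
        if len(row) > expected:
            problems.append((reader.line_num, len(row), 'too many fields'))
        elif len(row) < expected:
            problems.append((reader.line_num, len(row), 'too few fields'))
    return problems


with_header = 'First;Second;Third\nA;B;C\nD;Y\nF;G;H'
without_header = 'A;B;C\nD;Y\nF;G;H'
print(find_field_count_problems(with_header, ';'))
# [(3, 2, 'too few fields')]
print(find_field_count_problems(without_header, ';', names=['First', 'Second', 'Third']))
# [(2, 2, 'too few fields')]
```

If the returned list is non-empty you can raise or log it; otherwise pass the same text on to `pandas.read_csv(io.StringIO(text), sep=';', ...)`. For comparison, pandas 1.3 added an `on_bad_lines` parameter (`'error'`, `'warn'`, `'skip'`) to `read_csv`, but it concerns rows with too many fields, so short rows are still padded with NaN.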
This is a duplicate; please search the issue tracker.
Given the sheer volume and quality of (voluntary!) work produced (see https://github.com/pandas-dev/pandas/commits?author=phofl for a start, and that’s excluding reviews + issue triage), is demanding that they search the issue tracker for you really the best use of their resources?
Such comments are unwelcome; you’ve been warned.