QST: Inconsistent behaviour in checking number of fields per row while read_csv()

I am posting this as a “Question” because I am quite new to pandas, so I may be missing something, and the “problem” I describe may be by design for good reasons. I am using pandas 1.2.4 with Python 3.9.4 on 64-bit Windows 10.

As a user, I would expect pandas to check the number of fields per row when importing a CSV file. But IMHO it does not do so in all cases.

Example 1

Here is a CSV file without a header line, read with the names= argument set to three column names, so pandas should know how many fields/columns to expect. The second row contains 4 fields instead of 3.

import pandas
import io

csv_without_header = io.StringIO(
'A;B;C\n'
'D;E;X;Y\n'  # 2nd row: 4 fields instead of 3
'F;G;H'
)

df = pandas.read_csv(csv_without_header, encoding='utf-8', sep=';',
                     header=None,
                     names=['First', 'Second', 'Third'])

pandas imports this without any warnings or errors. The 4th field in the 2nd row is simply ignored.
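One way to see the mismatch that read_csv passes over in Example 1 is to count the fields on each raw row yourself. This is not a pandas feature; it is a minimal sketch using the standard-library csv module (so quoted fields containing ';' would still be split correctly), assuming the same ';'-separated data as above.

```python
import csv
import io

csv_text = 'A;B;C\nD;E;X;Y\nF;G;H'

# Count the fields on each row with the stdlib csv reader
field_counts = [len(row) for row in csv.reader(io.StringIO(csv_text), delimiter=';')]
print(field_counts)  # [3, 4, 3]
```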

Example 2

Here I added a header line with three fields to the CSV file, so again pandas should know how many fields/columns to expect. And again the second row contains 4 fields instead of 3.

csv_with_header = io.StringIO(
'First;Second;Third\n'
'A;B;C\n'
'D;E;X;Y\n'  # 2nd data row: 4 fields instead of 3
'F;G;H'
)

df = pandas.read_csv(csv_with_header, encoding='utf-8', sep=';')

Here an error occurs, as I expect:

pandas.errors.ParserError: Error tokenizing data. C error: Expected 3 fields in line 3, saw 4

Example 3

Here the 2nd row has fewer than 3 fields. Again there is no warning or error; the missing field is set to NaN. It makes no difference whether the expected number of fields comes from a header line in the CSV or from the names= argument.

csv_with_header = io.StringIO(
'First;Second;Third\n'
'A;B;C\n'
'D;Y\n'  # 2nd data row: only 2 fields
'F;G;H'
)

csv_without_header = io.StringIO(
'A;B;C\n'
'D;Y\n'  # 2nd row: only 2 fields
'F;G;H'
)

df_a = pandas.read_csv(csv_with_header, encoding='utf-8', sep=';')
df_b = pandas.read_csv(csv_without_header, encoding='utf-8', sep=';', names=['First', 'Second', 'Third'])
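Checking the resulting DataFrame for NaN afterwards cannot reliably catch a short row, because with its default settings pandas also turns an empty field (as in 'D;Y;') into NaN. The raw field count does tell the two cases apart. A small sketch with the stdlib csv module, using two hypothetical rows:

```python
import csv
import io

# Hypothetical rows: one is missing its 3rd field, the other has an empty 3rd field
rows = ['D;Y', 'D;Y;']

counts = {}
for line in rows:
    fields = next(csv.reader(io.StringIO(line), delimiter=';'))
    counts[line] = len(fields)
    print(repr(line), len(fields), fields)
# 'D;Y' 2 ['D', 'Y']
# 'D;Y;' 3 ['D', 'Y', '']
```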

What I want is to import CSV files and be informed if any row has more or fewer fields than expected.
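Until read_csv reports this itself, one workaround is to validate the raw text before handing it to pandas. The helper below, find_bad_rows, is hypothetical (not a pandas API): it takes the expected field count either from an explicit number (the names= case) or from the first row (the header case), and returns the line number and field count of every row that does not match. Line numbers are the physical lines counted by csv.reader.

```python
import csv
import io


def find_bad_rows(text, sep=';', expected=None):
    """Hypothetical helper: return (line_number, field_count) for each row whose
    field count differs from `expected`. If `expected` is None, the first
    non-blank row (e.g. a header line) defines the expected count."""
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    bad = []
    for row in reader:
        if not row:  # blank line; read_csv skips these by default
            continue
        if expected is None:
            expected = len(row)  # header row sets the expected width
            continue
        if len(row) != expected:
            bad.append((reader.line_num, len(row)))
    return bad


with_header = 'First;Second;Third\nA;B;C\nD;E;X;Y\nF;G;H'
without_header = 'A;B;C\nD;Y\nF;G;H'

print(find_bad_rows(with_header))                 # [(3, 4)]
print(find_bad_rows(without_header, expected=3))  # [(2, 2)]
```

If the returned list is non-empty you can raise or warn; otherwise pass the same text to pandas.read_csv wrapped in a fresh io.StringIO.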

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
phofl commented, Jun 1, 2021

This is a duplicate, please search the issue tracker,

0 reactions
MarcoGorelli commented, Jun 3, 2021

“Anything else is waste of ressources.”

Given the sheer volume and quality of (voluntary!) work produced (see https://github.com/pandas-dev/pandas/commits?author=phofl for a start, and that’s excluding reviews + issue triage), is demanding that they search the issue tracker for you really the best use of their resources?

“you should know how to use GitHubs Issue tracker”

Such comments are unwelcome - you’ve been warned