error_bad_lines is ignored if names argument is used in read_csv function

Code Sample, a copy-pastable example if possible

# Example taken from #20573
import io
import pandas as pd

# The second row has 5 fields instead of 4, so it is the "bad" line.
buf = io.StringIO("0,1,Amigo,3\n1,1,Inimigo,amigo,9\n2,1,Cowboy,42\n")
names = ['ID', 'X1', 'X2', 'X3']
dtypes = {"X3": int}
# error_bad_lines=False should skip the malformed row instead of raising.
pd.read_csv(buf, names=names, error_bad_lines=False, dtype=dtypes, header=None)

Problem description

The bad lines option (error_bad_lines=False) is ignored when the names argument is used. Without names, everything works fine with pandas 0.23.3 (see issue #20573), but with names a ValueError is raised (ValueError: invalid literal for int() with base 10: 'amigo').

Expected Output

b'Skipping line 3: expected 4 fields, saw 5\n'

ID  X1  X2      X3
0   1   Amigo   3
2   1   Cowboy  42
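
One way to get the rows shown in the expected output without depending on error_bad_lines is to drop rows with the wrong field count before pandas parses them. The following is a minimal sketch using only the standard library. It assumes plain comma-separated input; drop_malformed_rows is a hypothetical helper written for illustration, not part of pandas.

```python
import csv
import io


def drop_malformed_rows(text, expected_fields, delimiter=","):
    """Keep only rows with exactly `expected_fields` fields.

    Hypothetical helper (not a pandas API). Returns a rewound StringIO
    holding the kept rows and a list of the 1-based row numbers skipped.
    """
    kept = io.StringIO()
    writer = csv.writer(kept, delimiter=delimiter, lineterminator="\n")
    skipped = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for row_no, row in enumerate(reader, start=1):
        if len(row) == expected_fields:
            writer.writerow(row)
        else:
            # Row numbers count parsed records, which match physical lines
            # only when no quoted field spans several lines.
            skipped.append(row_no)
    kept.seek(0)  # rewind so a later reader starts at the first row
    return kept, skipped


# Same data as the code sample in this issue.
raw = "0,1,Amigo,3\n1,1,Inimigo,amigo,9\n2,1,Cowboy,42\n"
names = ["ID", "X1", "X2", "X3"]

clean, skipped = drop_malformed_rows(raw, expected_fields=len(names))
cleaned_text = clean.getvalue()
print("Skipped rows:", skipped)
print(cleaned_text, end="")
```

The returned buffer can then be passed to pd.read_csv together with names, dtype and header=None as in the code sample, since every remaining row has exactly four fields. The trade-off is that the input is read twice, which may matter for very large files.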

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.3
pytest: 3.4.2
pip: 9.0.2
setuptools: 38.5.2
Cython: 0.27.3
numpy: 1.13.3
scipy: 1.0.0
pyarrow: 0.8.0
xarray: None
IPython: 6.2.1
sphinx: 1.7.1
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.2.0
openpyxl: 2.5.1
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.1
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.2.5
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: 0.1.4
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

louis-red commented, Aug 2, 2018 (1 reaction)

Will try to have a look at this one!

WillAyd commented, Aug 15, 2018 (0 reactions)

Sorry, I thought you were addressing the issue with the names argument as outlined by OP. If that's not the case, then yes, open a separate issue.
