read_csv with filehandler and nrows argument
Code Sample, a copy-pastable example if possible
%%file example.csv
1,2
3,4
5,6
7,8
9,10
11,12
import pandas as pd
with open('example.csv') as f:
    data = pd.read_csv(f, names=['A', 'B'], nrows=2)
    print(f.readline())
with open('example.csv') as f:
    data = pd.read_csv(f, names=['A', 'B'], nrows=1, engine='python')
    print(f.readline())
    print(f.readline())
print(data)
7,8
9,10
   A  B
0  1  2
with open('example.csv') as f:
    data = pd.read_csv(f, names=['A', 'B'], nrows=2, engine='python')
    print(f.readline())
    print(f.readline())
print(data)
7,8
9,10
   A  B
0  1  2
1  3  4
with open('example.csv') as f:
    data = pd.read_csv(f, names=['A', 'B'], nrows=3, engine='python')
    print(f.readline())
    print(f.readline())
print(data)
7,8
9,10
   A  B
0  1  2
1  3  4
2  5  6
Problem description
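Issue https://github.com/pandas-dev/pandas/issues/2071 is probably related. The C parser exhausts the file handle even if nrows is passed.
The Python parser shows unexpected behaviour when nrows=1 or nrows=2 is given: in both cases the next readline() returns 7,8, the same position as with nrows=3, so the handle has advanced past more lines than were requested.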
Expected Output
with open('example.csv') as f:
    data = pd.read_csv(f, names=['A', 'B'], nrows=2, engine='python')
    print(f.readline())
    print(f.readline())
print(data)
5,6
7,8
   A  B
0  1  2
1  3  4
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.10.0-27-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: 3.1.2
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: 1.6.2
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.0-b1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None
Issue Analytics
- Created 6 years ago
- Reactions: 4
- Comments: 17 (12 by maintainers)
Top GitHub Comments
@andydish From what I could gather, the C implementation of read_csv() reads a chunk from the file into a buffer, then parses it and stops once nrows has been reached. The user has no control over the chunk size, so the actual file pointer moves forward beyond the nrows-th line.
The workaround I ended up doing was:
(not proud of it, but it worked…)
@igiloh That is where I was starting to head in my own solution. Thanks for sharing and saving me some time!
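For anyone landing here later: the workaround code mentioned above is not reproduced in this thread. One possible approach (an assumption on my part, not necessarily what @igiloh did) is to pull exactly nrows lines off the handle yourself with itertools.islice and give only those lines to read_csv through an in-memory buffer. The helper name read_csv_rows is hypothetical, not a pandas API. The sketch assumes a text-mode handle, one record per physical line (no quoted newlines), and no header row.

```python
import io
from itertools import islice

import pandas as pd


def read_csv_rows(handle, nrows, **kwargs):
    """Parse the next `nrows` lines of `handle` (hypothetical helper, not a pandas API)."""
    # islice consumes at most nrows lines, so a later handle.readline()
    # continues at the line right after them.
    lines = list(islice(handle, nrows))
    # pandas only ever sees the in-memory copy, so it cannot read ahead
    # on the real handle, whichever engine is used.
    return pd.read_csv(io.StringIO("".join(lines)), **kwargs)


# Same rows as example.csv, held in memory to keep the sketch self-contained.
f = io.StringIO("1,2\n3,4\n5,6\n7,8\n9,10\n11,12\n")
data = read_csv_rows(f, 2, names=["A", "B"])
next_line = f.readline()  # first line that was not handed to pandas
print(next_line)
print(data)
```

With a real file, open it in text mode and pass that handle instead of the StringIO object. The trade-off is that the selected lines are copied into memory before parsing, which is fine for a small nrows but not a replacement for chunked reading of large files.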