Cannot tell usecols to ignore missing columns
Code Sample
import pandas as pd
# Where example.csv is:
# column1,column2
# 1, 2
pd.read_csv('example.csv', usecols=['column1', 'column2', 'column3'])
Problem description
When specifying usecols to reduce the amount of data loaded, read_csv fails if the columns do not exist. This is not always desired, especially when reading a large number of files that may have varying columns.
There should be an option to suppress this error, so that usecols can cut down the columns loaded without requiring every listed column to be present.
Current Output
ValueError: Usecols do not match columns in file, columns expected but not found: ['column3']
Expected Output
No error thrown where only some of the usecols exist.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.4
pytest: 4.0.2
pip: 18.1
setuptools: 40.6.3
Cython: 0.29.2
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.8.2
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.12
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
Issue Analytics
- Created 4 years ago
- Comments: 6 (4 by maintainers)
See the example here http://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#column-and-index-locations-and-names
I do think a callable to usecols is the right way to handle this: read_csv already has a ton of params, and this is relatively simple to customize exactly how you want.
See the docs: http://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#csv-text-files
You can pass a callable to usecols. IIRC @gfyoung we have an example of this somewhere?
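For anyone landing here, a minimal sketch of the callable approach suggested above. It assumes the file contents from the report; io.StringIO stands in for example.csv so the snippet runs on its own, and the name `wanted` is only illustrative, not something from pandas.

```python
import io

import pandas as pd

# Stand-in for example.csv from the report, so the sketch is self-contained.
csv_text = "column1,column2\n1,2\n"

# Columns to load if they are present; column3 does not exist in this file.
# (`wanted` is a hypothetical name chosen for this example.)
wanted = {"column1", "column2", "column3"}

# When usecols is a callable, read_csv calls it with each column name found
# in the file and keeps the columns for which it returns True. Names that
# are in `wanted` but missing from the file are simply never looked up.
df = pd.read_csv(io.StringIO(csv_text), usecols=lambda name: name in wanted)

print(list(df.columns))  # ['column1', 'column2']
```

The trade-off is that a missing column passes silently, so the resulting frames can differ in shape from file to file. If downstream code needs a fixed set of columns, it would have to add the absent ones itself after loading.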