ENH: support defaultdict in read_csv dtype parameter
I have a large csv file with 15k columns with non-obvious names, of which 14990 are floating point numbers. I’d like to load them as floats without read_csv having to spend the time divining their types. dtype allows providing a dict, but making one with all the column names is tedious and not always possible. The obvious solution is to provide a defaultdict, with a default of np.float32, and including entries for the other column types. Unfortunately, the default is currently silently ignored by read_csv. Presumably read_csv is not directly querying the dictionary, but rather checking first whether an item is there.
If this is not possible, it would be helpful to include a warning to the user, or at least some mention in the documentation, that defaultdict is not supported. It took me a long time to figure out why my floats weren’t being treated as float32 and why read_csv was still trying to determine the types of these columns.
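Until read_csv honours the default, one possible workaround is to read only the header row and expand the defaultdict into a plain dict covering every column. The sketch below is illustrative only: the sample data, the column names and the variable names (csv_text, wanted, explicit) are made up for the example, and the comment about membership checks reflects the presumption stated above, not a confirmed detail of the pandas internals.

```python
import io
from collections import defaultdict

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real file: two "special" columns, the rest floats.
csv_text = "id,name,a,b\n1,x,0.5,1.5\n2,y,2.5,3.5\n"

# The spec the report wants to pass directly: float32 unless stated otherwise.
wanted = defaultdict(lambda: np.float32, {"id": np.int64, "name": str})

# A membership test never invokes the default factory, so code that checks
# `col in dtype` before indexing would never see the float32 default.
assert "a" not in wanted

# Workaround: parse only the header (nrows=0), then build a plain dict with
# one entry per column by indexing the defaultdict.
columns = pd.read_csv(io.StringIO(csv_text), nrows=0).columns
explicit = {col: wanted[col] for col in columns}

# A plain dict with every column listed is honoured by read_csv.
df = pd.read_csv(io.StringIO(csv_text), dtype=explicit)
print(df.dtypes)
```

This costs one extra pass over the first line of the file, which should be negligible next to type inference over 15k columns.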
Issue Analytics
- State:
- Created 2 years ago
- Reactions:3
- Comments:5 (2 by maintainers)
Supporting defaultdict sounds reasonable, I think a PR to add this would be welcome if anyone’s interested in working on it!
Done. I used a try/except approach because it is more general, so it can be used later for supporting