ENH: support defaultdict in read_csv dtype parameter

See original GitHub issue

I have a large csv file with 15k columns with non-obvious names, of which 14990 are floating point numbers. I’d like to load them as floats without read_csv having to spend the time divining their types.

dtype accepts a dict, but building one with every column name is tedious and not always possible. The obvious solution is to pass a defaultdict with a default of np.float32 plus explicit entries for the other column types. Unfortunately, read_csv currently ignores the default silently. Presumably read_csv does not query the dictionary directly, but first checks whether the key is present.
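The suspected cause can be reproduced with the standard library alone. The sketch below is not pandas code; the two helper functions and the column names are hypothetical, chosen only to show that a membership test never triggers a defaultdict's default, while a direct lookup does.

```python
from collections import defaultdict

# Hypothetical dtype mapping: two explicit columns, float32 for everything else.
dtypes = defaultdict(lambda: "float32", {"id": "int64", "name": "object"})


def dtype_via_membership(mapping, column):
    """Mimics a 'check first' lookup: `in` does not call __missing__."""
    if column in mapping:
        return mapping[column]
    return None  # a caller would fall back to type inference here


def dtype_via_lookup(mapping, column):
    """Direct indexing: a defaultdict supplies its default for missing keys."""
    return mapping[column]


# Run the membership version first, because the direct lookup inserts the key.
membership_result = dtype_via_membership(dtypes, "col_00042")
lookup_result = dtype_via_lookup(dtypes, "col_00042")

print(membership_result)  # None: the default was never consulted
print(lookup_result)      # float32
```

The same applies to `mapping.get(column)`, which also bypasses the default factory, so any code path built on `in` or `.get()` would treat a defaultdict like a plain dict.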

If this is not possible, it would be helpful to include a warning to the user, or at least some mention in the documentation, that defaultdict is not supported. It took me a long time to figure out why my floats weren’t being treated as float32 and why read_csv was still trying to determine the types of these columns.
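Where defaultdict is not honored, one possible workaround is to read only the header row and expand the defaults into an explicit dict. This is a minimal sketch under assumptions: `build_dtype_map`, the sample column names, and the in-memory CSV are all illustrative and not part of the original report.

```python
import csv
import io


def build_dtype_map(csv_file, overrides, default="float32"):
    """Read only the header row and assign a dtype to every column name.

    `overrides` maps the few non-default columns to their dtypes;
    every other column gets `default`.
    """
    header = next(csv.reader(csv_file))
    return {name: overrides.get(name, default) for name in header}


# Small in-memory stand-in for the real file (assumed layout).
sample = io.StringIO("id,label,f1,f2\n1,a,0.5,1.5\n")
dtype_map = build_dtype_map(sample, {"id": "int64", "label": "object"})

print(dtype_map)
```

The resulting plain dict can then be passed as the `dtype` argument of `read_csv`, after reopening the file or seeking back to its start, since the header row has already been consumed.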

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 3
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions
mzeitlin11 commented, May 21, 2021

Supporting defaultdict sounds reasonable. I think a PR to add this would be welcome if anyone’s interested in working on it!

0 reactions
NickVeld commented, May 25, 2021

Done. I used a try/except approach because it is more general, so it can be used later for supporting ...
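The general idea of a try/except lookup can be sketched as follows. This is an illustration only, not the code from the pull request: `resolve_dtype` is a hypothetical helper showing why indexing inside try/except handles plain dicts and defaultdicts alike.

```python
from collections import defaultdict


def resolve_dtype(mapping, column):
    """Index the mapping directly and treat KeyError as 'no dtype given'.

    A plain dict raises KeyError for a missing column; a defaultdict
    returns its default instead, so both are handled by one code path.
    """
    try:
        return mapping[column]
    except KeyError:
        return None  # a caller would fall back to type inference


plain = {"id": "int64"}
with_default = defaultdict(lambda: "float32", {"id": "int64"})

print(resolve_dtype(plain, "col_7"))         # None
print(resolve_dtype(with_default, "col_7"))  # float32
print(resolve_dtype(with_default, "id"))     # int64
```

Because the function relies only on `__getitem__`, it would work for any mapping-like object that defines its own behavior for missing keys.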


Top Results From Across the Web

Pandas read_csv dtype read all columns but few as string
Just noticed that for the 1.5 pandas release: "Support for defaultdict was added. Specify a defaultdict as input where the default determines ...

Pandas read_csv() tricks you should know to speed up your ...
Setting data type. If you want to set the data type for the DataFrame columns, you can use the argument dtype, for ...

pandas read csv dtypes Code Example
pd.read_csv('data.csv') ... pandas read csv specify column dtype ... TypeError: argument of type 'WindowsPath' is not iterable · uuid regex ...

Release 0.17.1+0.g7f801adc.dirty Modin contributors
by Modin, so to avoid this issue, you need to set the dtype parameter of read_csv manually to force the correct data.

Advanced Python
from collections import defaultdict ddict = defaultdict(list) ... It also supports many SQL statements although its data types are more limited.