
ENH: support defaultdict in read_csv dtype parameter

I have a large CSV file with 15k columns whose names are non-obvious; 14,990 of them hold floating-point numbers. I'd like to load them as floats without read_csv having to spend time divining their types.

dtype accepts a dict, but building one with every column name is tedious and not always possible. The obvious solution is to pass a defaultdict whose default is np.float32, with explicit entries for the other column types. Unfortunately, read_csv currently ignores the default silently. Presumably read_csv does not index the dictionary directly, but first checks whether a column is present in it.
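The reporter's guess can be illustrated with plain Python, independent of pandas internals: a defaultdict only produces its default when it is subscripted. Membership tests and `.get()` look only at keys that were actually stored, so code that checks `col in dtype` before indexing never sees the default. The keys below (`id`, `label`, `x1`) are made up for illustration, and this sketch does not claim to show what read_csv does internally.

```python
from collections import defaultdict

# Hypothetical dtype spec: float32 by default, explicit entries for the rest.
dtypes = defaultdict(lambda: "float32", {"id": "int64", "label": "object"})

# Membership tests only see keys that were actually inserted;
# they never call the default factory.
print("x1" in dtypes)      # False

# dict.get() does not trigger __missing__ either.
print(dtypes.get("x1"))    # None

# Subscripting a missing key calls __missing__, which invokes the
# factory, stores the result under that key, and returns it.
print(dtypes["x1"])        # float32

# The lookup above inserted the key as a side effect.
print("x1" in dtypes)      # True
```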

If this cannot be supported, it would help to warn the user, or at least mention in the documentation, that defaultdict is not supported. It took me a long time to figure out why my columns weren't being read as float32 and why read_csv was still inferring their types.
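Until defaultdict is honored, one workaround consistent with the issue is to build the full dict from the header row and pass that instead. The helper `build_dtype_map` below is hypothetical and the sample data is invented; the sketch uses only the standard library `csv` module to read the header.

```python
import csv
import io


def build_dtype_map(header_source, default, overrides):
    """Hypothetical helper: return an explicit {column: dtype} dict.

    header_source: a text file-like object positioned at the header row.
    default: dtype used for every column not listed in overrides.
    overrides: dict of column -> dtype for the exceptions.
    """
    # Read only the first row (the column names).
    header = next(csv.reader(header_source))
    # Every column gets an entry, so no default lookup is ever needed.
    return {col: overrides.get(col, default) for col in header}


sample = io.StringIO("id,label,x1,x2\n1,a,0.5,1.5\n")
dtype_map = build_dtype_map(sample, "float32", {"id": "int64", "label": "object"})
print(dtype_map)
# {'id': 'int64', 'label': 'object', 'x1': 'float32', 'x2': 'float32'}
```

With pandas, `pd.read_csv(path, nrows=0).columns` is another way to get the header, and the resulting dict can be passed as `dtype=dtype_map`. Reading the header from a file object consumes that line, so reopen the file (or pass the path) before the full read_csv call.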

Issue Analytics

Created 2 years ago
Reactions: 3
Comments: 5 (2 by maintainers)

Top GitHub Comments

2 reactions

mzeitlin11 commented, May 21, 2021

Supporting defaultdict sounds reasonable; I think a PR to add this would be welcome if anyone's interested in working on it!

0 reactions

NickVeld commented, May 25, 2021

Done. I used a try/except approach because it is more general, so it can be used later for supporting …
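The comment does not include the patch, so the following is only a hypothetical sketch of the general idea of a try/except lookup, not the code that was merged into pandas. Indexing the mapping and catching `KeyError` works for a plain dict and also lets a defaultdict (or any dict subclass defining `__missing__`) supply its default. `lookup_dtype` is an illustrative name.

```python
from collections import defaultdict


def lookup_dtype(dtype_spec, column):
    """Hypothetical helper: return the dtype for column, or None if unspecified.

    Subscripting (instead of testing `column in dtype_spec`) gives a
    defaultdict's __missing__ the chance to supply its default.
    """
    try:
        return dtype_spec[column]
    except KeyError:
        # Plain dicts (and mappings without a default) end up here.
        return None


plain = {"id": "int64"}
with_default = defaultdict(lambda: "float32", {"id": "int64"})
columns = ["id", "x1", "x2"]

print([lookup_dtype(plain, c) for c in columns])         # ['int64', None, None]
print([lookup_dtype(with_default, c) for c in columns])  # ['int64', 'float32', 'float32']
```

One side effect to keep in mind: subscripting a defaultdict inserts the missing key, so a real implementation would likely want to avoid mutating the caller's mapping, for example by working on a copy.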

Top Results From Across the Web

Pandas read_csv dtype read all columns but few as string

Just noticed that for the 1.5 pandas release: "Support for defaultdict was added. Specify a defaultdict as input where the default determines ...

Pandas read_csv() tricks you should know to speed up your ...

Setting data type. If you want to set the data type for the DataFrame columns, you can use the argument dtype, for...

pandas read csv dtypes Code Example

pd.read_csv('data.csv') ... pandas read csv specify column dtype ... TypeError: argument of type 'WindowsPath' is not iterable · uuid regex ...

Release 0.17.1+0.g7f801adc.dirty Modin contributors

by Modin, so to avoid this issue, you need to set the dtype parameter of read_csv manually to force the correct data.

Advanced Python

from collections import defaultdict ddict = defaultdict(list) ... It also supports many SQL statements although its data types are more limited.
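The first result above quotes the pandas 1.5 release note saying that defaultdict support was added, with the default determining the dtype of columns that are not listed explicitly. Assuming pandas 1.5 or later, a minimal sketch of the reporter's use case might look like this; the column names and the inline sample data are invented for illustration.

```python
import io
from collections import defaultdict

import pandas as pd

# A tiny stand-in for the 15k-column file: one integer ID column plus
# float columns whose names we do not want to list by hand.
data = io.StringIO("id,x1,x2,x3\n1,0.5,1.5,2.5\n2,3.5,4.5,5.5\n")

# Columns not listed explicitly should fall back to the factory's value.
dtype = defaultdict(lambda: "float32", {"id": "int64"})

df = pd.read_csv(data, dtype=dtype)
print(df.dtypes)
```

On pandas 1.5 or later this should report `id` as int64 and `x1` to `x3` as float32; on older versions the default is ignored, as described in the issue, and the float columns are inferred instead.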