API: read_csv parsing of quoted "NA" value as NaN

From https://github.com/pandas-dev/pandas/issues/10647#issuecomment-123715600

In the context of the discussion on the default parsing of ‘NA’ in a csv file as NaN, I don’t think we can change the default of parsing NA, but we may consider parsing "NA" as a string instead of NaN:

Another change that we might be more likely to consider is the parsing of quoted NA values (so only “NA” would no longer be converted automatically, while NA and the like would still be converted to NaN as they are now). Personally, I would even consider it a bug that “NA” and NA are treated the same, as I would expect quoted values to be left untouched. But I don’t know how long the behaviour has been this way.

That said, I am not sure how invasive such a change would be, and that is difficult to assess, so maybe it is not worth risking many breakages?

Issue Analytics

  • State: open
  • Created 7 years ago
  • Reactions: 1
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

HHest commented, Nov 29, 2017 (1 reaction)

Example:

from io import StringIO
import pandas as pd

pd.read_csv(StringIO('a,b\nNA,NA\n0123,0123'), dtype={'a': str})
Out[1]: 
      a      b
0   NaN    NaN
1  0123  123.0

Would love to see column ‘a’ showing “NA, 0123” as strings.

This is a very typical problem for us, because we have so many records where “exchange code” = “NA”. Currently we need to use keep_default_na=False and roll our own na_values, which is cumbersome and error-prone.
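For readers unfamiliar with that workaround, here is a minimal sketch of what it can look like. The column names, the sample data, and the choice of custom NA markers are made up for illustration; only keep_default_na and na_values come from the comment above.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: "NA" is a real exchange code, an empty price is missing.
data = "code,price\nNA,1.5\nLSE,\n"

df = pd.read_csv(
    StringIO(data),
    dtype={"code": str},
    keep_default_na=False,         # drop the built-in NA markers, including "NA"
    na_values=["", "NaN", "N/A"],  # hand-picked markers (an assumption, adjust as needed)
)

print(df)
```

The cost is that every marker from the default list that should still count as missing has to be listed again by hand, which is presumably where the "cumbersome" part comes from.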

For my situation, I find it counterproductive that a column specified as string still is vetted for NaN. On the other hand, I can imagine some users wanting this vetting for “NaN” to happen even for strings because they want to encode “N/A” responses. Perhaps the NaN vetting process could be controlled per column? Perhaps by allowing na_filter to take a dictionary of {col_name: boolean}.
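The dictionary form of na_filter suggested above does not exist. As a partial approximation, na_values itself accepts a dict keyed by column name, so a sketch along the following lines may get close to per-column control. It reuses the data from the example above; treat the exact behaviour as something to verify against your pandas version.

```python
from io import StringIO

import pandas as pd

data = "a,b\nNA,NA\n0123,0123"

df = pd.read_csv(
    StringIO(data),
    dtype={"a": str},
    keep_default_na=False,    # no default NA markers for any column
    na_values={"b": ["NA"]},  # treat "NA" as missing in column b only
)

print(df)
```

With these settings column a should keep "NA" and "0123" as strings, while column b is still parsed as numeric with "NA" read as missing.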

patricktokeeffe commented, Jun 11, 2019 (0 reactions)

In the context of the discussion on the default parsing of ‘NA’ in a csv file as NaN, I don’t think we can change the default of parsing NA, but we may consider parsing “NA” as a string instead of NaN:

fwiw, this may not be possible without changing the csv parser behavior: I use read_csv to import files where the NA value is literally "NAN" (in double quotes), but I have to pass NAN to the na_values parameter because the double quotes are automatically consumed.
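A small sketch of the quote handling described here, using made-up data. It compares the default read with one where quoting is switched off through the standard library's csv.QUOTE_NONE constant; this is an illustration of the observation, not a recommended fix.

```python
import csv
from io import StringIO

import pandas as pd

# One quoted "NA", one bare NA, one ordinary value.
data = 'a\n"NA"\nNA\nx\n'

# Default: the quotes are consumed before NA detection,
# so the quoted and the bare NA are both read as missing.
default = pd.read_csv(StringIO(data))

# Quoting off: the quote characters stay part of the field,
# so the quoted field no longer matches the NA marker.
raw = pd.read_csv(StringIO(data), quoting=csv.QUOTE_NONE)

print(default["a"].isna().tolist())
print(raw["a"].tolist())
```

Turning quoting off keeps the literal quote characters in the values and breaks fields that contain the delimiter inside quotes, so it is only useful for files known not to rely on quoting.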
