
ENH: allow lists/sets in fillna


The docs for the value parameter of fillna say (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html#pandas.DataFrame.fillna):

value : scalar, dict, Series, or DataFrame Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series/DataFrame will not be filled). This value cannot be a list. [emphasis mine on this last sentence]
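To make the documented contract concrete, here is a small sketch (the frame and its values are made up for illustration): a scalar fills every hole, a dict chooses the fill value per column, and a list is rejected with a TypeError instead of being placed into each hole.

```python
import pandas as pd

# Illustrative data: one float column and one string column, each with a hole.
df = pd.DataFrame({'A': [1.0, None], 'B': ['x', None]})

# A dict picks a fill value per column.
by_column = df.fillna({'A': 0, 'B': 'missing'})

# A list is rejected outright rather than broadcast into each hole.
try:
    df.fillna([])
    list_rejected = False
except TypeError:
    list_rejected = True
```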

Frankly, I do not understand this limitation, especially because I see no way to interpret it ambiguously. There are several use cases (that I keep encountering) for filling in lists, especially empty lists.

One of the main ones is using .str.split(..., expand=False) or .str.extract and wanting to keep processing the lists, e.g. turning them into sets:

import pandas as pd
s = pd.Series(['a,b,b,c', 'b,c,d,d,d', 1, None]).str.split(',')
s
# 0       [a, b, b, c]
# 1    [b, c, d, d, d]
# 2                NaN
# 3                NaN
# dtype: object

s.map(set)  # this errors on NaNs
# TypeError: 'float' object is not iterable

### would like to use:
s.fillna([]).map(set)
# TypeError: "value" parameter must be a scalar or dict, but you passed a "list"

### same for
s.fillna(set()).map(set)
# TypeError: "value" parameter must be a scalar or dict, but you passed a "set"

It’s tedious to always do this by hand (especially for DataFrames), like:

serieswithalongname.loc[serieswithalongname.isnull()] = [] # can't be chained either

### even this doesn't work, because pd.notnull retains dimensionality -> ambiguous boolean
s.map(lambda x: set(x) if pd.notnull(x) else set())
# ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

### meaning the next step gets even more unwieldy:
s.map(lambda x: set() if isinstance(x, (float, type(None))) and pd.isnull(x) else set(x))
# 0    {c, a, b}
# 1    {c, d, b}
# 2           {}
# 3           {}
# dtype: object

### work-around even more painful for DataFrames,
### because single-value broadcast doesn't work as before
df = s.to_frame('A')
df.loc[df.A.isnull(), 'A'] = []
# ValueError: cannot copy sequence with size 0 to array axis with dimension 1

### need to know to use this:
df.loc[df.A.isnull(), 'A'] = [[]]
df.applymap(set)  # DataFrame.applymap is deprecated in pandas >= 2.1 in favor of DataFrame.map
#            A
# 0  {c, a, b}
# 1  {c, d, b}
# 2         {}
# 3         {}
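The ambiguity error above comes from pd.notnull (and pd.isna) working element-wise on list-likes: given a list, they return a boolean array rather than a single bool. A sketch that keeps the one-line map but swaps the isinstance check for pandas.api.types.is_scalar, which treats None and NaN as scalars and lists as non-scalars. The helper name empty_if_missing is hypothetical, not pandas API:

```python
import pandas as pd
from pandas.api.types import is_scalar

# Element-wise on a list: an array of flags, not a single bool.
flags = pd.notnull(['a', 'b'])

def empty_if_missing(x, factory=list):
    """Hypothetical helper: only a scalar can be missing; list-likes pass through."""
    # factory() is called once per missing cell, so no two rows share a container.
    return factory() if is_scalar(x) and pd.isna(x) else x

s = pd.Series(['a,b,b,c', 'b,c,d,d,d', 1, None]).str.split(',')
sets = s.map(empty_if_missing).map(set)
```

The check is_scalar(x) runs first, so pd.isna is only ever called on scalars and always returns a plain bool.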

This is also somewhat related to #19266.
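For DataFrames, the same per-element rule can be applied column by column, which avoids the [[]] assignment trick and can be chained with DataFrame.pipe. A minimal sketch; fillna_empty_lists and its columns parameter are hypothetical names, not pandas API:

```python
import pandas as pd
from pandas.api.types import is_scalar

def fillna_empty_lists(df, columns=None):
    """Hypothetical helper: return a copy of df with missing cells replaced by []."""
    columns = df.columns if columns is None else columns
    out = df.copy()
    for col in columns:
        # Series.map calls the lambda once per cell, so each filled cell gets its own list.
        out[col] = df[col].map(lambda x: [] if is_scalar(x) and pd.isna(x) else x)
    return out

df = pd.Series(['a,b,b,c', 'b,c,d,d,d', 1, None]).str.split(',').to_frame('A')
filled = df.pipe(fillna_empty_lists)
as_sets = filled['A'].map(set)
```

Because the helper returns a new frame and leaves its input untouched, it fits into a method chain like any other DataFrame method.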

Issue Analytics

  • State: open
  • Created 5 years ago
  • Reactions: 3
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
h-vetinari commented, Jun 7, 2018

@TomAugspurger

Thanks for the input. I think this ambiguity is not so serious. First off, self[i] for i in range(len(value)) does not work for a DF; more importantly, one can immediately see the effect, namely that the whole list gets filled into every NaN. And with a quick look at the docs, it’s clear that a dictionary is needed to distinguish fill values by column.

Furthermore, since lists currently raise errors, allowing them would not break any existing code. I think it would be very useful…
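One detail a list-aware fillna would have to settle is aliasing: if the very same list object is placed into every NaN, mutating one filled cell changes all of them. A sketch of that difference; fill_with_object and copy_each are hypothetical names, and copying each fill value with copy.deepcopy is an assumption of this sketch, not something the proposal specifies:

```python
import copy

import pandas as pd
from pandas.api.types import is_scalar

def fill_with_object(s, value, copy_each=True):
    """Hypothetical sketch of a list-aware fillna: put `value` into every missing cell."""
    def fill(x):
        if is_scalar(x) and pd.isna(x):
            # Without a copy, every filled cell refers to the very same object.
            return copy.deepcopy(value) if copy_each else value
        return x
    return s.map(fill)

s = pd.Series([['a'], None, None])
shared = fill_with_object(s, [], copy_each=False)
fresh = fill_with_object(s, [])

shared[1].append('x')  # shared[2] is the same list object, so it now holds 'x' too
```

Copying per cell costs a little extra work but matches what users would likely expect from filling with [].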

0 reactions
goerlitz commented, Jun 11, 2020

I often have to deal with data where a table column contains strings of comma-separated values, or NAs. So a fix for this issue would be highly appreciated!

I painfully realized that

pd.Series(['a,b,b,c', None]).str.split(',').fillna([])

throws an error and

pd.Series(['a,b,b,c', None]).fillna('').str.split(',')

returns [''] (a list containing an empty string) rather than an empty list. 😦

So I ended up with

pd.Series(['a,b,b,c', None]).str.split(',').map(lambda x: x if isinstance(x, list) else [])

which is not pretty but does the job.
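If the [''] produced by the fillna('') route is the only obstacle, it can be mapped back to an empty list in the same chain. A sketch, assuming an empty input string should also end up as []:

```python
import pandas as pd

raw = pd.Series(['a,b,b,c', None, ''], dtype=object)

# fillna('') turns every missing entry into '', so .str.split(',') never sees NA;
# both '' and the former NA then split to [''], which is mapped to [].
tokens = (
    raw.fillna('')
    .str.split(',')
    .map(lambda parts: [] if parts == [''] else parts)
)
```

Unlike the isinstance(x, list) check, this also turns a genuinely empty string into [], which may or may not be what you want.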

I don’t know whether this performs decently, and I don’t really care: all I need is to declare several data transformations in a chained fashion, where each transformation can be expressed on a single line (without defining helper functions or anything of the sort).
