Why is the default value of dropna True in the value_counts() method?
Problem description
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([1, 2, 3, np.nan, 5])
>>> s.value_counts()
5 1
3 1
2 1
1 1
>>> s.value_counts(dropna=False)
5 1
3 1
2 1
1 1
NaN 1
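The discrepancy above can be turned into a quick sanity check. A minimal sketch (not from the original issue, just an illustration of the behavior): with the default dropna=True, the counts silently sum to less than the length of the Series, while dropna=False makes the totals reconcile.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, 5])

# Default dropna=True: the NaN is silently excluded from the counts.
counts = s.value_counts()
assert counts.sum() == 4 and len(s) == 5  # one value went missing

# dropna=False: the NaN row is counted, so the totals reconcile.
counts_all = s.value_counts(dropna=False)
assert counts_all.sum() == len(s)
```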
For a beginner in pandas, it can be puzzling and misleading not to see NaN values when trying to understand a DataFrame, a Series, etc., especially if value_counts is used to check that previous operations were performed correctly (e.g. join / merge-type operations).
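To make the merge-checking concern concrete, here is a hedged sketch (the toy frames orders and customers are hypothetical, not from the issue): a left merge with an unmatched key produces a NaN, and the default value_counts hides exactly the row you would want to notice.

```python
import pandas as pd

orders = pd.DataFrame({"customer_id": [1, 2, 2, 3]})
customers = pd.DataFrame({"customer_id": [1, 2], "name": ["Ann", "Bob"]})

# Left merge: customer 3 has no match, so its "name" becomes NaN.
merged = orders.merge(customers, on="customer_id", how="left")

# Default dropna=True: the unmatched row is invisible in the counts.
print(merged["name"].value_counts())

# dropna=False: the NaN row shows that one join key failed to match.
print(merged["name"].value_counts(dropna=False))
```

The second call is the one that actually validates the merge, which is why the default trips people up.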
I can understand that it may seem natural to drop NaNs for various operations in pandas (means, etc.), and that as a consequence the general default value for dropna arguments is True (is that really the reason?).
I feel uncomfortable with the value_counts default behavior, and it has caused (and still causes) me some trouble.
The second aphorism of the Zen of Python states:
Explicit is better than implicit
I feel that dropping NaN values is done in an implicit way, and that this implicitness is harmful.
I find no drawbacks to having False as the default value, except for having a NaN in the Series index.
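The one drawback mentioned (a NaN label in the result index) is also manageable. A minimal sketch, assuming you want to select the NaN row explicitly; the boolean-mask approach shown here is an illustration, not the only way:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, 5])
counts = s.value_counts(dropna=False)

# The cost of dropna=False: NaN now appears as an index label.
assert counts.index.isna().any()

# The NaN row can still be selected explicitly with a boolean mask.
nan_count = counts[counts.index.isna()].iloc[0]
print(nan_count)  # 1
```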
The question:
So why is the default value of the dropna argument of value_counts() True?
PS: I've looked into existing issues with the filter is:issue value_counts dropna; apart from https://github.com/pandas-dev/pandas/issues/5569, I didn't find much information.
Issue Analytics
- State:
- Created 5 years ago
- Reactions:2
- Comments:6 (5 by maintainers)
Top GitHub Comments
👍 to this (ancient) question, and one vote from me for changing the default to dropna=False. I add that keyword pretty much every single time I call value_counts…
…Sorry, I had a typo in my old comment, which I've just corrected. I agree with the original issue description: the default should be to include null values (dropna=False).