Why is the default value of dropna True in the value_counts() method?
Problem description
>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([1, 2, 3, np.nan, 5])
>>> s.value_counts()
5 1
3 1
2 1
1 1
>>> s.value_counts(dropna=False)
5 1
3 1
2 1
1 1
NaN 1
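The discrepancy above can be turned into a quick sanity check. A minimal sketch (not from the original issue, just an illustration of the behavior): with the default dropna=True, the counts silently sum to less than the length of the Series, while dropna=False makes the totals reconcile.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, 5])

# Default dropna=True: the NaN is silently excluded from the counts.
counts = s.value_counts()
assert counts.sum() == 4 and len(s) == 5  # one value went missing

# dropna=False: the NaN row is counted, so the totals reconcile.
counts_all = s.value_counts(dropna=False)
assert counts_all.sum() == len(s)
```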
For a beginner in pandas, it can be puzzling and misleading not to see NaN values when trying to understand a DataFrame, a Series, etc., especially if value_counts is used to check that previous operations were performed correctly (e.g. join / merge-type operations).
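To make the merge-checking concern concrete, here is a hedged sketch (the toy frames orders and customers are hypothetical, not from the issue): a left merge with an unmatched key produces a NaN, and the default value_counts hides exactly the row you would want to notice.

```python
import pandas as pd

orders = pd.DataFrame({"customer_id": [1, 2, 2, 3]})
customers = pd.DataFrame({"customer_id": [1, 2], "name": ["Ann", "Bob"]})

# Left merge: customer 3 has no match, so its "name" becomes NaN.
merged = orders.merge(customers, on="customer_id", how="left")

# Default dropna=True: the unmatched row is invisible in the counts.
print(merged["name"].value_counts())

# dropna=False: the NaN row shows that one join key failed to match.
print(merged["name"].value_counts(dropna=False))
```

The second call is the one that actually validates the merge, which is why the default trips people up.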
I can understand that it may seem natural to drop NaNs for various operations in pandas (means, etc.), and that as a consequence the general default value for dropna arguments is True (is that really the reason?).
I feel uncomfortable with the value_counts default behavior, and it has caused (and still causes) me some trouble.
The second aphorism of the Zen of Python states:
Explicit is better than implicit
I feel that dropping NaN values is done in an implicit way, and that this implicitness is harmful.
I find no drawbacks to having False as the default value, except for having a NaN in the Series index.
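The one drawback mentioned (a NaN label in the result index) is also manageable. A minimal sketch, assuming you want to select the NaN row explicitly; the boolean-mask approach shown here is an illustration, not the only way:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, 5])
counts = s.value_counts(dropna=False)

# The cost of dropna=False: NaN now appears as an index label.
assert counts.index.isna().any()

# The NaN row can still be selected explicitly with a boolean mask.
nan_count = counts[counts.index.isna()].iloc[0]
print(nan_count)  # 1
```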
The question:
So why is the default value of the dropna argument of value_counts() True?
PS: I've looked into existing issues with the filter is:issue value_counts dropna; apart from https://github.com/pandas-dev/pandas/issues/5569, I didn't find much information.
Issue Analytics
- State:
- Created 5 years ago
- Reactions:2
- Comments:6 (5 by maintainers)
Top GitHub Comments
👍 to this (ancient) question, and one vote from me for changing the default to dropna=False. I add that keyword pretty much every single time I call value_counts…
…Sorry, I had a typo in my old comment, which I've just corrected. I agree with the original issue description: the default should be to include null values (dropna=False).