API: Standardize usage of underscores in multi-word kwargs
This ticket is an outgrowth of a discussion in pull request #22587.
By my rough count, the `read_csv` function has nearly 50 keyword arguments. Of those, 32 are made up of two or more words. Twenty of those multi-word arguments use an underscore to mark the break between words, like `skip_blank_lines` and `parse_dates`. Twelve do not, like `chunksize` and `lineterminator`.
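A rough version of this tally can be automated. The sketch below uses the standard library's `inspect.signature` to split a callable's keyword arguments (here assumed to mean parameters with a default) into names that contain an underscore and names that do not. It cannot tell a genuine single word like `encoding` from a run-together compound like `chunksize`, so the second list is only a set of candidates for manual review. `split_kwarg_names` and `example_reader` are hypothetical names used for illustration. `example_reader` borrows a few argument names mentioned above, but it is not the real `read_csv` signature. With pandas installed, the same helper could be pointed at `pandas.read_csv` instead.

```python
import inspect
from typing import Callable


def split_kwarg_names(func: Callable) -> tuple[list[str], list[str]]:
    """Hypothetical helper: split defaulted parameters into (underscored, plain)."""
    underscored: list[str] = []
    plain: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        # Parameters without a default (and *args/**kwargs) are skipped.
        if param.default is inspect.Parameter.empty:
            continue
        (underscored if "_" in name else plain).append(name)
    return underscored, plain


# Hypothetical stand-in; NOT the actual pandas.read_csv signature.
def example_reader(filepath, sep=",", chunksize=None, lineterminator=None,
                   skip_blank_lines=True, parse_dates=False, encoding=None):
    pass


underscored, plain = split_kwarg_names(example_reader)
print("underscored:", underscored)
# underscored: ['skip_blank_lines', 'parse_dates']
print("review for missing underscores:", plain)
# review for missing underscores: ['sep', 'chunksize', 'lineterminator', 'encoding']
```

The "plain" list still needs a human pass. In this example, `sep` and `encoding` are fine as they are, while `chunksize` and `lineterminator` are the compounds this issue is about.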
In my opinion, this is a small flaw in pandas' API, and the library would benefit from standardizing how word breaks in argument names are handled. That would make pandas more legible and consistent, and therefore easier to use for people at all experience levels.
I have taught pandas to dozens of newbies across the country, and I can testify from experience that small variations in the naming style of commonly used methods introduce unnecessary frustration and can even reduce users' confidence in the quality of the overall product.
As someone who uses pandas daily, I can also attest that these inconsistencies force me to consult the documentation routinely to make sure I use the correct kwarg naming style.
I am sympathetic to the desire to maintain backwards compatibility. I believe it could be managed with temporary deprecation warnings that are removed in a future version, much as was done when `sort_values` was introduced.
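One way to manage such a transition is a decorator that keeps accepting the old spelling for a while. It rewrites the old keyword to the new one, emits a `FutureWarning`, and refuses calls that pass both spellings. This is a minimal standard-library sketch under those assumptions, not pandas' own implementation. `renamed_kwarg` and `read_table_stub` are hypothetical names, and `skiprows` to `skip_rows` is used only as an example rename.

```python
import functools
import warnings


def renamed_kwarg(old: str, new: str):
    """Hypothetical decorator: accept `old` as a deprecated alias for `new`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old in kwargs:
                if new in kwargs:
                    raise TypeError(
                        f"{func.__name__}() got both {old!r} and {new!r}; "
                        f"use only {new!r}"
                    )
                # stacklevel=2 points the warning at the caller's line.
                warnings.warn(
                    f"{old!r} is deprecated and will be removed in a future "
                    f"version; use {new!r} instead",
                    FutureWarning,
                    stacklevel=2,
                )
                kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Hypothetical reader that has already adopted the underscored name.
@renamed_kwarg(old="skiprows", new="skip_rows")
def read_table_stub(path, skip_rows=0):
    return path, skip_rows


print(read_table_stub("data.csv", skip_rows=2))  # ('data.csv', 2), no warning
print(read_table_stub("data.csv", skiprows=2))   # ('data.csv', 2), plus FutureWarning
```

Because the old name is only translated at the call boundary, the function body uses the new name alone. Removing the decorator later is the whole of the cleanup.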
Since the underscore style of marking word breaks is more common and more legible, I propose it be adopted. All existing multi-word arguments without an underscore would need to be modified. You can find an experimental patch of the `skiprows` kwarg, and considerable support from other users for pursuing this type of change, in #22587.
If that pull request is ultimately merged, and the maintainers agree with the larger goal I've tried to articulate here, I would be pleased to lead an effort to extend whatever design pattern is agreed upon to other keyword arguments across the library.
Issue created 5 years ago; 28 comments.
Top GitHub Comments
I agree that underscore spacing for multi-word arguments is the most Pythonic and readable style. Hopefully, this issue will help ensure that new keyword arguments use underscore spacing.
Upgrading existing arguments will be painful, both in developer time and in the deprecation and backwards-compatibility issues that will arise. On the other hand, if these changes are going to be made, sooner is better. I lean toward continually improving pandas' design, since data science is a fast-moving field.
Disruptive changes like this would seem most natural in a major overhaul such as pandas 2.0. However, it's not clear whether pandas 2 is an active proposal. If it is not, perhaps it makes sense to bite the bullet now and standardize the existing argument names.
Hi, I added PR #23158 regarding the deprecation of `delimiter` in `read_csv`, following the previous discussion.
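`read_csv` documents `delimiter` as an alias for `sep`, and `sep` already has a default. Deprecating the alias therefore raises a detail that a simple "is the old name in kwargs" check misses: the function must tell an explicit `sep=","` apart from an omitted `sep`. The sketch below handles that with a sentinel object. It is a hypothetical illustration, not code taken from PR #23158. `read_delimited` and `_NO_DEFAULT` are invented names, and rejecting calls that pass both names is an assumption.

```python
import warnings

# Hypothetical sentinel: distinguishes "sep not passed" from any real value.
_NO_DEFAULT = object()


def read_delimited(path, sep=_NO_DEFAULT, delimiter=None):
    """Hypothetical reader where `delimiter` is a deprecated alias for `sep`."""
    if delimiter is not None:
        if sep is not _NO_DEFAULT:
            # Even sep="," counts as explicit here, thanks to the sentinel.
            raise ValueError("Specify either 'sep' or 'delimiter', not both")
        warnings.warn(
            "'delimiter' is deprecated; use 'sep' instead",
            FutureWarning,
            stacklevel=2,
        )
        sep = delimiter
    if sep is _NO_DEFAULT:
        sep = ","  # fall back to the default when neither name is given
    return path, sep


print(read_delimited("a.csv"))                 # ('a.csv', ',')
print(read_delimited("a.csv", delimiter=";"))  # ('a.csv', ';'), plus FutureWarning
```

With a plain `sep=","` default, a call like `read_delimited("a.csv", sep=",", delimiter=";")` would look identical to one that omitted `sep`. That is why the sketch uses a sentinel rather than the literal default.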