Add key to sorting functions
Many Python functions (sorting, max/min) accept a key argument; perhaps they could in pandas too.
The terrible motivating example was this awful hack from this question… for which maybe one could do
df.sort_index(key=lambda t: literal_eval(t[1:-1]))
This would still be an awful awful hack, but a slightly less awful one.
Issue Analytics
- State:
- Created 10 years ago
- Reactions: 11
- Comments: 20 (11 by maintainers)
As long as it is well documented how to use the key on multiple columns, I don’t much care. Just having the key option would be a huge step in the right direction.
Let me add a more concrete example of when having a key option to sort would be easier to use from a user’s perspective, and may possibly be more efficient than Categoricals.
Suppose that a user had data in text files, and one of the columns contains distances with associated units, e.g. “45.232m” or “0.59472km”. Let’s say there are ~500,000 rows, and each has a different distance. Now, suppose the user wanted to sort based on the data in this column. Obviously, they will have to do some sort of transformation of this data to make it sortable, since a purely ordinal sort will not work. As far as I can tell, currently the two most obvious approaches are to a) make a new column of the transformation result and use that column for sorting, or b) make the column a category, sort the column's values in a list, and use that sorted list as the categories.
To me, neither seems entirely preferable: method a) adds extra data to the DataFrame, which takes up space and which I have to filter out later if I want to write out to file, and method b) requires sorting all the data in my column before I can sort the data in my DataFrame, which unless I am mistaken is not incredibly efficient.
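The two workarounds described above can be sketched roughly as follows. This is only an illustration: the sample values, the column names "distance" and "distance_sort", and the helper `to_metres` are assumptions made for the example, not code from the issue.

```python
import pandas as pd

# Hypothetical sample data; the column name "distance" is an assumption.
df = pd.DataFrame({"distance": ["45.232m", "0.59472km", "12m", "0.003km"]})


def to_metres(text):
    """Convert a string such as '0.59472km' or '45.232m' to metres."""
    # Check "km" first, because every "km" string also ends with "m".
    if text.endswith("km"):
        return float(text[:-2]) * 1000.0
    return float(text[:-1])


# Workaround a): add a helper column, sort on it, then drop it again.
sorted_a = (
    df.assign(distance_sort=df["distance"].map(to_metres))
    .sort_values("distance_sort")
    .drop(columns="distance_sort")
)

# Workaround b): make the column an ordered categorical whose categories
# are the unique values, pre-sorted by their numeric meaning.
ordered = sorted(df["distance"].unique(), key=to_metres)
df_b = df.copy()
df_b["distance"] = pd.Categorical(df_b["distance"], categories=ordered, ordered=True)
sorted_b = df_b.sort_values("distance")
```

Both variants give the same row order. The second one has to sort the unique values once up front, which is the extra work the comment objects to.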
Things would be made worse if I then wanted to read in a second file and append that data to the DataFrame I already had, or if I wanted to modify the existing data in the “distances” column. I would then need to re-update my “distances_sort” column, or re-perform the reorder_categories call before I could sort again.
If a key argument were added to sort, all the boilerplate goes away as well as the extra processing. Sorting would just become a single call that passes the key. Now, no matter how I update or modify my distances column, I do not need to do any additional pre-processing before sorting.
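For readers on pandas 1.1 or later, where sort_values and sort_index accept a key callable, a key-based sort of such a column might look like the sketch below. Unlike the key of Python's built-in sorted, the pandas key receives the whole column as a Series and must return something of the same length. The sample data and the helper name `metres` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column name "distance" is an assumption.
df = pd.DataFrame({"distance": ["45.232m", "0.59472km", "12m", "0.003km"]})


def metres(col):
    """Vectorised key: map a Series of strings like '0.5km' to metres."""
    # Split each value into its numeric part (column 0) and unit (column 1).
    parts = col.str.extract(r"^([0-9.]+)(km|m)$")
    scale = parts[1].map({"m": 1.0, "km": 1000.0})
    return parts[0].astype(float) * scale


# No helper column and no categorical: the key is applied at sort time.
result = df.sort_values("distance", key=metres)
```

Because the key is recomputed on every call, appending rows or editing the column needs no extra bookkeeping before the next sort.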
The key argument could be flexible and support either a function, or a dict of functions. This second input type would be used if you wanted to provide a key for only a few columns, or different keys for different columns; for example: