ENH: SortField shorthand
Yesterday, a colleague asked me how to dictate the sort of bars in a chart. I developed this example to show him how.
import altair as alt

# df is assumed to be a pandas DataFrame with a "name" column (county names)
# and a "b19013001" column (median household income).
alt.Chart(df, title="Median household income of U.S. counties").mark_bar().encode(
    x=alt.X(
        "name:N",
        axis=alt.Axis(labels=False, title="", ticks=False),
        # Here's where you can re-sort the order of the columns on the x-axis.
        # (Newer Altair releases call this class alt.EncodingSortField.)
        sort=alt.SortField(
            # This SortField class requires at least three inputs,
            # which does seem like overkill. I'd like to see a simpler
            # way to pull this off.
            field="b19013001",  # First, the field you want to sort on.
            op="sum",  # Then the operation to run on that field. In this case, we just total the value.
            order="descending",  # Finally, the order to sort.
        ),
    ),
    y=alt.Y(
        "b19013001:Q",
        axis=alt.Axis(title="", format="$s", ticks=False),
    ),
).properties(width=620)
It works great, but IMHO the SortField requirement with three inputs, including a "fake" op that in this case does not appear to be necessary, is asking a lot of beginners. And I'd like to think something more convenient could also benefit experts.
I know nothing about the internals of this feature, but I'm curious whether the sort argument could somehow benefit from a shorthand, much like the x and y channels.
In my imagination, something like this:
sort=alt.SortField(field="b19013001", op="sum", order="descending")
Could be submitted like this, with the field and operation handled much like the other shorthand features, and the descending order of the sort handled with the same style as the order_by method of the popular Django framework:
sort="-sum(b19013001)"
I'm guessing you can easily imagine the other permutations in this kind of scheme. Additionally, in cases where the dataframe is not grouped during encoding, it seems to me that providing the op argument should be, 🥁, optional. That would mean that if a field were to be used as the sort in ascending order with no aggregation, the shorthand submission could be as simple as:
sort="b19013001"
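To make the proposal concrete, here is a minimal sketch of how such a string might be expanded into SortField arguments. The function name parse_sort_shorthand and the exact grammar (a leading "-" for descending order, an optional op(...) wrapper around the field) are hypothetical illustrations of the idea above, not part of Altair's API.

```python
import re

# Hypothetical grammar for the proposed sort shorthand:
#   optional "-" prefix  -> descending order
#   optional "op(...)"   -> aggregation operation wrapped around the field
#   otherwise a bare field name with no aggregation
_SORT_SHORTHAND = re.compile(
    r"^(?P<desc>-)?(?:(?P<op>\w+)\((?P<inner>[^()]+)\)|(?P<field>[^()]+))$"
)


def parse_sort_shorthand(shorthand: str) -> dict:
    """Expand a hypothetical sort shorthand string into SortField keyword arguments."""
    match = _SORT_SHORTHAND.match(shorthand.strip())
    if match is None:
        raise ValueError(f"Unrecognized sort shorthand: {shorthand!r}")
    # Exactly one of the two alternatives matched, so one of these is set.
    field = match.group("inner") or match.group("field")
    result = {
        "field": field.strip(),
        "order": "descending" if match.group("desc") else "ascending",
    }
    # Only include an op when one was written, so aggregation stays optional.
    if match.group("op"):
        result["op"] = match.group("op")
    return result


print(parse_sort_shorthand("-sum(b19013001)"))
print(parse_sort_shorthand("b19013001"))
```

In principle the result could then be unpacked into the class, e.g. alt.SortField(**parse_sort_shorthand("-sum(b19013001)")). Omitting op when no aggregation is written mirrors the suggestion that op be optional for ungrouped data.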
What do you think? If something like this already exists and I'm simply ignorant of it, I will accept writing the documentation as my punishment.
Top GitHub Comments
I wouldn’t say they’re off-limits… I’d just say we need to think carefully about where to draw the line on what parts of Altair exactly mirror the Vega-Lite API and what parts diverge.
Just for background: the way the shorthand expressions work is:
… to_dict() method so that it detects the presence of this attribute, removes it, and interprets its contents into a form that is valid according to the schema (in this case, populating the field, type, aggregate, and timeUnit attributes).

This customized code depends on the details of the schema, and so when the schema is updated the details of these modifications have to be updated as well. For example, the Vega-Lite version 1 and Vega-Lite version 2 schemas were so different that it required essentially rewriting the code from scratch, which all told took about 8 months to really get correct. Along the way, I dropped a number of other API shortcuts we had created earlier because I saw how unmaintainable they were when it came to schema updates.
I think overall it’s good to have those encoding shorthands available at the top level of the encoding… it’s something that’s used in basically every chart, and so the added maintenance burden is worth it. For any other API changes that require circumventing the grammar of the Vega-Lite schema, I want to make sure we’re carefully weighing the benefit to users vs the costs of the new maintenance burdens they create.
So no, nothing’s off-limits per se, but there’s a lot to keep in mind when making these kinds of decisions.
I think it's worth listing the places where Altair still diverges from Vega-Lite, so we can revise our defaults, especially for the upcoming VL4.
Yep, I have an issue that you can upvote in VL here: https://github.com/vega/vega-lite/issues/4933.