Make `sort.op` optional
Crossposting from vega/vega-lite#1489
Why is the `op` keyword on `sort` required?
Here’s an example Jupyter Notebook using Vega via Altair where I see the requirement causing a problem.
The user has a simple table with one nominal value and one quantitative value per row, in this case the median income of each county in the United States.
Each row is encoded into a bar on a chart. The user would like to sort the nominal bars using the y-axis’ quantitative value with no transformation or aggregation.
To the user in that circumstance, it seems extraneous to have to submit any operation at all.
alt.Chart(df, title="Median household income of U.S. counties").mark_bar().encode(
    x=alt.X(
        "name:N",
        axis=alt.Axis(labels=False, title="", ticks=False),
        sort=alt.SortField(
            field='b19013001',
            op='sum',  # <-- Why is this necessary?
            order="descending"
        )
    ),
    y=alt.Y(
        "b19013001:Q",
        axis=alt.Axis(title="", format="$s", ticks=False)
    )
).properties(width=620)
If the chart is not aggregated, why should the user have to specify an aggregation?
Am I crazy to think that a sensible default would be this: if no aggregation function is provided, Vega should assume there is a 1:1 relationship between the axis and the sort, perhaps raising an error if there isn’t?
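To make the proposed default concrete, here is a minimal plain-Python sketch of the behavior being asked for. It is not Vega or Altair code: `sort_domain_one_to_one` is a hypothetical helper, and the county names and income values are made up for illustration.

```python
def sort_domain_one_to_one(records, key_field, sort_field, descending=False):
    """Order the distinct key_field values by sort_field with no aggregation.

    Hypothetical helper: raises ValueError if any key has more than one
    record, which is the error condition suggested in the issue.
    """
    values = {}
    for record in records:
        key = record[key_field]
        if key in values:
            raise ValueError(
                f"multiple records for {key!r}; an aggregate op is required"
            )
        values[key] = record[sort_field]
    return sorted(values, key=values.get, reverse=descending)


# Made-up sample rows, one per county.
rows = [
    {"name": "A County", "b19013001": 52000},
    {"name": "B County", "b19013001": 71000},
    {"name": "C County", "b19013001": 43000},
]
order = sort_domain_one_to_one(rows, "name", "b19013001", descending=True)
```

With one row per county the sort needs no `op`; if a county appeared twice, the helper would raise instead of silently picking an aggregate.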
Top GitHub Comments
There has to be some aggregate here, as there is no guarantee that there are not multiple records for values in the scale domain. So the question becomes: should there be a default aggregate operation and if so what should it be? Min? Max? Something else?
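A small plain-Python sketch of why the choice of aggregate matters once a domain value has more than one record. The records are made up, and `domain_order` is a hypothetical helper for illustration, not part of Vega.

```python
from collections import defaultdict

# Made-up records: "A" appears twice in the scale domain.
records = [("A", 10), ("A", 1), ("B", 8), ("C", 5)]

# Collect every value seen for each domain key.
groups = defaultdict(list)
for key, value in records:
    groups[key].append(value)


def domain_order(op):
    """Sort the distinct keys by op applied to each key's values, descending."""
    return sorted(groups, key=lambda k: op(groups[k]), reverse=True)


by_sum = domain_order(sum)  # A=11, B=8, C=5
by_min = domain_order(min)  # A=1,  B=8, C=5
by_max = domain_order(max)  # A=10, B=8, C=5
```

Here `sum` and `max` put "A" first while `min` puts it last, so no single default gives an obviously right order when duplicates exist.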
The “virtue” of including the op is that (1) it makes it clear what is being done, hopefully preventing future confusion at the cost of some upfront learning, (2) while options like “sum” and “average” are only applicable to numeric values (and so not suitable as default operations), they permit more efficient streaming operations than “min” or “max”, thus enabling more performant visualizations. (To deal with possible value removals, min/max must keep a list of all data records seen.)
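The streaming point in (2) can be illustrated with a rough sketch. This is not how Vega implements its aggregates; `StreamingSum` and `StreamingMax` are hypothetical classes, and the sketch assumes every removed value was previously added.

```python
from collections import Counter


class StreamingSum:
    """Sum under inserts and removals, keeping a single number as state."""

    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value

    def remove(self, value):
        self.total -= value

    def value(self):
        return self.total


class StreamingMax:
    """Max under inserts and removals; must remember every live value."""

    def __init__(self):
        self.counts = Counter()

    def add(self, value):
        self.counts[value] += 1

    def remove(self, value):
        # Assumes value was added earlier.
        self.counts[value] -= 1
        if self.counts[value] == 0:
            del self.counts[value]

    def value(self):
        # Rescans the retained values to find the current maximum.
        return max(self.counts)


s, m = StreamingSum(), StreamingMax()
for v in (3, 9, 4):
    s.add(v)
    m.add(v)

# Remove the current maximum from both aggregates.
s.remove(9)
m.remove(9)
```

After the removal the sum is corrected with one subtraction, whereas the max can only be recovered because the other values were retained.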
As a result of the above I’m inclined to keep the design as-is, though I welcome more discussion. Another option is for Vega-Lite to make its own decision here, and keep Vega as-is regardless.
The documentation should mention that if you just want to do a simple sort on some quantitative value, just use the `sum` aggregate.
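The reason `sum` works as a stand-in for "no aggregation" can be checked with a short plain-Python sketch, assuming exactly one row per name. The table values are made up and `order_by` is a hypothetical helper.

```python
from statistics import mean

# Made-up table with exactly one row per name.
table = {"A County": 52000, "B County": 71000, "C County": 43000}


def order_by(op):
    """Sort names by op applied to each name's single value, descending."""
    # Each group holds one value, so op sees a one-element list.
    return sorted(table, key=lambda name: op([table[name]]), reverse=True)


orders = [order_by(op) for op in (sum, min, max, mean)]
```

Every aggregate of a single value is that value, so all four orders agree; the choice of `op` only changes the result when a name has several rows.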