Rework global user accuracy metric
Motivation
As per my understanding, the accuracy of a user is currently calculated as a weighted average of the accuracies of the top plays, using the same weighting system pp does (a sketch of this calculation follows the list below). This concept has several issues:
- The average is not a robust measure of central tendency. This means that a single outlier accuracy reaching the top plays would heavily affect the resulting value, either boosting it or reducing it sharply.
- The weighting system heavily biases the result towards the accuracy of the very top pp plays. However, the relationship between the pp gained from a play and the difficulty of the map is non-trivial, which makes the chosen weights questionable.
  When a player successfully passes a map beyond their current comfort zone, that pass can turn out to be a top-pp play with low accuracy. In a moment that is supposed to be joyful for the player, their global accuracy drops sharply, resulting in a bad™ user experience.
- It’s hard to leverage when comparing the skill of multiple players.
  Example: if player A has 98% accuracy and player B has 95% accuracy, who is better given that player A is at 500pp and player B at 800pp? The higher accuracy of player A casts a shadow on the higher ranking of player B, because player A could be “sacrificing” accuracy by playing harder maps in order to gain pp. In the end, however, this interpretation of the higher accuracy is just a suspicion, not a reliable conclusion.
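For reference, my understanding of the current calculation is roughly the following (a minimal Python sketch; the `0.95 ** i` decay is the pp weighting I am assuming here, and the normalization is my own guess):

```python
def current_global_accuracy(top_play_accs: list[float]) -> float:
    """Weighted average of top-play accuracies, ordered by descending pp.

    Assumes the same 0.95**i decay that pp weighting uses, normalized
    so the result stays within the range of the input accuracies.
    """
    weights = [0.95 ** i for i in range(len(top_play_accs))]
    total = sum(acc * w for acc, w in zip(top_play_accs, weights))
    return total / sum(weights)
```

With this weighting the first few plays dominate: even with 100 plays, the top play alone carries roughly 5% of the total weight, which is why one low-accuracy top-pp play moves the displayed value so much.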
Proposal
I have three proposals to fix these three issues. Each of them more or less builds upon the previous one.
- Switch the calculation from using a weighted average to using the median of the relevant accuracies from the top pp plays (the median is a robust measure of central tendency).
- The displayed user accuracy should be a compound metric of 2 (or more) values which represents the performance of the user over different map difficulty brackets.
  In particular, I am proposing to split the top pp plays into two sets, according to the maximum pp obtainable (or star rating?) on each map (with the respective mods used). Hence, the accuracy would be displayed as `a ➜ b`, where `a` is the median of the accuracies of the plays on the easier maps, and `b` is the median of the accuracies of the plays on the harder maps. Note that `a` would represent the accuracy for maps within the player’s comfort zone, while `b` would represent the accuracy for maps which challenge the player’s skill. In this context, `a ➜ b` is to be read roughly as “a accuracy on comfortable maps, with b accuracy on harder maps”.
- Establish manageable pp cutoffs, such as `100 * 2^n`. Then the accuracy becomes a list of medians, one for each pp range associated with map difficulty. In most places the accuracy would be displayed as just the two medians of the highest-difficulty sets, but the user profile would list the full value somewhere, in order to compare several players (see the sketch after this list).
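Here is a minimal sketch of how the bracketed medians could be computed (Python; the `(map_max_pp, accuracy)` play shape, the `bracketed_medians` name, and the decision to lump sub-100pp plays into the lowest bracket are all my own illustrative assumptions):

```python
import math
import statistics
from collections import defaultdict

def bracketed_medians(plays: list[tuple[float, float]]) -> dict[int, float]:
    """Group plays into pp brackets [100 * 2**n, 100 * 2**(n + 1)) and
    return the median accuracy per bracket, keyed by n.

    `plays` is a list of (map_max_pp, accuracy) pairs; plays on maps
    worth less than 100pp fall into bracket 0 in this sketch.
    """
    buckets: dict[int, list[float]] = defaultdict(list)
    for max_pp, acc in plays:
        n = max(0, int(math.log2(max_pp / 100))) if max_pp > 0 else 0
        buckets[n].append(acc)
    return {n: statistics.median(accs) for n, accs in sorted(buckets.items())}
```

The `a ➜ b` display from the second proposal would then simply be the medians of the two highest non-empty brackets, while the profile page could show the full per-bracket breakdown.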
Top GitHub Comments
The arithmetic mean is too susceptible to outlier values. Let’s show this with an example:
Example
A new player installs osu! and plays a low-end Hard map, getting 95%. “too ez”, he says, “I’ll play Insane.” The new player then plays 5 Insane maps, getting an accuracy of ~70%.
Later on, on social networks: *[screenshot showing the player’s global accuracy as 74.1%]*
The value of 74.1% above was calculated with an unweighted average: (5 × 70 + 95) / 6 ≈ 74.1. For comparison, the median would be 70%. When others play together with this player, the median turns out to be a much better predictor of their typical accuracy; the difference is the outsized effect of the single 95% outlier from the first, easy game.
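A quick check of that arithmetic (plain Python, using the illustrative values from the example above):

```python
import statistics

accs = [70, 70, 70, 70, 70, 95]  # five Insane plays plus the one easy 95% play

print(sum(accs) / len(accs))    # 74.166... -> the displayed "74.1%"
print(statistics.median(accs))  # 70 -> much closer to the player's typical play
```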
tl;dr: the average is too vulnerable to outlier values. The current weighting system exacerbates the issue, but in the end it is intrinsic to the arithmetic mean.
Yup, I forgot to factor in misses. I updated the code in the demo linked, but fixing the table will have to wait until tomorrow.