Average impact
Hey! Thank you again for this insightful project!
I've been contemplating the usefulness of the "average impact" metric. As far as I understand from the readme, an origin's impact is not aggregated at the website level. For example, if there are two different scripts from ads.example running on news.example and each of them takes 50ms of total CPU time, the average is calculated as (50 + 50) / 2 requests (= 50ms) and not (50 + 50) / 1 website (= 100ms). Because of that, the current method rewards origins that split their work between multiple scripts, which doesn't seem fair to me. Grouping execution time per origin (or better yet, TLD or entity) at the website level would significantly change the order of entities in the "Third Parties by Category" section [1].
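To make the difference concrete, here is a minimal sketch of the two ways of averaging, using the ads.example / news.example numbers from above. The record layout and the function names are my own assumptions for illustration, not how the project actually stores or computes anything.

```python
from collections import defaultdict

# Hypothetical records: (page, origin, CPU time in ms), one per third-party script.
records = [
    ("news.example", "ads.example", 50.0),
    ("news.example", "ads.example", 50.0),
]


def average_per_request(records, origin):
    """Average CPU time over every script request from the origin."""
    times = [ms for _, o, ms in records if o == origin]
    return sum(times) / len(times)


def average_per_site(records, origin):
    """Sum the origin's CPU time per page first, then average over pages."""
    per_page = defaultdict(float)
    for page, o, ms in records:
        if o == origin:
            per_page[page] += ms
    return sum(per_page.values()) / len(per_page)


print(average_per_request(records, "ads.example"))  # 50.0
print(average_per_site(records, "ads.example"))     # 100.0
```

With the per-site version, splitting the same work across more scripts no longer lowers the reported number.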
WDYT about such a change? Did I miss something in my deliberations? Is such grouping feasible with the current dataset? Or would it make more sense to take a step back and create a new Lighthouse audit that groups total CPU time by entity (perhaps using entity data from this project)?
[1] Average number of scripts per website for selected high-prevalence origins (these numbers come from my own small crawl):
- script.hotjar.com - 1 (avg per resource - 19ms, avg per site - 19ms)
- code.jquery.com - 1.14
- www.googletagservices.com - 2.67
- s7.addthis.com - 3.0
- platform.twitter.com - 3.2
- www.youtube.com - 3.38
- t.sharethis.com - 5.05
- www.facebook.com - 5.77 (avg per resource - 10.9ms, avg per site - 63ms)
Top GitHub Comments
So would the aggregation be something like the median sum of execution times per page per entity?
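If I'm reading that right, a rough sketch of "median sum of execution times per page per entity" could look like the following. The rows are made-up sample data and `median_sum_per_entity` is a hypothetical helper name; the real pipeline would presumably do this in SQL over the HTTP Archive data.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rows: (page, entity, execution time in ms), one per script.
rows = [
    ("a.example", "Facebook", 10.0),
    ("a.example", "Facebook", 12.0),
    ("b.example", "Facebook", 11.0),
    ("c.example", "Facebook", 30.0),
    ("c.example", "Facebook", 30.0),
    ("a.example", "Hotjar", 19.0),
]


def median_sum_per_entity(rows):
    """Sum execution time per (entity, page), then take the median per entity."""
    sums = defaultdict(float)
    for page, entity, ms in rows:
        sums[(entity, page)] += ms

    per_entity = defaultdict(list)
    for (entity, _page), total in sums.items():
        per_entity[entity].append(total)

    return {entity: median(totals) for entity, totals in per_entity.items()}


result = median_sum_per_entity(rows)
print(result)  # {'Facebook': 22.0, 'Hotjar': 19.0}
```

Here the per-page sums for Facebook are 22, 11 and 60, so the median is 22, whereas a per-script median would be 12.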
Also FYI I added the map of domains -> entities to a BigQuery table in the HTTP Archive project: https://bigquery.cloud.google.com/table/httparchive:scratchspace.third_parties?pli=1&tab=preview. It's not kept in sync with this repo but that might be a good way to do everything in SQL.
That'd get us most of the way but there's fancy root domain logic that happens in there too. I suppose we could just generate a dump of all observed origins -> entity name pre-resolved.
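As a rough idea of what such a pre-resolved dump could look like, here is a sketch that resolves each observed origin once and emits a flat origin -> entity map. The `ENTITY_BY_DOMAIN` table and both function names are hypothetical, and the plain suffix matching below is only a stand-in for the repo's actual root domain logic (it ignores public-suffix handling, for example).

```python
# Hypothetical entity map keyed by domain; not the project's real data.
ENTITY_BY_DOMAIN = {
    "facebook.com": "Facebook",
    "hotjar.com": "Hotjar",
    "twitter.com": "Twitter",
}


def resolve_entity(origin, entity_by_domain=ENTITY_BY_DOMAIN):
    """Return the entity for the longest matching domain suffix, or None."""
    labels = origin.lower().split(".")
    # Try the full host first, then drop one leading label at a time.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in entity_by_domain:
            return entity_by_domain[candidate]
    return None


def dump_resolved(origins):
    """Build a flat origin -> entity name map for every distinct origin."""
    return {origin: resolve_entity(origin) for origin in sorted(set(origins))}


dump = dump_resolved([
    "www.facebook.com",
    "script.hotjar.com",
    "platform.twitter.com",
    "code.jquery.com",
])
for origin, entity in dump.items():
    print(origin, "->", entity)
```

A table like this could then be joined on origin directly in SQL, with unresolved origins (here code.jquery.com) left as nulls.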