
DZL Proposal and Variance Measurement


I’ve been doing some thinking and background research on what’s out there. I’m taking down my thoughts here as a sounding board so we can narrow down to an MVP.

The Problem

The LH team needs to be able to understand metric variance and overall runtime/latency (bonus, if possible: accuracy) in different environments, how changes we make affect these attributes, and how we are trending over time.

Recap

Need to monitor:

  1. Metric variance
  2. Overall LH runtime
  3. (maybe if possible) Accuracy to real phones

Across:

  1. Environment (i.e. LR vs. local vs. travis/cloud/whatever)
  2. Different site types/URLs (i.e. example.com-type vs. cnn.com-type)
  3. Throttling types (i.e. Lantern vs. DevTools vs. WPT vs. none)

Use Cases:

  1. Overall “health dashboard” i.e. what does master look like overall?
  2. Compare version A to version B i.e. does this change improve LH?
  3. Timeline view by commit i.e. are we suffering from a death by a thousand cuts over time?

Potential Solution Components

  1. Mechanism for running LH n times in a particular environment on given URLs and storing the results in some queryable format (rough sketch after this list)
  2. Mechanism for visualizing all the latest master results
  3. Mechanism for visualizing the difference between two different versions of LH
  4. Mechanism for visualizing the history of master results
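
As a concrete (hypothetical) sketch of component 1, here's roughly what a local connector could look like, assuming the lighthouse CLI is on the PATH; `collectRuns` and the extracted fields are illustrative, not part of the proposal:

```js
const {execFileSync} = require('child_process');

// Runs Lighthouse n times against a URL and collects the parsed results.
function collectRuns(url, n) {
  const runs = [];
  for (let i = 0; i < n; i++) {
    // --output-path=stdout prints the LHR JSON to stdout; --quiet hides progress logs.
    const stdout = execFileSync(
      'lighthouse',
      [url, '--output=json', '--output-path=stdout', '--quiet'],
      {encoding: 'utf8', maxBuffer: 100 * 1024 * 1024} // LHRs can be several MB
    );
    const lhr = JSON.parse(stdout);
    runs.push({
      url,
      fetchTime: lhr.fetchTime,
      // Pull a couple of metrics out as queryable values (rawValue in LH 3.x LHRs,
      // numericValue in newer versions)...
      interactive: lhr.audits['interactive'].rawValue,
      // ...and keep the whole raw response around so we can add columns later.
      raw: stdout,
    });
  }
  return runs;
}
```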

Existing Solutions

The good news: we have an awesome community that has built lots of things to look at LH results over time 😃 The bad news: their big selling points usually revolve around easy time-series data and abstracting away environment concerns (which is the one piece we will actually need to change up and have control over the most) 😕

Only one of the use cases here is really a time-series problem (and even then it’s not a real-time time series, it’s a commit-level time series). That’s not to say we can’t repurpose a time-series DB for our use cases, Grafana still supports histograms and all that, it’s just a bit of a shoehorn for some of the things we’ll want to do.

The other problem: one of the things we actually care about most in all of this is the difference between versions of Lighthouse. Given that abstracting the environment away and keeping it stable is a selling point of all these solutions, breaking in to make version comparison our priority really cuts against the grain. Again, not impossible, but not exactly leveraging the strengths of these solutions.

Proposed MVP

K-I-S-S, keep it simple stupid. Great advice, hurts my feelings every time.

A simple CLI with 2 commands:

  1. run - handles the “run n times and save” piece; a single js file for each connector we need, just local and LR to start
  2. serve - serves a site that enables the visualization pieces

These two commands share a CLI config that specifies the storage location. I’m thinking SQLite to start, to avoid any crazy Docker mess, while still being compatible with some hypothetical remote SQL server later. We can include a field for the raw response so we can always add more columns easily later (rough schema sketch below).
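
For illustration, a minimal sketch of that storage layer, assuming the better-sqlite3 package; the table and column names here are hypothetical, not a committed schema:

```js
const Database = require('better-sqlite3');
const db = new Database('dzl.sqlite');

// One row per (run, metric); `raw` keeps the full LHR so columns can be added later.
db.exec(`
  CREATE TABLE IF NOT EXISTS runs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    hash TEXT NOT NULL,        -- LH commit hash the run was built from
    environment TEXT NOT NULL, -- e.g. 'local' or 'LR'
    url TEXT NOT NULL,
    metric TEXT NOT NULL,      -- e.g. 'interactive'
    value REAL NOT NULL,       -- metric value in ms
    raw TEXT NOT NULL          -- full LHR JSON
  )
`);

const insert = db.prepare(
  'INSERT INTO runs (hash, environment, url, metric, value, raw) VALUES (?, ?, ?, ?, ?, ?)'
);
insert.run('abc123', 'local', 'https://example.com', 'interactive', 3240.5, '{"audits":{}}');
```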

Thoughts so far? Did I completely miss what the pain is from others’ perspectives? Does it sound terrifyingly similar to plots? 😱

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 19 (4 by maintainers)

Top GitHub Comments

2 reactions
exterkamp commented, Oct 13, 2018

Hey, so yeah, I was thinking that I want to see data commit-over-commit so that I can see if a specific commit has introduced a problem, or confirm that variance has been reduced.

I am still liking the idea of a candlestick graph with the data like this: [candlestick chart example image]

  • x-axis: commit hash identifying the point in time of the run; maybe LH version would be even better, or master commit hash, or varying resolution might be nice
  • each candle would be derived from all runs for that commit hash. I imagine making the candle by taking all the runs and finding 1 std dev from the mean and plotting that (see the sketch after this list):
    • “open”: mean minus 1 std dev
    • “close”: mean plus 1 std dev
    • “high”: the highest value of the metric for any of the runs in that commit
    • “low”: the lowest value of the metric for any of the runs in that commit
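
A quick sketch of deriving one candle from all of a commit's runs, per the open/close/high/low definition above (the function name is illustrative; values would be one metric's measurements in ms):

```js
// Derives a candle {open, close, high, low} from a commit's runs for one metric.
function candleFromRuns(values) {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  // Population std dev, for simplicity.
  const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n;
  const stdDev = Math.sqrt(variance);
  return {
    open: mean - stdDev,  // mean minus 1 std dev
    close: mean + stdDev, // mean plus 1 std dev
    high: Math.max(...values),
    low: Math.min(...values),
  };
}

// e.g. TTI values (ms) from 5 runs at one commit
console.log(candleFromRuns([3100, 3350, 3240, 3500, 3010]));
```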

This would allow us to visualize when the variance was narrowing, i.e. the std dev going down over time: [narrowing-variance chart image]

Or if a specific commit increased variance: [increased-variance chart image]

So that is kind of how I like to visualize the scores over time, either with a candlestick chart, or with a line chart + shaded area of +/- 1-2 std dev around it to show the variance.


I like the current visualizations, esp. broken down by URL. But personally I want to see line charts/candlestick charts that show me what each metric is doing over time, so that I can see if something is getting out of hand or degrading slowly. But for snapshots I like all the called-out percentages and variance boxes coded red/yellow/green.

Hey @patrickhulce, have you looked into Superset? (Disclaimer: I used to work on the Airflow DAG ingestion and viz on a tool that used Superset, so I like it and it’s Python.)


Made some candles with some of the dumped data to show what variance in run duration could look like over multiple commits in candle form: [candlestick chart of run durations]

1 reaction
paulirish commented, Oct 9, 2018

OK. I think I’m sufficiently convinced that these tools optimize for metrics that arrive continuously and require grouping. Our use case of discrete metrics every hour or day isn’t supported well by them (your 3rd bold point).

I appreciate the attempt to make it work, but agree that Grafana isn’t a great solution for what we’re trying to do here.
