2D Boxplots
Please:
- Check for duplicate issues. Please file separate requests as separate issues on GitHub.
Verified existing issues and this enhancement request is not there.
- Describe the feature’s goal, motivating use cases, and its expected behavior.
The boxplots we’re familiar with visualize 1D (one-dimensional) distributions: they display the distribution of data on a single scale/axis, horizontal or vertical. However, there are several boxplot variations that can display the distribution of data over two, three, or even more dimensions. A 2D boxplot uses both the x and y axes to plot two sets of data, so we can use it to represent the variation of data along two axes in conjunction with each other. This is where having 2D boxplots as a built-in Vega-Lite mark would be so useful. Please refer to the link below for reference. The idea here is to create a 2D boxplot using the same traditional boxes: http://datavizcatalogue.com/blog/multidimensional-boxplot-variations/
- If you are proposing a new syntax, please provide at least one example spec, wrapped by triple backticks like this:
{
  "mark": "point",
  "encoding": {"x": {"field": "a"}}
}
N/A. We will use the existing TypeScript syntax to develop the new 2D boxplot mark.
You are encouraged to prototype multiple alternative syntaxes for your proposed feature. Doing so often leads to a better design.
- If applicable, include screenshots, GIF videos (e.g. using https://www.cockos.com/licecap/), or working example (e.g. example Vega specification for the requested feature)
Please refer to the sample below for developing two-dimensional boxplots.
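# Assumes the VegaLite package for Elixir; BoxplotStats is not defined in this snippet.
alias VegaLite, as: Vl

# Summary statistics for the x axis (time series).
timeseries_values = [8.894, 15.023, 8.605, 8.278, 12.224]
{median, {q1, q3}, iqr, {lower_whiskers, upper_whiskers}, outliers} =
  timeseries_values |> BoxplotStats.stats()

# Summary statistics for the y axis (data values).
data_values = [25, 13, 22, 30, 60]
{data_median, {data_q1, data_q3}, data_iqr, {data_lower_whiskers, data_upper_whiskers}, data_outliers} =
  data_values |> BoxplotStats.stats()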
data = [
%{
"event" => "AAA",
"timeseries_median" => median,
"timeseries_q1" => q1,
"timeseries_q3" => q3,
"timeseries_iqr" => iqr,
"timeseries_lo_whisker" => lower_whiskers,
"timeseries_up_whisker" => upper_whiskers,
"timeseries_lo_outlier" => 1,
"timeseries_up_outlier" => 20,
"data_median" => data_median,
"data_q1" => data_q1,
"data_q3" => data_q3,
"data_iqr" => data_iqr,
"data_lo_whisker" => data_lower_whiskers,
"data_up_whisker" => data_upper_whiskers,
"data_lo_outlier" => 8,
"data_up_outlier" => 50
}
]
data |> inspect |> IO.puts()
Vl.new(height: 480, width: 500, title: "Composite Boxplot with Timeseries execution & corresponding data values")
|> Vl.layers([
Vl.new()
|> Vl.data_from_values(data)
|> Vl.mark(:bar, tooltip: true)
|> Vl.encode(:size, value: 20)
|> Vl.encode_field(:x, "timeseries_q1", type: :quantitative, title: "TimeSeries")
|> Vl.encode_field(:x2, "timeseries_q3")
|> Vl.encode_field(:y, "data_q1", type: :quantitative, title: "Data")
|> Vl.encode_field(:y2, "data_q3"),
Vl.new()
|> Vl.data_from_values(data)
|> Vl.mark(:rule, color: :white, tooltip: true)
|> Vl.encode_field(:x, "timeseries_median", type: :quantitative)
|> Vl.encode_field(:y, "data_q1", type: :quantitative)
|> Vl.encode_field(:y2, "data_q3", type: :quantitative),
Vl.new()
|> Vl.data_from_values(data)
|> Vl.mark(:rule, color: :white, tooltip: true)
|> Vl.encode_field(:y, "data_median", type: :quantitative)
|> Vl.encode_field(:x, "timeseries_q1", type: :quantitative)
|> Vl.encode_field(:x2, "timeseries_q3", type: :quantitative),
Vl.new()
|> Vl.data_from_values(data)
|> Vl.mark(:rule, size: 2, ticks: true, tooltip: true)
|> Vl.encode_field(:x, "timeseries_q1", type: :quantitative)
|> Vl.encode_field(:x2, "timeseries_lo_whisker")
|> Vl.encode_field(:y, "data_median", type: :quantitative),
Vl.new()
|> Vl.data_from_values(data)
|> Vl.mark(:rule, size: 2, ticks: true, tooltip: true)
|> Vl.encode_field(:x, "timeseries_q3", type: :quantitative)
|> Vl.encode_field(:x2, "timeseries_up_whisker")
|> Vl.encode_field(:y, "data_median", type: :quantitative),
Vl.new()
|> Vl.data_from_values(data)
|> Vl.mark(:rule, size: 2, ticks: true, tooltip: true)
|> Vl.encode_field(:y, "data_q1", type: :quantitative)
|> Vl.encode_field(:y2, "data_lo_whisker")
|> Vl.encode_field(:x, "timeseries_median", type: :quantitative),
Vl.new()
|> Vl.data_from_values(data)
|> Vl.mark(:rule, size: 2, ticks: true, tooltip: true)
|> Vl.encode_field(:y, "data_q3", type: :quantitative)
|> Vl.encode_field(:y2, "data_up_whisker")
|> Vl.encode_field(:x, "timeseries_median", type: :quantitative),
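Vl.new()
|> Vl.data_from_values(data)
|> Vl.mark(:tick, color: :black, size: 16, thickness: 2, tooltip: true)
|> Vl.encode_field(:x, "timeseries_lo_whisker", type: :quantitative)
# y is set so the cap sits on the horizontal whisker, not at the view's vertical midpoint
|> Vl.encode_field(:y, "data_median", type: :quantitative),
Vl.new()
|> Vl.data_from_values(data)
|> Vl.mark(:tick, color: :black, size: 16, thickness: 2, tooltip: true)
|> Vl.encode_field(:x, "timeseries_up_whisker", type: :quantitative)
# y is set so the cap sits on the horizontal whisker, not at the view's vertical midpoint
|> Vl.encode_field(:y, "data_median", type: :quantitative),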
Vl.new()
|> Vl.data_from_values(data)
|> Vl.mark(:tick, color: :black, size: 16, thickness: 2, tooltip: true, orient: :horizontal)
|> Vl.encode_field(:y, "data_lo_whisker", type: :quantitative)
|> Vl.encode_field(:x, "timeseries_median", type: :quantitative),
Vl.new()
|> Vl.data_from_values(data)
|> Vl.mark(:tick, color: :black, size: 16, thickness: 2, tooltip: true, orient: :horizontal)
|> Vl.encode_field(:y, "data_up_whisker", type: :quantitative)
|> Vl.encode_field(:x, "timeseries_median", type: :quantitative),
Vl.new()
|> Vl.data_from_values(data)
|> Vl.mark(:point, color: :red, tooltip: true)
|> Vl.encode_field(:x, "timeseries_lo_outlier", type: :quantitative)
|> Vl.encode_field(:y, "data_median", type: :quantitative),
Vl.new()
|> Vl.data_from_values(data)
|> Vl.mark(:point, color: :red, tooltip: true)
|> Vl.encode_field(:x, "timeseries_up_outlier", type: :quantitative)
|> Vl.encode_field(:y, "data_median", type: :quantitative),
Vl.new()
|> Vl.data_from_values(data)
|> Vl.mark(:point, color: :red, tooltip: true)
|> Vl.encode_field(:y, "data_lo_outlier", type: :quantitative)
|> Vl.encode_field(:x, "timeseries_median", type: :quantitative),
Vl.new()
|> Vl.data_from_values(data)
|> Vl.mark(:point, color: :red, tooltip: true)
|> Vl.encode_field(:y, "data_up_outlier", type: :quantitative)
|> Vl.encode_field(:x, "timeseries_median", type: :quantitative)
])
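The snippet above calls `BoxplotStats.stats/1` without showing it. As a rough sketch only, a function with the same return shape could look like the following. The module name `BoxplotSketch` is hypothetical, and the choices of linear-interpolation quantiles and 1.5 × IQR fences are assumptions; the actual `BoxplotStats` module may compute these values differently.

```elixir
defmodule BoxplotSketch do
  # Hypothetical stand-in for BoxplotStats.stats/1 (not the original module).
  # Returns {median, {q1, q3}, iqr, {lower_whisker, upper_whisker}, outliers}.
  # Assumes a non-empty list of numbers.
  def stats(values) do
    sorted = Enum.sort(values)
    q1 = quantile(sorted, 0.25)
    median = quantile(sorted, 0.5)
    q3 = quantile(sorted, 0.75)
    iqr = q3 - q1

    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    lo_fence = q1 - 1.5 * iqr
    hi_fence = q3 + 1.5 * iqr

    {inside, outliers} =
      Enum.split_with(sorted, fn v -> v >= lo_fence and v <= hi_fence end)

    # Whiskers extend to the most extreme values that are still inside the fences.
    {median, {q1, q3}, iqr, {List.first(inside), List.last(inside)}, outliers}
  end

  # Quantile by linear interpolation between the two nearest ranks.
  defp quantile(sorted, p) do
    pos = p * (length(sorted) - 1)
    lower = floor(pos)
    upper = ceil(pos)
    frac = pos - lower
    Enum.at(sorted, lower) * (1 - frac) + Enum.at(sorted, upper) * frac
  end
end

# Same destructuring as in the snippet above, using its data_values list.
{data_median, {data_q1, data_q3}, data_iqr, {data_lo, data_hi}, data_outliers} =
  BoxplotSketch.stats([25, 13, 22, 30, 60])
```

Under these assumptions, the value 60 falls above the upper fence and is returned as an outlier, so the upper whisker stops at 30.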
Please refer to the screenshot below for the sample 2D boxplot that we are proposing as an enhancement:
Issue Analytics
- State:
- Created 2 years ago
- Comments:10 (5 by maintainers)
Thanks for the additional explanation.
I think everyone would agree that standard (1D) boxplots are useful despite their drawbacks.
However, my problem with 2D boxplot variants is that there doesn’t seem to be a universally accepted one.
There are many variants, but it’s unclear which one is worth implementing.
I strongly think that the effort would be better spent on bringing contour plots, which seem to be more widely used, to Vega-Lite.
Note that I’m totally OK with adding some variants of these 2D boxplots as Vega or Vega-Lite examples. They could be useful for people who prefer specific variants of 2D boxplots. However, before we say that we support them as a built-in boxplot type, I’d like to see a more concrete proposal for which 2D boxplot to implement, and why we should prioritize that chart type over a contour plot.
Hi! There are a couple of us working on this, and I would like to emphasize that the goal is to come up with a visualization that helps in understanding distribution and variation in 2D.
So, we provided the link to some examples that use area in 2D to show aggregates, e.g. the bagplot. But there does not seem to be a “go-to” or standard chart for distribution in 2D. In response to the reply above: rather than creating a new visualization, we feel that if boxplots are still an acceptable aggregate visualization, then extending them to 2D is preferable to coming up with a new one. We think it would be a useful addition to the examples given in the Distributions section of https://vega.github.io/vega/examples/
Also, to describe the example/data we are working with a bit: imagine we have processes that start and stop during some timeframe. During this time we have a second set of data (in the example it is CPU utilization). What we want to show, for many instances of these processes and their attendant second measure (CPU%), is the distribution of start/end times and the variation in the second measure. So, the example above shows that for Process A the start/end IQR is ~3 seconds to ~12.5 seconds and the CPU utilization IQR during this timeframe is ~19% to almost 60%. Now, imagine adding additional processes to this chart or faceting per process, both of which we have implemented. After visualizing multiple processes and their second measure (CPU%), we can see which processes have a “large” area and should make us curious: these are processes with large variability in both duration and utilization. Why?
I’m sure we could have a fantastic discussion about other visualizations for this, but hopefully we have given some reasonable justification for 2D boxplots here (even if they are ultimately not accepted! 😃).
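To make the "large area" idea concrete, here is a small sketch that ranks processes by the area of their 2D box, i.e. the x-axis IQR multiplied by the y-axis IQR. The module name `BoxArea` is hypothetical, the row shape follows the `data` maps in the example above, the "AAA" numbers are the approximate values quoted in this comment, and the "BBB" row is made up purely for illustration.

```elixir
defmodule BoxArea do
  # Hypothetical helper: rank rows by box area, largest first.
  # Each row is a map with the same quartile keys as the `data` list above.
  def rank(rows) do
    rows
    |> Enum.map(fn row ->
      x_iqr = row["timeseries_q3"] - row["timeseries_q1"]
      y_iqr = row["data_q3"] - row["data_q1"]
      {row["event"], x_iqr * y_iqr}
    end)
    |> Enum.sort_by(fn {_event, area} -> area end, :desc)
  end
end

rows = [
  # Made-up row for illustration only.
  %{"event" => "BBB", "timeseries_q1" => 5.0, "timeseries_q3" => 6.0,
    "data_q1" => 30.0, "data_q3" => 35.0},
  # Approximate values quoted for Process A in the comment above.
  %{"event" => "AAA", "timeseries_q1" => 3.0, "timeseries_q3" => 12.5,
    "data_q1" => 19.0, "data_q3" => 60.0}
]

ranked = BoxArea.rank(rows)
```

With these rows, "AAA" ranks first because its box spans 9.5 seconds by 41 percentage points, while the made-up "BBB" box is much smaller.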