Recreating a histogram from the percentile distribution output provides same percentiles, but different summary (mean, max, stddev)

I have a scenario where I’m using HDRHistogram and its .toJSON() as a standard interface between several different tool outputs.

One of them is wrk2, which provides its own stat output object and an HDRHistogram percentile distribution at the end of the run. It’s more useful to reverse that and rebuild the actual Histogram object in Node, so that I can call toJSON() (or whatever else) on it and have a standardized class/object.

The conundrum is this:

If you parse the output of hdr.outputPercentileDistribution(), and use that to rebuild the initial histogram, the final result is slightly different.

Diff of the two outputs: https://www.diffchecker.com/8dwIT5h3

Demo of the issue here (just open and it should print to console pane on righthand side): https://codesandbox.io/s/frosty-night-0noxy?file=/src/index.ts:5627-6019

interface ParsedHDRHistogramSummary {
  buckets: number
  count: number
  max: number
  mean: number
  stddev: number
  sub_buckets: number
}

interface ParsedHDRHistogramValue {
  value: number
  percentile: number
  total_count: number
  of_one_percentile: number
}

interface ParsedHDRHistogram {
  summary: ParsedHDRHistogramSummary
  values: ParsedHDRHistogramValue[]
}

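// Converts every property of `obj` in place using `type` (e.g. Number).
// Note: a missing optional group is undefined, so Number() turns it into NaN.
function convertPropertiesTo<T>(type: (raw: any) => T, obj: Record<string, any>) {
  for (const k in obj) obj[k] = type(obj[k])
  return obj
}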

function parseHdrHistogram(text: string): ParsedHDRHistogram {
  let valuesRegex = new RegExp(
    /(?<value>\d+\.?\d*)[ ]+(?<percentile>\d+\.?\d*)[ ]+(?<total_count>\d+\.?\d*)([ ]+(?<of_one_percentile>\d+\.?\d*))?/g
  )

  // prettier-ignore
  let summaryRegex = new RegExp(
    /#\[Mean    =       (?<mean>\d+\.?\d*), StdDeviation   =        (?<stddev>\d+\.?\d*)]/.source + "\n" +
    /#\[Max     =       (?<max>\d+\.?\d*), Total count    =         (?<count>\d+\.?\d*)]/.source + "\n" +
    /#\[Buckets =            (?<buckets>\d+\.?\d*), SubBuckets     =         (?<sub_buckets>\d+\.?\d*)]/.source,
    'g'
  )

  // prettier-ignore
  const values: ParsedHDRHistogramValue[] = [...text.matchAll(valuesRegex)]
    .flatMap((it) => convertPropertiesTo(Number, it.groups as any))

  const summary: ParsedHDRHistogramSummary = [...text.matchAll(summaryRegex)]
    .flatMap((it) => convertPropertiesTo(Number, it.groups as any))
    .pop()

  return { summary, values }
}

// Calculates the number of values recorded at each point in the percentile output
function calculateHistogramIntervalCounts(values: ParsedHDRHistogramValue[]) {
  type HistogramPoint = { amount: number; value: number }
  let res: HistogramPoint[] = []

  let lastCount = 0
  for (let entry of values) {
    let amount = entry.total_count - lastCount
    let value = Math.round(entry.value)
    res.push({ amount, value })
    lastCount = entry.total_count
  }

  return res
}

// Assumes `import * as hdr from "hdr-histogram-js"` at the top of the file
function reconstructHdrHistogramFromParsed(parsedHistogram: ParsedHDRHistogram) {
  const histogram = hdr.build()
  const intervals = calculateHistogramIntervalCounts(parsedHistogram.values)
  for (let entry of intervals)
    histogram.recordValueWithCount(entry.value, entry.amount)
  return histogram
}

const reconstructHdrHistogramFromText = (text: string) =>
  reconstructHdrHistogramFromParsed(parseHdrHistogram(text))
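
A side note on the summary parsing: the summary regex in the code above hard-codes the column padding, so it only matches when the footer is padded exactly that way. Below is a minimal sketch of a more tolerant version, assuming the footer keeps the `#[Label = number, Label = number]` shape. The names `HdrSummary`, `footerLine` and `parseHdrSummary` are hypothetical and not part of the original sandbox.

```typescript
// Hypothetical helper types and functions, not part of the original code.
interface HdrSummary {
  mean: number
  stddev: number
  max: number
  count: number
  buckets: number
  sub_buckets: number
}

// Builds a regex for one "#[first = number, second = number]" footer line.
// \s* absorbs whatever padding the printer used.
function footerLine(first: string, second: string): RegExp {
  const num = "(\\d+\\.?\\d*)"
  return new RegExp(
    "#\\[" + first + "\\s*=\\s*" + num + ",\\s*" + second + "\\s*=\\s*" + num + "\\]"
  )
}

// Returns undefined when any of the three footer lines is missing.
function parseHdrSummary(text: string): HdrSummary | undefined {
  const meanLine = footerLine("Mean", "StdDeviation").exec(text)
  const maxLine = footerLine("Max", "Total count").exec(text)
  const bucketLine = footerLine("Buckets", "SubBuckets").exec(text)
  if (!meanLine || !maxLine || !bucketLine) return undefined
  return {
    mean: Number(meanLine[1]),
    stddev: Number(meanLine[2]),
    max: Number(maxLine[1]),
    count: Number(maxLine[2]),
    buckets: Number(bucketLine[1]),
    sub_buckets: Number(bucketLine[2]),
  }
}

// Footer lines in the shape printed by outputPercentileDistribution()
const sampleFooter = [
  "#[Mean    =      249.500, StdDeviation   =      144.337]",
  "#[Max     =      499.000, Total count    =          500]",
  "#[Buckets =            1, SubBuckets     =         2048]",
].join("\n")

console.log(parseHdrSummary(sampleFooter))
```

Because each line is matched independently and without anchors, a console prefix such as `[Info]` in front of the footer does not break the match.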

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
alexvictoor commented, Aug 2, 2020

Thanks for your kind words! I guess the 1/(1-percentile) column is useful for plotting the data: it is used as the ‘x’ value in the graph. This is not obvious from looking at the code, but if you want you can take a look here
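
To make the last column concrete, here is a small sketch that recomputes it for a few rows of the sample output in this thread. The helper name `inversePercentileScale` is hypothetical; the only assumption is that the column is literally 1/(1-percentile), as its header says.

```typescript
// Hypothetical helper: x-axis position for a row of the percentile output.
// The last row (percentile 1.0) has no finite position, which matches the
// empty cell in the printed output.
function inversePercentileScale(percentile: number): number {
  return percentile >= 1 ? Infinity : 1 / (1 - percentile)
}

// Percentiles copied from the sample output in this thread
const samplePercentiles = [0.1, 0.2, 0.99375, 0.998046875]
for (const p of samplePercentiles) {
  console.log(p, inversePercentileScale(p).toFixed(2))
}
```

Each extra "nine" in the percentile multiplies this value by ten (0.9 gives 10, 0.99 gives 100, 0.999 gives 1000), which is presumably why it works well as a logarithmic x-axis for tail latencies.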

1 reaction
alexvictoor commented, Jul 31, 2020

After thinking a little bit about how the output is built, it is not surprising that after parsing you get a good distribution and a not-so-good summary. This is because the output contains less information than the whole histogram. If you take a look at this code fragment:

const h = build();
for (let index = 0; index < 500; index++) {
      h.recordValue(index);
}

console.log(h.outputPercentileDistribution());

You get the following output:

  Value     Percentile TotalCount 1/(1-Percentile)
   
  0.000 0.000000000000          1           1.00
 49.000 0.100000000000         50           1.11
 99.000 0.200000000000        100           1.25

...

496.000 0.993750000000        497         160.00
497.000 0.994531250000        498         182.86
497.000 0.995312500000        498         213.33
498.000 0.996093750000        499         256.00
498.000 0.996484375000        499         284.44
498.000 0.996875000000        499         320.00
498.000 0.997265625000        499         365.71
498.000 0.997656250000        499         426.67
499.000 0.998046875000        500         512.00
499.000 1.000000000000        500
[Info]        #[Mean    =      249.500, StdDeviation   =      144.337]
[Info]        #[Max     =      499.000, Total count    =          500]
[Info]        #[Buckets =            1, SubBuckets     =         2048]

Looking at this output, there is no way to tell that there was exactly one record for each value between 0 and 499. From the first lines, you can only say that there were 50 records with a value of 49 or less, and nothing more.
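
The effect can be reproduced without any histogram library. The sketch below is a deliberately coarse simplification: it keeps only one row per 10% (the real output has many more rows), then rebuilds the data the same way the question's `calculateHistogramIntervalCounts` does. All names here (`percentileRows`, `rebuild`, `stats`) are hypothetical.

```typescript
type Row = { value: number; totalCount: number }
type Point = { value: number; amount: number }

// Population mean / standard deviation / max over (value, amount) pairs.
function stats(points: Point[]) {
  let count = 0, sum = 0, max = -Infinity
  for (const p of points) {
    count += p.amount
    sum += p.value * p.amount
    if (p.value > max) max = p.value
  }
  const mean = sum / count
  let squares = 0
  for (const p of points) squares += p.amount * (p.value - mean) * (p.value - mean)
  return { count, mean, stddev: Math.sqrt(squares / count), max }
}

// One row per percentile step. totalCount equals the rank here only because
// every value in `sorted` is distinct.
function percentileRows(sorted: number[], steps: number): Row[] {
  const rows: Row[] = []
  for (let step = 0; step <= steps; step++) {
    const rank = Math.max(Math.ceil((step * sorted.length) / steps), 1)
    rows.push({ value: sorted[rank - 1], totalCount: rank })
  }
  return rows
}

// Same idea as calculateHistogramIntervalCounts: every record that falls
// between two rows is attributed to the value of the later row.
function rebuild(rows: Row[]): Point[] {
  let lastCount = 0
  return rows.map((row) => {
    const amount = row.totalCount - lastCount
    lastCount = row.totalCount
    return { value: row.value, amount }
  })
}

// Exact data: one record for each value 0..499, as in the fragment above
const exactValues: number[] = []
for (let i = 0; i < 500; i++) exactValues.push(i)

const originalStats = stats(exactValues.map((value) => ({ value, amount: 1 })))
const rebuiltStats = stats(rebuild(percentileRows(exactValues, 10)))

console.log("original:", originalStats)
console.log("rebuilt: ", rebuiltStats)
```

In this sketch the count and max survive the round trip, but the mean and standard deviation do not: all records inside an interval are moved to the interval's upper value, which pulls the mean upward. With the finer rows of the real output the drift is smaller, but the mechanism is the same.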

Since HdrHistogram_C has an hdr_encode_compressed() function, I guess a solution would be to ask @giltene whether it would be a good thing to add an option in wrk2 to get the results as a base64 compressed string.
