How to read alignment graph?

Trying to understand what the axes and color legend mean.

From reading other issues, I understand that a diagonal line means good alignment, but what exactly is happening at each x,y value and color of the graph?
Is the color legend the alignment value, and higher means better?

Top GitHub Comments

12 reactions

NTT123 commented, Apr 12, 2018

There are two important things here.

(1) The encoder (y-axis) takes, at each step, an input character and its current state, and outputs a real-valued vector representing the status of the brain at that moment. The input length here is about 100 characters, so the encoder generates about 100 vectors.

(2) The decoder (x-axis) takes all the vectors (the y-axis) and generates audio frames (a mel-spectrogram). The decoder also works step by step: at each step (there are about 80-90 steps here) it decides which vectors (on the y-axis) are important for creating the audio frames at that particular moment. Bright colors mean more focus on that vector (on the y-axis), and dark colors mean less.

In short, the encoder reads input characters step-by-step and outputs status vectors. The decoder reads all status vectors and generates audio frames step-by-step.

A good alignment simply means that an “A” sound generated by the decoder should be the result of focusing on the vector the encoder generated from reading the “A” character. The diagonal line appears when audio frames are created by focusing on the correct input characters in order.
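As a rough illustration of what "diagonal" means, the sketch below treats the plot as a matrix with one row per encoder vector (y-axis) and one column per decoder step (x-axis), finds the most-focused row in each column, and checks that it never moves backwards. The matrices and the helper names `focused_encoder_index` and `is_monotonic` are made up for this example and are not part of any Tacotron code.

```python
def focused_encoder_index(alignment):
    """For each decoder step (column), return the encoder index (row) with the largest weight."""
    n_enc = len(alignment)
    n_dec = len(alignment[0])
    return [max(range(n_enc), key=lambda i: alignment[i][t]) for t in range(n_dec)]


def is_monotonic(indices):
    """True if the focused encoder index never goes backwards as decoding proceeds."""
    return all(a <= b for a, b in zip(indices, indices[1:]))


# Toy alignments: 4 encoder vectors (rows) x 6 decoder steps (columns).
# Every column sums to 1.0, like the colors in one column of the plot.
good = [
    [0.9, 0.6, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.3, 0.8, 0.3, 0.0, 0.0],
    [0.0, 0.1, 0.1, 0.6, 0.7, 0.2],
    [0.0, 0.0, 0.0, 0.1, 0.3, 0.8],
]
bad = [
    [0.1, 0.7, 0.2, 0.1, 0.6, 0.1],
    [0.2, 0.1, 0.1, 0.7, 0.2, 0.1],
    [0.6, 0.1, 0.1, 0.1, 0.1, 0.7],
    [0.1, 0.1, 0.6, 0.1, 0.1, 0.1],
]

print(focused_encoder_index(good), is_monotonic(focused_encoder_index(good)))
print(focused_encoder_index(bad), is_monotonic(focused_encoder_index(bad)))
```

In the `good` matrix the focus walks through the input in order, which is what shows up as a diagonal line; in `bad` it jumps around, so no line forms.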

10 reactions

NTT123 commented, Apr 14, 2018

@reallynotabot,

(1) The number of decoding steps is determined by the training audio sample. For example, if each decoding step generates, say, 5 frames and an audio sample is about 400 audio frames long, then there are 400/5 = 80 decoding steps.
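The arithmetic in (1) can be written out as a small sketch. The function name `decoding_steps` is hypothetical, and rounding up when the frame count is not an exact multiple is an assumption of this example, not something stated in the comment.

```python
import math


def decoding_steps(n_audio_frames, frames_per_step):
    """Number of decoder steps needed to cover all audio frames.

    Rounding up for a partial last step is an assumption of this sketch.
    """
    return math.ceil(n_audio_frames / frames_per_step)


print(decoding_steps(400, 5))  # 80, matching the example above
```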

(2) the decoder has to learn which vectors are important. This is the training. Technically, this is an attention mechanism, see here for details.

(3) At each decoding step, the values along the whole y-axis are the weights of a weighted sum. All the colors in that column add up to 1.0. The focused vector is actually the weighted average of all the encoder’s status vectors.

Yes, vector 60 adds very little to the average vector at decoding step 20, because its weight there is very small.
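A minimal sketch of that weighted average for a single decoding step follows. It assumes the weights come from a softmax over some per-vector scores, which is a common choice for attention but is not confirmed by the comment; the scores, vectors, and function names are invented for illustration.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1.0."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(weights, encoder_vectors):
    """Weighted average of the encoder vectors (the 'focused vector')."""
    dim = len(encoder_vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, encoder_vectors)) for d in range(dim)]


# Three toy encoder status vectors and made-up attention scores for one decoder step.
encoder_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [4.0, 0.0, -4.0]

weights = softmax(scores)  # one column of the alignment plot
context = context_vector(weights, encoder_vectors)

print(weights)
print(context)
```

The last vector gets a tiny weight, so it barely changes `context`, which ends up close to the first encoder vector. That is the same effect as vector 60 contributing almost nothing at decoding step 20.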
