
Weighted probability


Hi. Thank you for the Chatito!

Our team has been using Chatito pretty extensively lately, and we like it.

What I noticed is that it is hard for newcomers to get a sense of how likely each sentence is to appear in the training data. For example:

%[some_intent]
    ~[alias1]
    ~[alias2] ~[alias3]

Here we can’t tell what the distribution of the examples will be. Most likely ~[alias2] ~[alias3] will get a bigger share if the number of variations in each alias is about the same. But if alias1 has many more variations than alias2 * alias3, then ~[alias1] will get the bigger share. So we have to look up the aliases and walk down the tree of nested aliases to work out how many variations each sentence can produce. This is rather error-prone and time-consuming.
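To make the arithmetic concrete, here is a minimal Python sketch. It assumes each sentence's share is proportional to the number of combinations it can produce, as described above. The alias sizes are made-up numbers, `sentence_shares` is a hypothetical helper rather than part of Chatito, and nested aliases are assumed to be already flattened into a single count per alias.

```python
from math import prod


def sentence_shares(sentences, alias_sizes):
    """Return each sentence's share of all possible combinations.

    sentences: list of sentences, each a list of alias names.
    alias_sizes: hypothetical number of variations per alias
                 (nested aliases already counted in).
    """
    # A sentence produces the product of its aliases' variation counts.
    counts = [prod(alias_sizes[name] for name in sentence) for sentence in sentences]
    total = sum(counts)
    return [count / total for count in counts]


sentences = [["alias1"], ["alias2", "alias3"]]

# Similar alias sizes: the two-alias sentence dominates (10 vs. 4 * 5 = 20).
print(sentence_shares(sentences, {"alias1": 10, "alias2": 4, "alias3": 5}))

# alias1 much larger than alias2 * alias3: the first sentence dominates.
print(sentence_shares(sentences, {"alias1": 100, "alias2": 4, "alias3": 5}))
```

With the first set of sizes the sentences get roughly one third and two thirds; with the second set the first sentence gets 100 out of 120 combinations. The point is that the split cannot be read off the intent definition alone.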

So we started to use even distribution a lot, specifying probabilities with the probability operator where needed (BTW, it is not that easy to get the probabilities to total 100; again, we have to calculate and re-adjust them).

It helped. But I was wondering how we could improve on that. What if we add a flag to the generation command (e.g. --probability=weighted)? If this flag is set, every sentence gets the same weight of 1, which can be modified with the probability operator. E.g.:

// here we have 50%/50% probability for the first and second sentence
%[some_intent]
    ~[alias1]
    ~[alias2] ~[alias3]

// here we have a 2:1 ratio of the first to the second sentence, i.e. 66.67% for the first and 33.33% for the second
%[some_other_intent]
    *[2] ~[alias1]
    ~[alias2] ~[alias3]

I suppose weighted probability might be even easier to grok, because *[2] means you want twice as many examples of this kind. So with “weighted probability” we won’t have to set even distribution everywhere, and it’s easier to modify weights.
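As a rough illustration of the proposed semantics (not Chatito's actual implementation), the weights could be normalized into probabilities and then used for sampling. `weighted_shares` and `pick_sentence` are hypothetical names, and the default weight of 1 is the assumption from the proposal above.

```python
import random


def weighted_shares(weights):
    """Normalize per-sentence weights into probabilities that sum to 1."""
    total = sum(weights)
    return [weight / total for weight in weights]


def pick_sentence(sentences, weights, rng=random):
    """Pick one sentence at random, proportionally to its weight."""
    return rng.choices(sentences, weights=weights, k=1)[0]


# %[some_intent]: no operator, so both sentences keep the default weight of 1.
print(weighted_shares([1, 1]))

# %[some_other_intent]: *[2] on the first sentence gives a 2:1 ratio.
print(weighted_shares([2, 1]))

sentences = ["~[alias1]", "~[alias2] ~[alias3]"]
print(pick_sentence(sentences, [2, 1]))
```

Because the weights are relative, changing one sentence's weight never requires re-adjusting the others to reach a fixed total.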

What do you think about it? Could this be a valuable addition to Chatito? I’d like to work on a PR for that.

Issue Analytics

Created 4 years ago
Comments: 10 (10 by maintainers)

Top GitHub Comments

nimf commented, Jun 21, 2019

I just read the updated spec.md. It looks really good! So, here is what I think we will need:

1. Add parsing of “%” inside the probability operator.
2. Allow alias definitions to have entity arguments.
3. Implement the defaultDistribution cli argument.
4. Update the calculation of the weights considering the distribution entity argument (if set) and defaultDistribution configuration.
5. Expose defaultDistribution to the web editor.

I feel like I can do 3 and 4. But I’m open to any suggestions.

rodrigopivi commented, Jun 26, 2019

Published 2.3.0. It was great sharing the work on this Yuri, thanks.
