Weighted probability

Hi. Thank you for Chatito!

Our team has been using Chatito pretty extensively lately, and we like it.

What I noticed is that it is hard for newcomers to get a sense of how likely each sentence is to appear in the training data. E.g.:

%[some_intent]
    ~[alias1]
    ~[alias2] ~[alias3]

Here we can’t say what the distribution of the examples would be. Most likely ~[alias2] ~[alias3] will get a bigger share if the number of variations in each alias is about the same. But if alias1 has many more variations than alias2 * alias3, then ~[alias1] will get a bigger share. So we have to look up the aliases and walk down the tree of nested aliases to understand how many variations each sentence might get. This is rather error-prone and time-consuming.
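To make the arithmetic concrete, here is a minimal Python sketch. It assumes, as described above, that without any explicit settings a sentence's share is proportional to the number of combinations it can produce. The function name and the variation counts are hypothetical; this is not Chatito's own code.

```python
from fractions import Fraction
from math import prod


def sentence_shares(sentences):
    """Return each sentence's share of the generated examples.

    `sentences` is a list of sentences; each sentence is a list holding
    the number of variations of every alias it references.
    """
    # A sentence can produce as many combinations as the product of
    # the variation counts of its aliases.
    combos = [prod(alias_counts) for alias_counts in sentences]
    total = sum(combos)
    return [Fraction(c, total) for c in combos]


# Hypothetical counts: alias1, alias2 and alias3 have 10 variations each.
print(sentence_shares([[10], [10, 10]]))   # [Fraction(1, 11), Fraction(10, 11)]

# Hypothetical counts: alias1 has 500 variations, the others 10 each.
print(sentence_shares([[500], [10, 10]]))  # [Fraction(5, 6), Fraction(1, 6)]
```

With equal alias sizes the two-alias sentence dominates (10/11), but a large enough alias1 flips the balance (5/6), which is why the shares are hard to guess without counting.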

So we started to use the even distribution a lot and to specify probabilities with the probability operator where needed (BTW, it is not that easy to get the probabilities to total 100; again, we have to calculate it and re-adjust).

It helped. But I was thinking: how can we improve that? What if we create a flag for the generation command (e.g. --probability=weighted)? If this flag is set, every sentence gets the same weight of 1, which can be modified with the probability operator. E.g.:

// here we have 50%/50% probability for the first and second sentence
%[some_intent]
    ~[alias1]
    ~[alias2] ~[alias3]

// here we have a 2:1 ratio of the first to the second sentence, i.e. 66.67% for the first and 33.33% for the second
%[some_other_intent]
    *[2] ~[alias1]
    ~[alias2] ~[alias3]

I suppose the weighted probability might be even easier to grok because *[2] means you want the number of examples of this kind to be doubled. So with the “weighted probability” we won’t have to set the even distribution everywhere, and it’s easier to modify weights.
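The proposed behaviour can be sketched in a few lines of Python. This is only an illustration of the idea, under the assumption that a sentence without a probability operator gets weight 1 and that `*[n]` sets its weight to n; the function name is hypothetical, and this is not how Chatito implements it.

```python
from fractions import Fraction


def weighted_shares(weights):
    """Normalize per-sentence weights into probabilities.

    `None` stands for a sentence without a probability operator,
    which is assumed to have the default weight of 1.
    """
    resolved = [1 if w is None else w for w in weights]
    total = sum(resolved)
    return [Fraction(w, total) for w in resolved]


# %[some_intent]: no operators, so both sentences get 1/2.
print(weighted_shares([None, None]))  # [Fraction(1, 2), Fraction(1, 2)]

# %[some_other_intent]: *[2] on the first sentence gives a 2:1 ratio.
print(weighted_shares([2, None]))     # [Fraction(2, 3), Fraction(1, 3)]
```

Because the weights are normalized automatically, nobody has to make the numbers add up to 100 by hand.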

What do you think about it? Could this be a valuable addition to Chatito? I’d like to work on a PR for that.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

1 reaction
nimf commented, Jun 21, 2019

I just read the updated spec.md. It looks really good! So, here is what I think we will need:

  1. Add parsing of “%” inside the probability operator.
  2. Allow alias definitions to have entity arguments.
  3. Implement the defaultDistribution CLI argument.
  4. Update the calculation of the weights considering the distribution entity argument (if set) and defaultDistribution configuration.
  5. Expose defaultDistribution to the web editor.

I feel like I can do 3 and 4. But I’m open to any suggestions.
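As a rough illustration of item 1, the Python sketch below distinguishes a plain weight such as `*[2]` from a percentage such as `*[20%]`. The accepted grammar, the function name, and the returned tuple are all assumptions made for this example; Chatito's actual parser may well differ.

```python
import re

# Assumed grammar: "*[" number optional-"%" "]", e.g. "*[2]" or "*[20%]".
_OPERATOR = re.compile(r"^\*\[\s*(\d+(?:\.\d+)?)\s*(%?)\s*\]$")


def parse_probability_operator(text):
    """Return ("percent", value) or ("weight", value) for an operator string."""
    match = _OPERATOR.match(text.strip())
    if match is None:
        raise ValueError(f"not a probability operator: {text!r}")
    value = float(match.group(1))
    kind = "percent" if match.group(2) else "weight"
    return kind, value


print(parse_probability_operator("*[20%]"))  # ('percent', 20.0)
print(parse_probability_operator("*[2]"))    # ('weight', 2.0)
```

Keeping the kind next to the value lets the later weight calculation (item 4) treat percentages and relative weights differently.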

0 reactions
rodrigopivi commented, Jun 26, 2019

Published 2.3.0. It was great sharing the work on this, Yuri, thanks.
