Weighted probability
Hi. Thank you for Chatito!
Our team has been using Chatito pretty extensively lately, and we like it.
What I noticed is that it is hard for newcomers to get a sense of how likely each sentence is to appear in the training data. E.g.:
%[some_intent]
    ~[alias1]
    ~[alias2] ~[alias3]
Here we can’t say what the distribution of the examples would be.
Most likely ~[alias2] ~[alias3] will get a bigger share if the number of variations in each alias is about the same, since their variations multiply. But if alias1 has many more variations than alias2 * alias3, then ~[alias1] will get a bigger share.
So we have to look up the aliases and walk down the tree of nested aliases to understand how many variations each sentence might get. This is rather error-prone and time-consuming.
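To make the counting concrete, here is a minimal Python sketch of the manual walk described above. The alias contents are invented for illustration, `count_variations` and `shares` are hypothetical helpers, and the sketch assumes that a sentence's share is proportional to the number of distinct utterances it can expand to. It ignores optional aliases and slots, and it is not Chatito's actual implementation.

```python
import re
from math import prod

# Hypothetical alias definitions: each alias maps to its list of variations.
# A variation may itself reference other aliases with the ~[name] syntax.
ALIASES = {
    "alias1": ["hi", "hello", "hey", "good morning", "good evening", "howdy"],
    "alias2": ["please", "kindly"],
    "alias3": ["help me", "assist me"],
}

ALIAS_REF = re.compile(r"~\[([^\]]+)\]")


def count_variations(sentence, aliases):
    """Count the distinct expansions of a sentence, following nested aliases."""
    # Every alias reference multiplies the count by the total number of
    # expansions of that alias; a sentence with no references counts as 1.
    return prod(
        sum(count_variations(variation, aliases) for variation in aliases[name])
        for name in ALIAS_REF.findall(sentence)
    )


def shares(sentences, aliases):
    """Return each sentence's share, proportional to its expansion count."""
    counts = {s: count_variations(s, aliases) for s in sentences}
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


result = shares(["~[alias1]", "~[alias2] ~[alias3]"], ALIASES)
for sentence, share in result.items():
    print(f"{sentence}: {share:.0%}")
```

With these made-up aliases, ~[alias1] has 6 expansions and ~[alias2] ~[alias3] has 2 * 2 = 4, so the single-alias sentence ends up with the larger share even though it looks shorter in the intent definition.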
So we started to use even distribution a lot and to specify probabilities with the probability operator where needed (BTW, it is not that easy to get the probabilities to total 100; again, we have to calculate and re-adjust them).
It helped. But I was wondering how we could improve on that.
What if we add a flag to the generation command (e.g. --probability=weighted)?
If this flag is set, every sentence gets the same weight of 1, which can be modified with the probability operator.
E.g.:
// here we have 50%/50% probability for the first and second sentence
%[some_intent]
    ~[alias1]
    ~[alias2] ~[alias3]
// here we have a 2:1 ratio of the first to the second sentence, i.e. 66.67% for the first and 33.33% for the second
%[some_other_intent]
    *[2] ~[alias1]
    ~[alias2] ~[alias3]
I suppose the weighted probability might be even easier to grok, because *[2] means you want the number of examples of this kind to be doubled.
So with the “weighted probability” we won’t have to set even distribution everywhere and it’s easier to modify weights.
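The arithmetic behind the proposal is small enough to sketch. The Python below only illustrates the idea and is not Chatito code: `weighted_shares` is a hypothetical helper, and it assumes that a sentence without a *[n] operator counts as weight 1.

```python
from fractions import Fraction


def weighted_shares(weights):
    """Turn per-sentence weights into shares; None means no operator, i.e. 1."""
    resolved = [Fraction(1) if w is None else Fraction(w) for w in weights]
    total = sum(resolved)
    return [w / total for w in resolved]


# %[some_intent]: no operators, so both sentences weigh 1.
print(weighted_shares([None, None]))
# prints [Fraction(1, 2), Fraction(1, 2)]

# %[some_other_intent]: *[2] on the first sentence only.
print([f"{float(s):.2%}" for s in weighted_shares([2, None])])
# prints ['66.67%', '33.33%']

# Adding a third, unweighted sentence needs no manual re-adjustment.
print([f"{float(s):.0%}" for s in weighted_shares([2, None, None])])
# prints ['50%', '25%', '25%']
```

Because the shares are derived from the weights, nothing has to add up to 100 by hand: adding or removing a sentence simply changes the total that each weight is divided by.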
What do you think about it? Could this be a valuable addition to Chatito? I’d like to work on a PR for that.
I just read the updated spec.md. It looks really good! So, here is what I think we will need:
- defaultDistribution cli argument.
- distribution entity argument (if set) and defaultDistribution configuration.
- defaultDistribution to the web editor
I feel like I can do 3 and 4. But I’m open to any suggestions.
Published 2.3.0. It was great sharing the work on this, Yuri. Thanks.