
Output sequence tagger has wrong probabilities output

See original GitHub issue

Describe the bug

In Ludwig 0.1.x and 0.2.x, with an output feature using a tagger (as in the https://ludwig-ai.github.io/ludwig-docs/examples/#natural-language-understanding example), all the slots were correctly tagged, and for each slot you were presented with an array of confidence values, one for each possible slot tag. In 0.4.0 (and also in 0.3.x) you are instead presented with a fixed-length array of confidences, of length max_sequence_length, with one value per slot.

I would expect multiple arrays of confidences, one per output slot, as in previous versions; or, if this is an intended change, I would expect the array length to match the length of the output tag sequence (in case the intent is to output only the maximum confidence value for each slot, which I don't think is the case).
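
To make the shape difference concrete, here is a minimal sketch of the two behaviors (plain numpy, with an invented tag vocabulary and invented lengths; none of these names come from Ludwig itself):

```python
import numpy as np

num_tags = 10             # hypothetical slot-tag vocabulary size
num_slots = 5             # tokens/slots in the parsed utterance
max_sequence_length = 19  # hypothetical model setting

# 0.1.x/0.2.x behavior: one confidence distribution per slot,
# i.e. shape (num_slots, num_tags).
per_slot = np.random.dirichlet(np.ones(num_tags), size=num_slots)
print(per_slot.shape)  # (5, 10)

# 0.3.x/0.4.0 behavior: one flat array of length max_sequence_length,
# a single value per position, padding positions included.
flat = np.random.rand(max_sequence_length)
print(flat.shape)  # (19,)
```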

To Reproduce

  • Train a model as presented in https://ludwig-ai.github.io/ludwig-docs/examples/#natural-language-understanding
  • Run inference on an utterance (see the sketch below)
  • Observe the prediction field slots_predictions and note its length
  • Observe the prediction field slots_probabilities and note its length (it is always the maximum possible output length, with a strange default value for unused positions, and without the sub-arrays representing each slot's confidence)
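
A minimal reproduction sketch, assuming a model trained from the linked NLU example (the model path and the utterance/slots column names follow that example and are assumptions here, not verified output):

```python
import pandas as pd
from ludwig.api import LudwigModel

# Load the model trained from the NLU example config
# (the path below is whatever Ludwig wrote during training).
model = LudwigModel.load("results/experiment_run/model")

df = pd.DataFrame({"utterance": ["it is a nice day"]})
predictions, _ = model.predict(dataset=df)

print(len(predictions["slots_predictions"][0]))    # number of predicted tags
print(len(predictions["slots_probabilities"][0]))  # max_sequence_length instead
```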

Environment: Python 3.6 and 3.9; Ludwig 0.3.x, 0.4.0, and current git master.

Please note: Ludwig 0.3.3 has a different slots_probabilities output format. In 0.4.0, strange "\n"s are present in the array (as if it were text being parsed), while in 0.3.3 it is correctly parsed as an array.
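
The stray "\n"s look consistent with the column being serialized as the printed repr of a numpy array rather than as a real list; this is only a guess at the cause, illustrated here:

```python
import numpy as np

probs = np.round(np.random.rand(19), 8)

# repr() of a long numpy array wraps lines, embedding literal '\n'
# characters; storing this string instead of the array itself would
# produce exactly the kind of output quoted below.
print(repr(repr(probs)))
```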

One more thing: slots_probability, which should be the overall confidence of all the slots being predicted, is often negative-valued (which should never happen for a probability).

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
carlogrisetti commented, May 3, 2022

It took a little while to get around to looking at this, but here I am 😃 This is the slots_probabilities array I get when parsing the utterance "it is a nice day":

[0.99996495, 0.9998147, 0.99985003, 0.9999442, 0.99750584, 0.38254768, 0.38254768, 0.38254768, 0.38254768, 0.38254768, 0.38254768, 0.38254768, 0.38254768, 0.38254768, 0.38254768, 0.38254768, 0.38254768, 0.38254768, 0.382547]

I expected an array with only 5 values, one for each slot, giving each slot's probability.
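
As a stopgap, if the repeated 0.38254768 tail is indeed padding (an assumption on my part), the real per-token values can be recovered by trimming to the length of slots_predictions:

```python
# predictions is the DataFrame returned by model.predict (see the
# reproduction sketch above); the column names are assumptions.
n_tokens = len(predictions["slots_predictions"][0])
token_probs = predictions["slots_probabilities"][0][:n_tokens]
print(token_probs)  # the 5 leading values, one per real token
```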

Also, this is the slots_probability for the same utterance: -0.0029234159737825394. I for sure did not expect a negative value here.
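
Interestingly, this negative value looks like a sequence log-probability rather than a probability: summing the natural logs of the five non-padding confidences above reproduces it almost exactly. A quick check in plain Python (my own verification, not Ludwig code):

```python
import math

# The 5 leading (non-padding) values from slots_probabilities above.
token_probs = [0.99996495, 0.9998147, 0.99985003, 0.9999442, 0.99750584]

log_prob = sum(math.log(p) for p in token_probs)
print(log_prob)  # ≈ -0.0029234, matching the reported slots_probability
```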

I'll look into this some more, since it's now something I need to fix and use quickly, and hopefully I'll find the culprit 😃 Any help is much appreciated!

0 reactions
dalianaliu commented, Jul 28, 2022

Hi @carlogrisetti, has updating to 0.5 resolved this issue?


Top Results From Across the Web

  • CS440/ECE448 Assignment 4
  • Sequence Prediction and Part-of-speech Tagging
  • Sequence Labeling for Parts of Speech and Named Entities
  • Part of Speech (POS) tagging with Hidden Markov Model
  • Lecture 10: Algorithms for HMMs
