Annotating an unlabeled set
Hi. Thanks for the great repo. I have a question about PET training and annotating an unlabeled set (the examples from D, as mentioned in the paper). I assume this is done with the command in the "PET Training and Evaluation" section of the repo. However, I am not sure where to put the unlabeled set or where to find the predicted labels. Could you please let me know how to get the predicted labels for the unlabeled set? Thank you.
Issue Analytics
- Created 3 years ago
- Reactions: 2
- Comments: 6 (3 by maintainers)
Top GitHub Comments
@timoschick
If I have the labels 0 = 'bad' and 1 = 'good', I get an unlabeled_logits.txt whose first row is -1, followed by one row for each row in my unlabeled.csv file. Is it correct that I then apply softmax to each row to get the probability of the first label "bad" (the first "column" in the logits file) and of "good" (the second "column")?
example logits
EDIT:
I ended up writing a conversion script (since I'm using an Airflow pipeline for the job anyway) that writes a prediction file with probabilities computed from the logits.
The output is a probability for my label bad (first column) and good (second column).
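The conversion described above could look roughly like the following sketch. It assumes that the first row of the logits file holds a single value (the -1 mentioned above), that every later row holds whitespace-separated logits, and that the columns follow the label order bad, good. The function names and the LABELS list are hypothetical and are not part of PET.

```python
import math
from typing import Iterable, List

# Assumption: column 0 is "bad" and column 1 is "good", as described above.
LABELS = ["bad", "good"]


def softmax(logits: List[float]) -> List[float]:
    """Turn one row of logits into probabilities that sum to 1."""
    highest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def logits_to_probabilities(lines: Iterable[str]) -> List[List[float]]:
    """Parse the lines of a logits file and apply softmax to each example row.

    Assumption: the first non-empty line is a single score (here -1) and is
    skipped; each following line holds whitespace-separated logits.
    """
    rows = [line.split() for line in lines if line.strip()]
    return [softmax([float(value) for value in row]) for row in rows[1:]]


def predict_labels(probabilities: List[List[float]],
                   labels: List[str] = LABELS) -> List[str]:
    """Pick the label with the highest probability for each example."""
    return [labels[max(range(len(row)), key=row.__getitem__)]
            for row in probabilities]
```

To use it on a real file, open it and pass the file handle, for example `with open("unlabeled_logits.txt") as fh: probabilities = logits_to_probabilities(fh)`. Row i of the result then corresponds to row i of the unlabeled file, provided the order is preserved.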
If your verbalizer uses only the words terrible, bad, okay, good and great, then PET simply ignores the probabilities assigned to all other words. Let's assume the model's predictions are (in that order):
PET basically removes all words that are not used by the verbalizer, resulting in the following reduced list:
So PET would assign the label corresponding to terrible to this example, even if terrible is not the word that the language model would have predicted.
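The filtering step can be illustrated with a minimal sketch. This is not PET's actual implementation. The scores, the label names "1" to "5", and the function names are made up for illustration, and the verbalizer is simplified to one word per label.

```python
from typing import Dict

# Hypothetical verbalizer: maps each label to the single word that represents it.
VERBALIZER: Dict[str, str] = {
    "1": "terrible",
    "2": "bad",
    "3": "okay",
    "4": "good",
    "5": "great",
}


def restrict_to_verbalizer(word_scores: Dict[str, float],
                           verbalizer: Dict[str, str]) -> Dict[str, float]:
    """Keep only the scores of words used by the verbalizer, keyed by label."""
    return {label: word_scores[word] for label, word in verbalizer.items()}


def predict_label(word_scores: Dict[str, float],
                  verbalizer: Dict[str, str] = VERBALIZER) -> str:
    """Return the label whose verbalizer word has the highest score."""
    label_scores = restrict_to_verbalizer(word_scores, verbalizer)
    return max(label_scores, key=label_scores.get)


# Made-up language model scores: the top word overall is "awesome",
# which the verbalizer does not use, so it is ignored.
example_scores = {
    "awesome": 0.40,
    "terrible": 0.25,
    "movie": 0.15,
    "bad": 0.08,
    "good": 0.06,
    "okay": 0.04,
    "great": 0.02,
}

predicted = predict_label(example_scores)  # label "1", the one mapped to terrible
```

In this toy example the language model's most likely word is not in the verbalizer, so the label is decided only among the five verbalizer words.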