Problems when classifying after fine-tuning BERT (Multi-Label)
I am following the write-up on multi-label classification here: https://towardsdatascience.com/multi-label-classification-using-bert-roberta-xlnet-xlm-and-distilbert-with-simple-transformers-b3e0cda12ce5
I am running into some difficulties. I loaded a Dutch base BERT model (BERTje, from https://github.com/wietsedv/bertje) and then trained a multi-label model with 50 labels:
import pandas as pd
from sklearn.model_selection import train_test_split
from simpletransformers.classification import MultiLabelClassificationModel

df = pd.read_csv("all_data_withid.csv", encoding="utf8", delimiter=";")
df['labels'] = list(zip(df.label1.tolist(), df.label2.tolist(), ...))  # truncated for brevity
train_df, eval_df = train_test_split(df, test_size=0.3, random_state=123456)
model = MultiLabelClassificationModel('bert', 'bert-base-dutch-cased/bertje-base', num_labels=50, args={'train_batch_size': 2, 'gradient_accumulation_steps': 16, 'learning_rate': 3e-5, 'num_train_epochs': 1, 'max_seq_length': 512, 'fp16': False})
model.train_model(train_df)
result, model_outputs, wrong_predictions = model.eval_model(eval_df)
The end result is that I get an LRAP score of roughly 0.71. However, I am now a bit puzzled about how to use this model to classify a single new instance. I closed Python, opened it again, and loaded my trained model from disk:
model = MultiLabelClassificationModel('bert', 'outputs', num_labels=50, args={'train_batch_size':2, 'gradient_accumulation_steps':16, 'learning_rate': 3e-5, 'num_train_epochs': 1, 'max_seq_length': 512, 'fp16': False})
I then tried model.predict(["dit is een test"]) and model.predict(["en nog een compleet andere test"]), and as it turns out, the resulting outputs and predictions (always all 0s for every class) for these two distinct sentences are exactly the same on all values. I also ran result, model_outputs, wrong_predictions = model.eval_model(eval_df) three times on different splits of my dataset, but in all scenarios the resulting LRAP is the same ~0.71.
What am I doing wrong here?
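One way to narrow this down is to inspect the raw per-class probabilities that predict() returns alongside the binarized labels: if the probabilities differ between the two sentences but all sit below the decision threshold (0.5 by default in simpletransformers), the predictions will come out as all 0s even though the model is not literally producing identical outputs. A minimal sketch, assuming the trained model is in outputs/:

from simpletransformers.classification import MultiLabelClassificationModel

# Load the fine-tuned model for inference; the training hyperparameters
# are not needed at prediction time.
model = MultiLabelClassificationModel('bert', 'outputs', num_labels=50)

# predict() returns the thresholded labels and the raw sigmoid scores.
predictions, raw_outputs = model.predict(["dit is een test",
                                          "en nog een compleet andere test"])
print(predictions)   # binarized labels, e.g. all 0s
print(raw_outputs)   # per-class probabilities for each sentence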
Top GitHub Comments
Lowering the learning rate and/or the number of training epochs seems to be the best solution to prevent the model from breaking completely and predicting the same class.
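For illustration, a minimal sketch of that suggestion applied to the setup above; the lowered value of 4e-6 is an assumption for demonstration, not a value recommended in this thread:

from simpletransformers.classification import MultiLabelClassificationModel

# Same setup as above, but with a lower learning rate (4e-6 is
# illustrative; the original run used 3e-5).
model = MultiLabelClassificationModel('bert', 'bert-base-dutch-cased/bertje-base', num_labels=50,
                                      args={'train_batch_size': 2,
                                            'gradient_accumulation_steps': 16,
                                            'learning_rate': 4e-6,
                                            'num_train_epochs': 1,
                                            'max_seq_length': 512,
                                            'fp16': False})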
Same problem here: accuracy of 98%, but in prediction I only get 0 for all labels. I tried ALBERT, RoBERTa, BERT, and DistilBERT.
Edit: Problem solved after completely reinstalling and rebooting.