Problems when classifying after fine-tuning BERT (Multi-Label)
I am following the write-up on multi-label classification here: https://towardsdatascience.com/multi-label-classification-using-bert-roberta-xlnet-xlm-and-distilbert-with-simple-transformers-b3e0cda12ce5
I am running into some difficulties. I loaded a Dutch base BERT model (BERTje, from https://github.com/wietsedv/bertje) and then trained a multi-label model with 50 labels:
import pandas as pd
from sklearn.model_selection import train_test_split
from simpletransformers.classification import MultiLabelClassificationModel

df = pd.read_csv("all_data_withid.csv", encoding="utf8", delimiter=";")
df['labels'] = list(zip(df.label1.tolist(), df.label2.tolist(), ...))  # truncated for brevity
train_df, eval_df = train_test_split(df, test_size=0.3, random_state=123456)
model = MultiLabelClassificationModel('bert', 'bert-base-dutch-cased/bertje-base', num_labels=50, args={'train_batch_size': 2, 'gradient_accumulation_steps': 16, 'learning_rate': 3e-5, 'num_train_epochs': 1, 'max_seq_length': 512, 'fp16': False})
model.train_model(train_df)
result, model_outputs, wrong_predictions = model.eval_model(eval_df)
The end result is that I get an LRAP score of roughly 0.71. However, I am now a bit puzzled about how to use this model to classify a single new instance. I closed Python, opened it again, and loaded my trained model from disk:
model = MultiLabelClassificationModel('bert', 'outputs', num_labels=50, args={'train_batch_size':2, 'gradient_accumulation_steps':16, 'learning_rate': 3e-5, 'num_train_epochs': 1, 'max_seq_length': 512, 'fp16': False})
I then tried model.predict(["dit is een test"]) and model.predict(["en nog een compleet andere test"]), and as it turns out, the resulting outputs and predictions (always all 0s for every class) for these two distinct sentences are exactly the same on all values. I also ran result, model_outputs, wrong_predictions = model.eval_model(eval_df) three times on different splits of my dataset, but in all scenarios the resulting LRAP is the same ~0.71.
What am I doing wrong here?
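One way to narrow this down is to inspect the raw per-class probabilities that predict() returns alongside the binarized labels: if the probabilities differ between the two sentences but all sit below the decision threshold (0.5 by default in simpletransformers), the predictions will come out as all 0s even though the model is not literally producing identical outputs. A minimal sketch, assuming the trained model is in outputs/:

from simpletransformers.classification import MultiLabelClassificationModel

# Load the fine-tuned model for inference; the training hyperparameters
# are not needed at prediction time.
model = MultiLabelClassificationModel('bert', 'outputs', num_labels=50)

# predict() returns the thresholded labels and the raw sigmoid scores.
predictions, raw_outputs = model.predict(["dit is een test",
                                          "en nog een compleet andere test"])
print(predictions)   # binarized labels, e.g. all 0s
print(raw_outputs)   # per-class probabilities for each sentence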
Top GitHub Comments
Lowering the learning rate and/or the number of training epochs seems to be the best solution to prevent the model from breaking completely and predicting the same class.
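For illustration, a minimal sketch of that suggestion applied to the setup above; the lowered value of 4e-6 is an assumption for demonstration, not a value recommended in this thread:

from simpletransformers.classification import MultiLabelClassificationModel

# Same setup as above, but with a lower learning rate (4e-6 is
# illustrative; the original run used 3e-5).
model = MultiLabelClassificationModel('bert', 'bert-base-dutch-cased/bertje-base', num_labels=50,
                                      args={'train_batch_size': 2,
                                            'gradient_accumulation_steps': 16,
                                            'learning_rate': 4e-6,
                                            'num_train_epochs': 1,
                                            'max_seq_length': 512,
                                            'fp16': False})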
Same problem here: accuracy of 98%, but in prediction I only get 0 for all labels. I tried ALBERT, RoBERTa, BERT, and DistilBERT.
Edit: Problem solved after completely reinstalling and rebooting.