
Multiclass evaluation not working

See original GitHub issue

Hello,

I am new to the Transformers library and I'm trying to do Sequence Classification. I have 24 labels and I am getting the following error:

ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].

Even though I already added the averaging method as such:

metric = load_metric('precision', average='weighted')

Would someone kindly point me towards whatever I'm doing wrong? I'm able to fine-tune the pre-trained BERT model if I use 'accuracy' as the metric, but somehow my average='weighted' argument isn't being accepted.
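
For context, the error can be reproduced outside the Trainer in a couple of lines; the toy labels below are illustrative, not from the original setup:

from datasets import load_metric

# The 'average' keyword passed to load_metric is not forwarded to the
# compute step, so precision still uses its default average="binary".
metric = load_metric('precision', average='weighted')
metric.compute(predictions=[0, 1, 2], references=[2, 1, 0])
# ValueError: Target is multiclass but average='binary'. ...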

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

4 reactions
dveni commented, Apr 12, 2022

Hi there!

You are passing the average argument when you load the metric; instead, you should pass it to the compute method, like this:

metric = load_metric('precision')
metric.compute(predictions=[0,1,2,3,4,4,4,4], references=[2,2,2,3,4,1,1,4], average="weighted")

Output:
----------------------
>>> {'precision': 0.625}
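
For reference, the precision metric script is a thin wrapper around sklearn.metrics.precision_score (as the traceback further down also shows), so the result can be sanity-checked directly; "weighted" averages the per-class precisions by their support in the references:

from sklearn.metrics import precision_score

# Same toy data as above; note the argument order (references, i.e. y_true, first).
precision_score([2, 2, 2, 3, 4, 1, 1, 4], [0, 1, 2, 3, 4, 4, 4, 4], average="weighted")
# -> 0.625, i.e. (2*0 + 3*1 + 1*1 + 2*0.5) / 8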

Hope this helps!
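
Applied to the Trainer setup from the original post, the fix amounts to moving the keyword into compute_metrics; a minimal sketch (the metric name and the "weighted" choice follow the report):

import numpy as np
from datasets import load_metric

metric = load_metric('precision')  # no averaging kwarg here

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    # Pass the averaging mode at compute time, not at load time.
    return metric.compute(predictions=predictions, references=labels, average="weighted")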

2 reactions
puifais commented, Apr 7, 2022

No problem. Sorry, I wasn't sure if this was a bug or my own mistake, so I didn't use the bug template. Here you go:

Environment info

  • transformers version: 4.17.0
  • Platform: Linux-4.14.252-131.483.amzn1.x86_64-x86_64-with-glibc2.9
  • Python version: 3.6.13
  • PyTorch version (GPU?): 1.10.2+cu102 (False)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help

  1. @sgugger
  2. @LysandreJik
  3. @sgugger

Information

Model I am using (Bert, XLNet …): bert-base-cased

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQuAD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Load my own dataset using Dataset.from_pandas(my_data)
  2. Tokenize it with AutoTokenizer.from_pretrained('bert-base-cased')
  3. Create training arguments, metric, and Trainer object to start training.
import numpy as np
from datasets import load_metric
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=24)
training_args = TrainingArguments(output_dir='./checkpoints/my_model', evaluation_strategy="epoch")

metric = load_metric('precision', average='weighted')
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

# train_dataset and eval_dataset come from steps 1 and 2 above
trainer = Trainer(model=model,
                  args=training_args,
                  train_dataset=train_dataset,
                  eval_dataset=eval_dataset,
                  compute_metrics=compute_metrics)
trainer.train()

and I got this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-46-5aef28bcb00d> in <module>
     14                   eval_dataset=eval_dataset,
     15                   compute_metrics=compute_metrics)
---> 16 trainer.train()

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1488 
   1489             self.control = self.callback_handler.on_epoch_end(args, self.state, self.control)
-> 1490             self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
   1491 
   1492             if DebugOption.TPU_METRICS_DEBUG in self.args.debug:

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/transformers/trainer.py in _maybe_log_save_evaluate(self, tr_loss, model, trial, epoch, ignore_keys_for_eval)
   1600         metrics = None
   1601         if self.control.should_evaluate:
-> 1602             metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
   1603             self._report_to_hp_search(trial, epoch, metrics)
   1604 

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/transformers/trainer.py in evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
   2262             prediction_loss_only=True if self.compute_metrics is None else None,
   2263             ignore_keys=ignore_keys,
-> 2264             metric_key_prefix=metric_key_prefix,
   2265         )
   2266 

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/transformers/trainer.py in evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
   2503         # Metrics!
   2504         if self.compute_metrics is not None and all_preds is not None and all_labels is not None:
-> 2505             metrics = self.compute_metrics(EvalPrediction(predictions=all_preds, label_ids=all_labels))
   2506         else:
   2507             metrics = {}

<ipython-input-46-5aef28bcb00d> in compute_metrics(eval_pred)
      7     logits, labels = eval_pred
      8     predictions = np.argmax(logits, axis=-1)
----> 9     return metric.compute(predictions=predictions, references=labels)
     10 
     11 trainer = Trainer(model=model,

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/datasets/metric.py in compute(self, predictions, references, **kwargs)
    428             inputs = {input_name: self.data[input_name] for input_name in self.features}
    429             with temp_seed(self.seed):
--> 430                 output = self._compute(**inputs, **compute_kwargs)
    431 
    432             if self.buf_writer is not None:

~/.cache/huggingface/modules/datasets_modules/metrics/precision/bfadb1cf35fe89242263de7dc028b248827c08ba075659c0e812d0fc6e5237c9/precision.py in _compute(self, predictions, references, labels, pos_label, average, sample_weight)
    116     def _compute(self, predictions, references, labels=None, pos_label=1, average="binary", sample_weight=None):
    117         score = precision_score(
--> 118             references, predictions, labels=labels, pos_label=pos_label, average=average, sample_weight=sample_weight
    119         )
    120         return {"precision": float(score) if score.size == 1 else score}

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/sklearn/metrics/_classification.py in precision_score(y_true, y_pred, labels, pos_label, average, sample_weight, zero_division)
   1660                                                  warn_for=('precision',),
   1661                                                  sample_weight=sample_weight,
-> 1662                                                  zero_division=zero_division)
   1663     return p
   1664 

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/sklearn/metrics/_classification.py in precision_recall_fscore_support(y_true, y_pred, beta, labels, pos_label, average, warn_for, sample_weight, zero_division)
   1463         raise ValueError("beta should be >=0 in the F-beta score")
   1464     labels = _check_set_wise_labels(y_true, y_pred, average, labels,
-> 1465                                     pos_label)
   1466 
   1467     # Calculate tp_sum, pred_sum, true_sum ###

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/sklearn/metrics/_classification.py in _check_set_wise_labels(y_true, y_pred, average, labels, pos_label)
   1294             raise ValueError("Target is %s but average='binary'. Please "
   1295                              "choose another average setting, one of %r."
-> 1296                              % (y_type, average_options))
   1297     elif pos_label not in (None, 1):
   1298         warnings.warn("Note that pos_label (set to %r) is ignored when "

ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].

Expected behavior

I expect the model to complete training without an error. I am able to do this if I use metric = load_metric('accuracy'), but not with precision, recall, or f1.
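
With the average keyword passed at compute time (per the accepted answer above), precision, recall, and f1 can all be reported from one compute_metrics; a sketch under that assumption, with the merged-dict pattern being illustrative rather than from the thread:

import numpy as np
from datasets import load_metric

precision = load_metric('precision')
recall = load_metric('recall')
f1 = load_metric('f1')

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    results = {}
    # Each compute() call returns a one-entry dict, e.g. {'precision': ...}; merge them.
    results.update(precision.compute(predictions=predictions, references=labels, average="weighted"))
    results.update(recall.compute(predictions=predictions, references=labels, average="weighted"))
    results.update(f1.compute(predictions=predictions, references=labels, average="weighted"))
    return results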

Read more comments on GitHub >

Top Results From Across the Web

Evaluating multiclass imbalanced problem per class
For a multiclass imbalanced problem, accuracy is not a good metric to evaluate model performance. Equally, accuracy is a global metric, ...
Read more >
Comprehensive Guide on Multiclass Classification Metrics
Using these metrics, you can evaluate the performance of any classifier and compare them to each other. Here is a final cheat-sheet to...
Read more >
Evaluation measures for multiclass problems - Gabriele Lanaro
Evaluation measures for multiclass problems · Confusion matrix · Precision · Recall · F1-score · Micro and macro averages · Accuracy · Cross...
Read more >
Evaluating Multi-Class Classifiers | by Harsha Goonewardana
Best practice methodology for model selection for a multi-class classification problem is to use a basket of metrics. Then the appropriate ...
Read more >
Performance Measures for Multi-Class Problems
How to calculate performance for multi-class problems? ... To evaluate a scoring classifier at multiple cutoffs, these quantities can be ...
Read more >
