Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

ValueError: too many dimensions 'str' for Multilabel-classification


Hi @ThilinaRajapakse

I ran through your tutorial on Medium and found this bug. My DataFrame is structured as you described, but it has 11 labels instead of 6. I also created two additional columns called “labels” and “text” for the train_data DataFrame.
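For reference, here is a minimal sketch of a pre-flight check on that layout. It assumes the multi-hot format used by the tutorial's multi-label model: a `text` column of strings and a `labels` column in which each entry is a Python list of 11 integers (1 if the label applies, 0 otherwise). The helper `find_bad_labels` is hypothetical and not part of simpletransformers. It accepts any iterable, so a pandas `train_data["labels"]` Series can be passed directly.

```python
from typing import Iterable, List

NUM_LABELS = 11  # the issue uses 11 labels instead of the tutorial's 6


def find_bad_labels(labels: Iterable[object], num_labels: int = NUM_LABELS) -> List[int]:
    """Return the positions of entries that are not a list of num_labels 0/1 ints.

    Hypothetical helper (assumed multi-hot format); works on a plain list,
    a generator, or a pandas Series such as train_data["labels"].
    """
    bad = []
    for i, value in enumerate(labels):
        ok = (
            isinstance(value, list)
            and len(value) == num_labels
            and all(isinstance(v, int) and v in (0, 1) for v in value)
        )
        if not ok:
            bad.append(i)
    return bad


# Example rows: the second label is a string that only looks like a list.
rows = [
    {"text": "first example", "labels": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]},
    {"text": "second example", "labels": "[0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]"},
]
print(find_bad_labels(row["labels"] for row in rows))  # [1]
```

A non-empty result means some rows would reach `torch.tensor(...)` in a shape it cannot convert, which is the kind of input that produces the error below.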

/work/vnhh/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
I1118 08:14:10.053511 140433564665600 file_utils.py:39] PyTorch version 1.3.0 available.
I1118 08:14:10.217245 140433564665600 modeling_xlnet.py:194] Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
I1118 08:14:10.847334 140433564665600 tokenization_utils.py:374] loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-vocab.json from cache at /work/vnhh/.cache/torch/transformers/d0c5776499adc1ded22493fae699da0971c1ee4c2587111707a4d177d20257a2.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
I1118 08:14:10.847580 140433564665600 tokenization_utils.py:374] loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-merges.txt from cache at /work/vnhh/.cache/torch/transformers/b35e7cd126cd4229a746b5d5c29a749e8e84438b14bcdb575950584fe33207e8.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
I1118 08:14:11.280796 140433564665600 configuration_utils.py:151] loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-config.json from cache at /work/vnhh/.cache/torch/transformers/e1a2a406b5a05063c31f4dfdee7608986ba7c6393f7f79db5e69dcd197208534.9dad9043216064080cf9dd3711c53c0f11fe2b09313eaa66931057b4bdcaf068
I1118 08:14:11.282683 140433564665600 configuration_utils.py:168] Model config {
  "attention_probs_dropout_prob": 0.1,
  "finetuning_task": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "num_labels": 11,
  "output_attentions": false,
  "output_hidden_states": false,
  "output_past": true,
  "pruned_heads": {},
  "torchscript": false,
  "type_vocab_size": 1,
  "use_bfloat16": false,
  "vocab_size": 50265
}

I1118 08:14:11.620726 140433564665600 modeling_utils.py:337] loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-pytorch_model.bin from cache at /work/vnhh/.cache/torch/transformers/228756ed15b6d200d7cb45aaef08c087e2706f54cb912863d2efe07c89584eb7.49b88ba7ec2c26a7558dda98ca3884c3b80fa31cf43a1b1f23aef3ff81ba344e
I1118 08:14:15.541236 140433564665600 modeling_utils.py:405] Weights of RobertaForMultiLabelSequenceClassification not initialized from pretrained model: ['classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.weight', 'classifier.out_proj.bias']
I1118 08:14:15.541777 140433564665600 modeling_utils.py:408] Weights from pretrained model not used in RobertaForMultiLabelSequenceClassification: ['lm_head.bias', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight']
Features loaded from cache at cache_dir/cached_train_roberta_512_binary
Traceback (most recent call last):
  File "nn.py", line 54, in <module>
    test_predictions = train_and_predict(train_data, test_data)
  File "nn.py", line 20, in train_and_predict
    model.train_model(train_data)
  File "/work/vnhh/anaconda3/lib/python3.6/site-packages/simpletransformers/classification/multi_label_classification_model.py", line 106, in train_model
    return super().train_model(train_df, multi_label=multi_label, output_dir=output_dir, show_running_loss=show_running_loss, args=args)
  File "/work/vnhh/anaconda3/lib/python3.6/site-packages/simpletransformers/classification/classification_model.py", line 173, in train_model
    train_dataset = self.load_and_cache_examples(train_examples)
  File "/work/vnhh/anaconda3/lib/python3.6/site-packages/simpletransformers/classification/multi_label_classification_model.py", line 115, in load_and_cache_examples
    return super().load_and_cache_examples(examples, evaluate=evaluate, no_cache=no_cache, multi_label=multi_label)
  File "/work/vnhh/anaconda3/lib/python3.6/site-packages/simpletransformers/classification/classification_model.py", line 458, in load_and_cache_examples
    all_label_ids = torch.tensor([f.label_id for f in features], dtype=torch.long)
ValueError: too many dimensions 'str'

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 13 (6 by maintainers)

Top GitHub Comments

2 reactions
ThilinaRajapakse commented, Nov 18, 2019

The features are being loaded from cache. Cached features might not be in the correct format if you change the configuration. Set reprocess_input_data to True and see if it fixes the issue.

from simpletransformers.classification import MultiLabelClassificationModel
model = MultiLabelClassificationModel('roberta', 'roberta-base', args={'reprocess_input_data': True})
1 reaction
ThilinaRajapakse commented, Nov 29, 2019

There is, but any text longer than the maximum length will be truncated, so it won’t cause any issues. Can you check the data type of the labels? Make sure they are all Python lists and not strings. Saving and loading pandas DataFrames seems to convert them to strings.
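A minimal sketch of turning string-encoded labels back into lists. It assumes the strings look like Python list literals such as `"[0, 1, 0]"`, which is what `str()` of a list produces when a DataFrame is written to CSV and read back. It uses `ast.literal_eval` rather than `eval` so only literals are parsed. `parse_labels` is a hypothetical name.

```python
import ast
from typing import List, Union


def parse_labels(value: Union[str, List[int]]) -> List[int]:
    """Return labels as a list of ints, parsing string literals like "[0, 1]".

    Lists and tuples are converted element-wise to int (this also turns
    NumPy integers into plain Python ints); anything else raises TypeError.
    """
    if isinstance(value, str):
        value = ast.literal_eval(value)  # accepts only Python literals
    if not isinstance(value, (list, tuple)):
        raise TypeError(f"expected a list of labels, got {type(value).__name__}")
    return [int(v) for v in value]


print(parse_labels("[0, 1, 0, 1]"))  # [0, 1, 0, 1]
```

With pandas, after reading the data back from disk, this can be applied to the whole column: `train_data["labels"] = train_data["labels"].apply(parse_labels)`.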


Top Results From Across the Web

too many dimensions 'str' error occuring - Stack Overflow
So I suggest you to convert your string labels into integer values before passing it to the torch.tensor(). IMPLEMENTATION.

ValueError: too many dimensions 'str' · Issue #229 - GitHub
Found the problem in my case, multi-class-multi-label classification is not currently supported. Even though the label codification was right...

PyTorch ValueError: too many dimensions 'str'
I am trying to use torchvision.transforms to apply transformtation the training data, but getting the following traceback error: Traceback ...

ValueError: too many dimensions 'str' - Hugging Face Forums
I am getting multiple error when I try to train my model using trainer. I can't figure out how to resolve this Value...

Source code for pytorch_forecasting.models.base_model
Timeseries models share a number of common characteristics. This module implements these in a common base class. """ from collections import namedtuple ...
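The Stack Overflow result concerns the single-label case, where each label is a class-name string rather than a list. A sketch of what "convert your string labels into integer values" can look like, assuming the labels are plain strings. `encode_labels` is hypothetical, and assigning ids in sorted order is an arbitrary choice made only so the mapping is reproducible between runs.

```python
from typing import Dict, List, Sequence, Tuple


def encode_labels(labels: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct string label to an integer id, in sorted order.

    Returns the encoded labels plus the mapping, so predictions can be
    decoded back to class names later.
    """
    mapping = {name: idx for idx, name in enumerate(sorted(set(labels)))}
    return [mapping[name] for name in labels], mapping


encoded, mapping = encode_labels(["spam", "ham", "spam", "eggs"])
print(encoded)  # [2, 1, 2, 0]
print(mapping)  # {'eggs': 0, 'ham': 1, 'spam': 2}
```

The resulting list of ints is what `torch.tensor(..., dtype=torch.long)` can accept, unlike the original strings.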
