Can't deal with text label (invalid literal for int() with base 10: 'my_label1')


I want to do a multi-class classification task. Here is my sample data (updated, thanks to @bentrevett):

text, label
My string 1, my_label1
My string 2, my_label2
My string 3, my_label3
My string 4, my_label3
My string 5, my_label4
...

Sample code

TEXT = data.Field(sequential=True, tokenize=word_tokenize, lower=True, fix_length=None)
LABEL = data.Field(sequential=False, use_vocab=False, unk_token=None)

train, valid, test = data.TabularDataset.splits(
    path=path , train='train.csv', validation='valid.csv', test='test.csv',
    skip_header=True, format='csv',
    fields=[('text', TEXT), ('label', LABEL)])

# Building vocabulary
TEXT.build_vocab(train, valid, test, max_size=10000, 
                 vectors='glove.6B.300d',  
                 unk_init=torch.nn.init.xavier_uniform_)
LABEL.build_vocab(train, valid, test)
vocab = TEXT.vocab

iter_train, iter_valid = data.BucketIterator.splits(
    (train, valid), batch_size=64, device=device,
    sort_key=lambda x: len(x.text), sort_within_batch=False, repeat=False)
iter_test = data.Iterator(
    test, batch_size=64, train=False, device=device,
    sort=False, sort_within_batch=False, repeat=False)

When I call

batch = next(iter(iter_valid))
batch.text

It raises the error: invalid literal for int() with base 10: 'my_label1'. Can anyone give me a hint?
I googled for a whole day but still can’t solve it…

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 3
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

bentrevett commented, Nov 22, 2018 (7 reactions)

You have set use_vocab to False for your LABEL field, which is incorrect.

From https://github.com/pytorch/text/blob/master/torchtext/data/field.py#L80: you should only set use_vocab=False when your labels are already integer values. TorchText is trying to convert “my_label1” into an integer, which it can’t do, hence the error.
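To make the mechanism concrete, here is a minimal sketch in plain Python (no torchtext needed). It assumes the integer conversion behaves like Python's built-in int(), as the error message suggests. build_label_vocab is a hypothetical helper written for illustration, not a torchtext function; it only mimics the idea of a vocabulary mapping each label string to an index.

```python
from collections import Counter

labels = ["my_label1", "my_label2", "my_label3", "my_label3", "my_label4"]

# Without a vocabulary, the raw value has to be numeric already,
# so converting a string label straight to int fails.
try:
    int(labels[0])
    conversion_error = None
except ValueError as exc:
    conversion_error = str(exc)


def build_label_vocab(values):
    """Hypothetical helper: map each distinct label string to an integer index."""
    counts = Counter(values)
    # Most frequent label first, ties broken alphabetically.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {label: index for index, label in enumerate(ordered)}


# With a vocabulary, strings are looked up instead of being cast to int.
label_to_index = build_label_vocab(labels)
encoded = [label_to_index[v] for v in labels]

print(conversion_error)
print(label_to_index)
print(encoded)
```

In the code from the question, the equivalent change would be to drop use_vocab=False from the LABEL field, so that the existing LABEL.build_vocab(...) call is what produces the string-to-index mapping.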

Also, the fields argument to data.TabularDataset.splits is the wrong way around. In your example dataset the label column comes first, whereas you have TEXT first in your fields argument. TorchText doesn’t match field names to the headers in .csv files; fields passed as a list are assigned to columns in order.
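The point about column order can be sketched with the standard csv module. This is not TorchText's implementation; read_by_position is a hypothetical helper that only mimics the behaviour described above, where the header row is skipped and names are assigned to columns purely by position.

```python
import csv
import io

csv_text = "text,label\nMy string 1,my_label1\nMy string 2,my_label2\n"


def read_by_position(raw, field_names, skip_header=True):
    """Hypothetical helper: assign names to columns by position, ignoring the header text."""
    rows = list(csv.reader(io.StringIO(raw)))
    if skip_header:
        rows = rows[1:]
    return [dict(zip(field_names, row)) for row in rows]


# Names listed in the same order as the columns: values land where expected.
correct = read_by_position(csv_text, ["text", "label"])

# Names listed the wrong way around: the header is never consulted,
# so the sentence silently ends up under "label".
swapped = read_by_position(csv_text, ["label", "text"])

print(correct[0])
print(swapped[0])
```

No error is raised in the swapped case, which is why the order of the fields list has to be checked by hand against the file.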

zhangguanheng66 commented, May 30, 2019 (0 reactions)

Feel free to re-open the issue if you still have any questions.


