arr = [[self.vocab.stoi[x] for x in ex] for ex in arr] KeyError: None
Bug
Describe the bug
I came across this error when using data.Field. It only happens when I define my own unk_token and set min_freq > 1 at the same time.
To Reproduce
The code I use:
`from torchtext import data, datasets
SRC = data.Field(lower=True, unk_token='my_unk_token')
TGT = data.Field(lower=True)
train, val, test = datasets.IWSLT.splits(exts=('.de', '.en'), fields=(SRC, TGT))
SRC.build_vocab(train, min_freq=10)
train_iter = data.BucketIterator(dataset=train, batch_size=64, sort_key=lambda x: data.interleave_keys(len(x.src), len(x.trg)))
batch = next(iter(train_iter))`
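
The sketch below is a toy model of the situation described above, not torchtext's actual implementation. It assumes that `min_freq` drops rare tokens from the vocabulary and that lookup then needs a fallback index tied to the unknown token. The helper names `build_stoi` and `numericalize` are hypothetical, and the sketch raises a `KeyError` for the dropped token rather than reproducing the exact `None` key from the traceback.

```python
from collections import Counter


def build_stoi(sentences, min_freq, specials):
    """Toy vocabulary: specials first, then tokens seen at least min_freq times."""
    counts = Counter(tok for sent in sentences for tok in sent)
    kept = sorted(tok for tok, c in counts.items() if c >= min_freq)
    return {tok: i for i, tok in enumerate(list(specials) + kept)}


def numericalize(sentences, stoi, unk_index=None):
    """Map tokens to ids; with no fallback index, a dropped token raises KeyError."""
    if unk_index is None:
        return [[stoi[tok] for tok in sent] for sent in sentences]
    return [[stoi.get(tok, unk_index) for tok in sent] for sent in sentences]


corpus = [["the", "cat"], ["the", "dog"]]
# "cat" and "dog" each occur once, so min_freq=2 removes them from the vocabulary.
stoi = build_stoi(corpus, min_freq=2, specials=["my_unk_token", "<pad>"])

try:
    numericalize(corpus, stoi)
    failed = False
except KeyError:
    failed = True

# With a fallback wired to the custom unknown token, the same lookup succeeds.
ids = numericalize(corpus, stoi, unk_index=stoi["my_unk_token"])
```

If the real library only wires up the fallback for its default unknown token, a custom `unk_token` combined with `min_freq > 1` would behave like the first call here; that is a guess consistent with the report, not a confirmed diagnosis.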
Issue Analytics
- State:
- Created 4 years ago
- Comments: 13 (6 by maintainers)
Top GitHub Comments
Can you try setting `use_vocab = False` in the Fields where you use the HF tokenizer?
Right now, you use the HF tokenizer to convert tokens to IDs, and torchtext Fields by default construct a vocabulary (and expect strings as keys). You don't need a vocab because you're already using the pretrained one from HF, so you can just disable it in torchtext.
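
To make the point above concrete, here is a minimal pure-Python sketch of why a string-keyed vocabulary lookup fails on inputs that are already integer IDs, and why skipping the lookup avoids the problem. The `numericalize` helper, the toy `stoi` mapping, and the ID values are all made up for illustration; only the `use_vocab` name comes from the comment.

```python
def numericalize(batch, stoi=None, use_vocab=True):
    """Toy stand-in for a field's numericalization step."""
    if use_vocab:
        # String-keyed lookup: raises KeyError when tokens are already integers.
        return [[stoi[tok] for tok in example] for example in batch]
    # Tokens are assumed to be ids already, so they pass through unchanged.
    return [list(example) for example in batch]


stoi = {"<unk>": 0, "<pad>": 1, "hello": 2, "world": 3}
pretokenized_ids = [[101, 7592, 2088, 102]]  # made-up ids from an external tokenizer

try:
    numericalize(pretokenized_ids, stoi)
    lookup_failed = False
except KeyError:
    lookup_failed = True

passthrough = numericalize(pretokenized_ids, use_vocab=False)
```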
From what I see, you have token-wise labels (the POS tags). Since the LabelField assumes there is no tokenization (a single label), it treats this
"('DET', 'NOUN', 'ADP', 'DET', 'NOUN', 'NOUN', 'ADP', 'DET', 'NOUN', 'NOUN', 'VERB', 'ADJ', 'NOUN')"
as a single string. Since that exact sequence of POS tags wasn't seen in the training data, and LabelFields don't use unk tokens, this error occurs.
To fix this, for every output Field that contains sequential data (in this case, I assume that is the `TAG_LABEL`), instead define it by
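
As a rough illustration of the failure this comment describes, the sketch below contrasts treating a whole tag sequence as one label with looking up each tag separately. It is plain Python with made-up tag data and hypothetical variable names; it does not use torchtext and does not reconstruct the field definition the comment goes on to give.

```python
train_tags = [("DET", "NOUN", "VERB"), ("DET", "ADJ", "NOUN")]
test_tags = ("NOUN", "VERB", "DET", "ADJ", "NOUN")

# Single-label view: each whole sequence, stringified, is one vocabulary entry.
single_label_stoi = {str(tags): i for i, tags in enumerate(train_tags)}
# The test sequence as a whole was never seen, so a lookup would fail.
unseen_as_single_label = str(test_tags) not in single_label_stoi

# Sequential view: the vocabulary is built over individual tags.
all_tags = sorted({tag for tags in train_tags for tag in tags})
tag_stoi = {tag: i for i, tag in enumerate(all_tags)}
# Every individual tag was seen in training, so per-tag lookup succeeds.
tag_ids = [tag_stoi[tag] for tag in test_tags]
```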