
arr = [[self.vocab.stoi[x] for x in ex] for ex in arr] KeyError: None


🐛 Bug

Describe the bug: I came across this error when using data.Field. It only happens when I define my own unk_token and set min_freq > 1 at the same time.

To Reproduce: the code I use:

from torchtext import data, datasets  # legacy torchtext Field API

SRC = data.Field(lower=True, unk_token="my_unk_token")
TGT = data.Field(lower=True)

train, val, test = datasets.IWSLT.splits(exts=('.de', '.en'), fields=(SRC, TGT))

SRC.build_vocab(train, min_freq=10)

train_iter = data.BucketIterator(dataset=train, batch_size=64, sort_key=lambda x: data.interleave_keys(len(x.src), len(x.trg)))

batch = next(iter(train_iter))
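One plausible mechanism, offered as an assumption based on our reading of the legacy torchtext Vocab (behavior may differ between versions): the vocabulary only installs an unknown-token fallback when the literal "<unk>" string is among its specials. With a renamed unk_token, the string-to-index map has no fallback, so any word that min_freq dropped raises KeyError when the batch is numericalized. The toy below reproduces that pattern with plain dictionaries; build_stoi is a hypothetical helper, not torchtext's API, and the exact key shown in the real traceback may differ.

```python
from collections import Counter, defaultdict

UNK = "<unk>"  # assumption: the only token string the fallback recognizes


def build_stoi(counter, specials, min_freq):
    """Hypothetical helper mimicking a vocab's string-to-index map."""
    itos = list(specials) + sorted(
        w for w, c in counter.items() if c >= min_freq and w not in specials
    )
    if UNK in specials:
        unk_index = specials.index(UNK)
        # unseen words fall back to the index of "<unk>"
        stoi = defaultdict(lambda: unk_index)
    else:
        # no default factory: looking up an unseen word raises KeyError
        stoi = defaultdict()
    stoi.update({w: i for i, w in enumerate(itos)})
    return stoi


counts = Counter("the cat sat on the mat the end".split())
default_stoi = build_stoi(counts, ["<unk>", "<pad>"], min_freq=2)
custom_stoi = build_stoi(counts, ["my_unk_token", "<pad>"], min_freq=2)

print(default_stoi["cat"])  # "cat" was dropped by min_freq, falls back to 0
try:
    custom_stoi["cat"]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'cat'
```

With min_freq=1 every training word stays in the vocabulary, which would be consistent with the report that the error only appears when min_freq > 1.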

Issue Analytics

Created 4 years ago · 13 comments (6 by maintainers)

Top GitHub Comments

mttk commented, Feb 21, 2020 (2 reactions)

Can you try setting use_vocab = False in the Fields where you use the HF tokenizer?

Right now, you use the HF tokenizer to convert tokens to IDs, and torchtext Fields by default construct a vocabulary (and expect strings as keys). You don't need a vocab because you're already using the pretrained one from HF, so you can just disable it in torchtext.
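To see why the vocabulary gets in the way, here is a toy model of the numericalization step in plain Python. The numericalize function and its use_vocab flag only mirror the intent of the Field option; they are hypothetical and not torchtext code, and the integer IDs are made up. If a tokenizer has already produced integer IDs, looking them up in a string-keyed vocabulary fails, while skipping the lookup passes them through unchanged.

```python
def numericalize(batch, stoi, use_vocab=True):
    """Toy stand-in for a Field's numericalization step (hypothetical)."""
    if use_vocab:
        # map each string token to its index; unknown keys raise KeyError
        return [[stoi[tok] for tok in ex] for ex in batch]
    # tokens are already integer IDs (e.g. from a pretrained tokenizer)
    return [list(ex) for ex in batch]


stoi = {"hello": 0, "world": 1}  # vocabulary keyed by strings
ids = [[5, 17, 42]]              # made-up IDs standing in for tokenizer output

print(numericalize(ids, stoi, use_vocab=False))  # [[5, 17, 42]]
try:
    numericalize(ids, stoi)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 5
```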

mttk commented, Nov 18, 2019 (1 reaction)

From what I see, you have token-wise labels (the POS tags). Since the LabelField assumes there is no tokenization (a single label), it treats this "('DET', 'NOUN', 'ADP', 'DET', 'NOUN', 'NOUN', 'ADP', 'DET', 'NOUN', 'NOUN', 'VERB', 'ADJ', 'NOUN')" as a single string. Since that exact sequence of POS tags wasn’t seen in the training data, and LabelFields don’t use unk tokens, this error occurs.
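The failure described above can be reproduced with plain dictionaries. This sketch assumes, as the comment describes, a label vocabulary built from whole training targets with no unknown-token fallback; the tag sequences are made up for illustration. A sequence that never appeared as a whole is missing from the label vocabulary, while a per-tag vocabulary covers it because each individual tag was seen.

```python
# Made-up training targets: each is a sequence of POS tags
train_targets = [("DET", "NOUN"), ("DET", "ADJ", "NOUN"), ("NOUN", "VERB")]
new_target = ("DET", "NOUN", "VERB")

# LabelField-style: each whole sequence (as a string) is one class, no unk fallback
label_vocab = {str(t): i for i, t in enumerate(train_targets)}
print(str(new_target) in label_vocab)  # False, so a lookup would raise KeyError

# Sequential-Field-style: one vocabulary entry per individual tag
tag_vocab = {}
for target in train_targets:
    for tag in target:
        tag_vocab.setdefault(tag, len(tag_vocab))
print([tag_vocab[t] for t in new_target])  # [0, 1, 3]
```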

To fix this, for every output Field that contains sequential data (in this case, I assume that is the TAG_LABEL), define it instead as:

from torchtext import data  # legacy torchtext Field API

def my_tokenize_function(string):
    # complete function to tokenize a line of POS_LABEL data
    # If I see correctly, this is 1. strip brackets; 2. split on commas; 3. strip quotes
    pass

POS_LABEL = data.Field(unk_token=None, tokenize=my_tokenize_function, is_target=True)
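One possible completion of my_tokenize_function, following the three steps in its comments. It assumes each POS line is formatted exactly like the tuple string quoted earlier, e.g. "('DET', 'NOUN', ...)"; other formats would need a different tokenizer.

```python
def my_tokenize_function(string):
    """Sketch: "('DET', 'NOUN')" -> ['DET', 'NOUN'] (assumed input format)."""
    # 1. strip surrounding whitespace and brackets
    inner = string.strip().strip("()")
    # 2. split on commas, 3. strip spaces and quote characters from each tag
    tags = [tag.strip().strip("'\"") for tag in inner.split(",")]
    # drop empty pieces, e.g. from "()" or a trailing comma
    return [tag for tag in tags if tag]


print(my_tokenize_function("('DET', 'NOUN', 'ADP', 'DET', 'NOUN')"))
# ['DET', 'NOUN', 'ADP', 'DET', 'NOUN']
```

Passed as tokenize= to a sequential Field with unk_token=None, each tag then gets its own vocabulary entry instead of the whole sequence being treated as one label.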
