
arr = [[self.vocab.stoi[x] for x in ex] for ex in arr] KeyError: None


🐛 Bug

Describe the bug: I came across this error when using data.Field. It only happens when I define my own unk_token and set min_freq > 1 at the same time.

To Reproduce: the code I use is

from torchtext import data, datasets

SRC = data.Field(lower=True, unk_token="my_unk_token")
TGT = data.Field(lower=True)

train, val, test = datasets.IWSLT.splits(exts=('.de', '.en'), fields=(SRC, TGT))

SRC.build_vocab(train, min_freq=10)

train_iter = data.BucketIterator(dataset=train, batch_size=64, sort_key=lambda x: data.interleave_keys(len(x.src), len(x.trg)))

batch = next(iter(train_iter))

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 13 (6 by maintainers)

Top GitHub Comments

2 reactions
mttk commented, Feb 21, 2020

Can you try setting use_vocab = False in the Fields where you use the HF tokenizer?

Right now, you use the HF tokenizer to convert tokens to IDs, and torchtext Fields by default construct a vocabulary (and expect strings as keys). You don't need a vocab because you're already using the pretrained one from HF, so you can just disable it in torchtext.
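To make the mismatch concrete, here is a minimal sketch in plain Python. It is a toy stand-in, not torchtext's actual implementation: `toy_stoi`, `numericalize`, and the ID values are all made up for illustration. It only shows why a string-keyed lookup fails on integer IDs and why skipping the lookup avoids the error.

```python
# Toy stand-in for a string-keyed vocabulary (hypothetical, NOT torchtext code).
toy_stoi = {"<unk>": 0, "<pad>": 1, "hello": 2, "world": 3}


def numericalize(tokens, use_vocab=True):
    """Map tokens to ids through the vocab, or pass them through unchanged."""
    if use_vocab:
        # A plain dict raises KeyError for any key it has never seen.
        return [toy_stoi[tok] for tok in tokens]
    # With the vocab disabled, the tokens are assumed to be integer ids already.
    return list(tokens)


# Ids as an external tokenizer might produce them (values are invented).
pretokenized_ids = [101, 7592, 2088, 102]

try:
    numericalize(pretokenized_ids, use_vocab=True)
    lookup_failed = False
except KeyError:
    # Integer ids are not keys of a vocabulary built from strings.
    lookup_failed = True

passthrough = numericalize(pretokenized_ids, use_vocab=False)
```

In a real Field with use_vocab=False, you would likely also need a numeric padding value instead of the default string pad token, since the data is expected to be numerical already.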

1 reaction
mttk commented, Nov 18, 2019

From what I see, you have token-wise labels (the POS tags). Since the LabelField assumes there is no tokenization (a single label), it treats this "('DET', 'NOUN', 'ADP', 'DET', 'NOUN', 'NOUN', 'ADP', 'DET', 'NOUN', 'NOUN', 'VERB', 'ADJ', 'NOUN')" as a single string. Since that exact sequence of POS tags wasn't seen in the training data, and LabelFields don't use unk tokens, this error occurs.
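The failure mode described above can be illustrated with a toy example in plain Python. The dictionaries and label strings are invented for illustration and are not torchtext internals: a vocabulary keyed on whole label strings does not contain an unseen tag sequence, even when every individual tag in it is known.

```python
# Toy label vocabulary keyed on whole label strings (illustrative only).
train_labels = ["('DET', 'NOUN')", "('VERB', 'NOUN')"]
label_to_id = {label: idx for idx, label in enumerate(train_labels)}

# A tag sequence that never appeared verbatim among the training labels.
unseen_label = "('DET', 'NOUN', 'VERB')"
whole_string_known = unseen_label in label_to_id

# The same data viewed as individual tags: every tag was seen in training.
tag_to_id = {"DET": 0, "NOUN": 1, "VERB": 2}
unseen_tags = ["DET", "NOUN", "VERB"]
all_tags_known = all(tag in tag_to_id for tag in unseen_tags)
```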

To fix this, define every output Field that contains sequential data (in this case, I assume that is the TAG_LABEL) as follows instead:

def my_tokenize_function(string):
    # complete function to tokenize a line of POS_LABEL data
    # If I see correctly, this is 1. strip brackets; 2. comma split 3. strip `
    pass

POS_LABEL = data.Field(unk_token=None, tokenize=my_tokenize_function, is_target=True)
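One possible way to fill in that stub is sketched below. It assumes each label line looks like the stringified tuple quoted earlier, for example "('DET', 'NOUN', 'ADP')". The name `tokenize_pos_tags` is hypothetical, and the three steps follow the comment in the stub: strip brackets, split on commas, strip quotes.

```python
def tokenize_pos_tags(line):
    """Split a stringified tuple of POS tags into a list of tag strings."""
    # 1. Strip surrounding whitespace and the enclosing brackets.
    inner = line.strip().strip("()[]")
    # 2. Split on commas; 3. strip whitespace and quote characters from each piece.
    tags = [piece.strip().strip("'\"") for piece in inner.split(",")]
    # Drop empty pieces, e.g. from the trailing comma of a one-element tuple.
    return [tag for tag in tags if tag]


example_tags = tokenize_pos_tags("('DET', 'NOUN', 'ADP')")
```

A function like this could then be passed as the tokenize argument of the Field in place of my_tokenize_function. If the labels were produced by Python's repr, ast.literal_eval from the standard library would be another option.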

