NLTK WordNet error with a word look up using synsets


I am using Python 3.6 with NLTK 3.2.3, and I am getting a “WordNetError” only for the word “escort”. I don’t get errors with any other words. Here’s the transcript showing success with the word “dog” and the error using the word “escort”:

Python 3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 12:22:00) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from nltk.corpus import wordnet
>>> wordnet.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
>>> wordnet.synsets('escort')
Traceback (most recent call last):
  File "/home/user1/.conda/envs/ca/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1403, in _synset_from_pos_and_line
    offset = int(_next_token())
ValueError: invalid literal for int() with base 10: '02026433\x00v'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user1/.conda/envs/ca/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1491, in synsets
    for p in pos
  File "/home/user1/.conda/envs/ca/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1493, in <listcomp>
    for offset in index[form].get(p, [])]
  File "/home/user1/.conda/envs/ca/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1335, in synset_from_pos_and_offset
    synset = self._synset_from_pos_and_line(pos, data_file_line)
  File "/home/user1/.conda/envs/ca/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1448, in _synset_from_pos_and_line
    raise WordNetError('line %r: %s' % (data_file_line, e))
nltk.corpus.reader.wordnet.WordNetError: line '02025829 38 v 01 escort 0 006 @ 02025550 v 0000 + 09992538 n 0102 ~ 02026203 v 0000 ~ 02026327 v 0000 ~ 02026433\x00v 0000 ~ 02026712 v 0000 04 + 08 00 + 09 00 + 20 00 + 21 00 | accompany as an escort; "She asked her older brother to escort her to the ball"  \n': invalid literal for int() with base 10: '02026433\x00v'
However, when I use the online WordNet search tool at http://wordnetweb.princeton.edu/perl/webwn, it performs the lookup as expected. The latest WordNet corpus was downloaded using nltk.download().

The error seems to point at a hex escape (`\x00`, a NUL byte) in the WordNet data line for this word, at a position where an integer offset is expected.

Any ideas? Please advise if you’ve run into something like this.
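
In the traceback above, the failing token is '02026433\x00v': a NUL byte sits where the neighbouring pointers have a space, so the offset and the part-of-speech letter end up fused into one token that int() rejects. One way to narrow this down is to check whether the NUL byte is actually in the corpus file on disk or only appears at read time. The following is a minimal sketch using only the standard library; the helper name `find_nul_bytes` and the path in the comment are hypothetical and not part of NLTK.

```python
def find_nul_bytes(path):
    """Return (line_number, column) pairs for every NUL byte in a file.

    Line numbers are 1-based, columns are 0-based byte offsets in the line.
    """
    hits = []
    # Read in binary mode so that no decoding step can hide or alter bytes.
    with open(path, "rb") as handle:
        for line_number, raw_line in enumerate(handle, start=1):
            column = raw_line.find(b"\x00")
            while column != -1:
                hits.append((line_number, column))
                column = raw_line.find(b"\x00", column + 1)
    return hits


# Hypothetical usage: the location of the WordNet verb data file depends on
# where nltk.download() stored the corpus on your machine.
#   find_nul_bytes("/path/to/nltk_data/corpora/wordnet/data.verb")
```

An empty result would suggest the file itself is intact and the stray byte is introduced while reading; a non-empty result would suggest a damaged download, in which case re-downloading the corpus is the obvious next step.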

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Reactions: 1
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

5 reactions
ndvbd commented, Jul 12, 2019

I see this problem when I call the WordNet synsets lookup from several threads in parallel. With 1-2 threads it’s fine, but with more than that (3 or 4 threads) I get these strange errors. I wonder which part of the WordNet reader is not thread safe.
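
If concurrent access is indeed the trigger, one cautious workaround is to serialize the lookups behind a lock so that only one thread reads the corpus at a time. The sketch below is generic and assumes nothing about NLTK internals; `serialize_calls` is a hypothetical helper, and whether wrapping `wordnet.synsets` this way removes the error has not been confirmed in this thread.

```python
import functools
import threading


def serialize_calls(func):
    """Wrap func so that only one thread at a time can execute it."""
    lock = threading.Lock()

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # The lock is held for the whole call, so any file reads done
        # inside func cannot interleave with those of another thread.
        with lock:
            return func(*args, **kwargs)

    return wrapper


# Hypothetical usage with NLTK (an assumption, not a verified fix):
#   from nltk.corpus import wordnet
#   safe_synsets = serialize_calls(wordnet.synsets)
#   safe_synsets('escort')
```

The trade-off is that lookups no longer run in parallel, so this only makes sense when the WordNet calls are a small part of each thread's work.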

1 reaction
nschneid commented, Jun 15, 2017

Works fine for me on OS X 10.11.6 with:

Python 3.6.0 |Anaconda 4.3.1 (x86_64)| (default, Dec 23 2016, 13:19:00)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin