NLTK WordNet error with a word look up using synsets
I am using Python 3.6 with NLTK 3.2.3, and I am getting a “WordNetError” only for the word “escort”. I don’t get errors with any other words. Here’s the transcript showing success with the word “dog” and the error using the word “escort”:
Python 3.6.0 |Anaconda custom (64-bit)| (default, Dec 23 2016, 12:22:00)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from nltk.corpus import wordnet
>>> wordnet.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
>>> wordnet.synsets('escort')
Traceback (most recent call last):
File "/home/user1/.conda/envs/ca/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1403, in _synset_from_pos_and_line
offset = int(_next_token())
ValueError: invalid literal for int() with base 10: '02026433\x00v'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/user1/.conda/envs/ca/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1491, in synsets
for p in pos
File "/home/user1/.conda/envs/ca/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1493, in <listcomp>
for offset in index[form].get(p, [])]
File "/home/user1/.conda/envs/ca/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1335, in synset_from_pos_and_offset
synset = self._synset_from_pos_and_line(pos, data_file_line)
File "/home/user1/.conda/envs/ca/lib/python3.6/site-packages/nltk/corpus/reader/wordnet.py", line 1448, in _synset_from_pos_and_line
raise WordNetError('line %r: %s' % (data_file_line, e))
nltk.corpus.reader.wordnet.WordNetError: line '02025829 38 v 01 escort 0 006 @ 02025550 v 0000 + 09992538 n 0102 ~ 02026203 v 0000 ~ 02026327 v 0000 ~ 02026433\x00v 0000 ~ 02026712 v 0000 04 + 08 00 + 09 00 + 20 00 + 21 00 | accompany as an escort; "She asked her older brother to escort her to the ball" \n': invalid literal for int() with base 10: '02026433\x00v'
However, when I use the online WordNet search tool at http://wordnetweb.princeton.edu/perl/webwn, it performs the lookup as expected. The latest WordNet corpus was downloaded using nltk.download().
The error seems to point to a stray NUL byte ('\x00') in the WordNet data line for the word: the parser expects an integer synset offset followed by a space, but instead reads '02026433\x00v'.
Any ideas? Please advise if you’ve run into something like this.
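To make the failure in the traceback concrete, here is a minimal sketch that reproduces the same ValueError using only the token copied from the error message. It does not touch NLTK or the WordNet files. The variable names are illustrative, and the "repair" at the end only shows what the token would look like if the NUL byte were the space the parser expects; it is not a suggested fix for the corpus.

```python
# Token copied verbatim from the traceback above.
token = "02026433\x00v"

# int() rejects the token because of the embedded NUL byte.
try:
    int(token)
    parse_error = None
except ValueError as exc:
    parse_error = str(exc)
print("parse failed:", parse_error)

# Locate the unexpected byte.
nul_at = token.find("\x00")
print("NUL byte at index", nul_at)

# For illustration only: if the NUL were a space, the token would split
# into a synset offset and a part-of-speech tag.
offset, pos = token.replace("\x00", " ").split()
print(int(offset), pos)
```

If the same check is run against the downloaded data file and no NUL byte is found there, the corruption is probably happening at read time rather than on disk.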
I see this problem when I call wordnet.synsets from several threads in parallel. With 1-2 threads it’s fine, but with more than that (3 or 4 threads) I get these strange errors. I wonder which part of the WordNet reader is not thread safe.
Works fine for me on OS X 10.11.6 with: