
OpenOnDemandZipFile fails to open punkt.zip on Python 3.x


An exception is raised when running a simple tutorial example from the NLTK book with the following command:

$ python3 -c 'from nltk import word_tokenize; text = word_tokenize("And now for something completely different")'

I think the problem is caused by the @py3_data decorator on ZipFilePathPointer.__init__ and OpenOnDemandZipFile.__init__. This decorator appends ‘/PY3’ to the first argument, which is a zip file path given as a str, such as “~/nltk_data/tokenizers/punkt.zip”. However, “~/nltk_data/tokenizers/punkt.zip/PY3” cannot be opened as a zip file.
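A minimal sketch of the failure described above, assuming a decorator that simply appends "/PY3" to the first path argument. `append_py3`, `DemoZipFile` and `try_open` are hypothetical stand-ins, not NLTK's actual `@py3_data` or `OpenOnDemandZipFile` code. The script builds a throwaway punkt.zip in a temporary directory so it does not touch ~/nltk_data.

```python
import functools
import os
import tempfile
import zipfile


def append_py3(init_func):
    """Hypothetical stand-in for @py3_data: append '/PY3' to the first path argument."""
    @functools.wraps(init_func)
    def wrapper(self, filename, *args, **kwargs):
        return init_func(self, filename + "/PY3", *args, **kwargs)
    return wrapper


class DemoZipFile(zipfile.ZipFile):
    """Hypothetical class standing in for OpenOnDemandZipFile."""

    @append_py3
    def __init__(self, filename):
        super().__init__(filename)


def try_open(zip_path):
    """Return None if DemoZipFile opens zip_path, else the OSError it raised."""
    try:
        DemoZipFile(zip_path).close()
    except OSError as exc:
        return exc
    return None


with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "punkt.zip")
    # Build a tiny archive; its contents do not matter for this demonstration.
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("punkt/english.pickle", b"placeholder")
    # The undecorated class opens the archive without complaint ...
    zipfile.ZipFile(zip_path).close()
    # ... but the decorated one is handed ".../punkt.zip/PY3", which is not a file.
    error = try_open(zip_path)
    print(type(error).__name__, error.filename if error else None)
```

On Linux the decorated call typically fails with NotADirectoryError, because the OS is asked to look inside punkt.zip as if it were a directory.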

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/git/nltk/nltk/tokenize/__init__.py", line 109, in word_tokenize
    return [token for sent in sent_tokenize(text, language)
  File "/git/nltk/nltk/tokenize/__init__.py", line 93, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "/git/nltk/nltk/data.py", line 808, in load
    opened_resource = _open(resource_url)
  File "/git/nltk/nltk/data.py", line 926, in _open
    return find(path_, path + ['']).open()
  File "/git/nltk/nltk/data.py", line 648, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource 'tokenizers/punkt/PY3/english.pickle' not found.
  Please use the NLTK Downloader to obtain the resource:  >>>
  nltk.download()
  Searched in:
    - '/home/joybin/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

stevenbird commented, Dec 11, 2017 (1 reaction)

@alvations FYI, in compat.py:

# The following datasets have a /PY3 subdirectory containing
# a full copy of the data which has been re-encoded or repickled.
DATA_UPDATES = [("chunkers", "maxent_ne_chunker"),
                ("help", "tagsets"),
                ("taggers", "maxent_treebank_pos_tagger"),
                ("tokenizers", "punkt")]

From memory, these extra subdirectories were created manually.
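To see how a list like DATA_UPDATES can produce both the 'tokenizers/punkt/PY3/english.pickle' path in the traceback and the 'punkt.zip/PY3' path from the report, here is a simplified sketch. Only DATA_UPDATES is copied from the compat.py excerpt; `PY3_DATA_DIRS` and `add_py3_subdir` are hypothetical, and NLTK's real helper may differ in its details.

```python
import posixpath

# Copied from the compat.py excerpt.
DATA_UPDATES = [("chunkers", "maxent_ne_chunker"),
                ("help", "tagsets"),
                ("taggers", "maxent_treebank_pos_tagger"),
                ("tokenizers", "punkt")]

# Hypothetical: the same entries as "category/name" path fragments.
PY3_DATA_DIRS = [posixpath.join(*entry) for entry in DATA_UPDATES]


def add_py3_subdir(path):
    """Hypothetical helper: insert '/PY3' right after a listed dataset directory.

    If the dataset is referenced as a zip file ('.../punkt.zip'), '/PY3' is
    inserted after the '.zip' suffix, i.e. it points *inside* the archive.
    Paths that already contain '/PY3' or match no entry are returned unchanged.
    """
    if "/PY3" in path:
        return path
    for data_dir in PY3_DATA_DIRS:
        pos = path.find(data_dir)
        if pos == -1:
            continue
        end = pos + len(data_dir)
        if path.startswith(".zip", end):
            end += len(".zip")
        return path[:end] + "/PY3" + path[end:]
    return path


for p in ["tokenizers/punkt/english.pickle",
          "/home/user/nltk_data/tokenizers/punkt.zip",
          "corpora/brown"]:
    print(p, "->", add_py3_subdir(p))
```

Rewriting the resource name is harmless only if a matching PY3 subdirectory actually exists; when the rewrite is applied to the archive path itself, the result is a path that no longer names a file.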

joybinchen commented, Apr 24, 2017 (1 reaction)

@alvations I’m using Python 3.5 on Ubuntu 17.04. I’m sure I have downloaded all the zip files under ~/nltk_data. The example runs without any exception once ~/nltk_data/tokenizers/punkt.zip is unpacked.
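The workaround mentioned here, unpacking punkt.zip in place, can be scripted with the standard library. This is a minimal sketch assuming the default ~/nltk_data location and an archive with a single top-level punkt/ folder; `unpack_nltk_zip` is a hypothetical helper, not part of NLTK.

```python
import os
import zipfile


def unpack_nltk_zip(zip_path):
    """Hypothetical helper: extract an NLTK data zip into the directory that holds it.

    Assuming the archive has a top-level 'punkt/' folder, extracting
    ~/nltk_data/tokenizers/punkt.zip yields ~/nltk_data/tokenizers/punkt/,
    a plain directory that does not need the zip-file code path.
    Returns the directory the archive was extracted into.
    """
    zip_path = os.path.expanduser(zip_path)
    target_dir = os.path.dirname(zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target_dir)
    return target_dir
```

For the setup in this comment the call would be unpack_nltk_zip("~/nltk_data/tokenizers/punkt.zip"). The original punkt.zip is left where it is; only the extracted folder is added next to it.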

$ python3 -c "import sys; print (sys.version)"
3.5.3 (default, Jan 19 2017, 14:11:04) 
[GCC 6.3.0 20170118]
$ python -m nltk.downloader punkt
/usr/lib/python3.5/runpy.py:125: RuntimeWarning: 'nltk.downloader' found in sys.modules after import of package 'nltk', but prior to execution of 'nltk.downloader'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
[nltk_data] Downloading package punkt to /home/joybin/nltk_data...
[nltk_data]   Package punkt is already up-to-date!

