get_corpus_reader does not work


Hello, I have tried to follow the instructions in the CLTK documentation to no avail. I am currently running on Debian 10, and installed CLTK exactly as prescribed, with all requisite apt packages and running within a virtualenv. Nevertheless, the get_corpus_reader function simply doesn’t work. I’ve tried it with several languages and it fails every time but in different ways.

(venv) kevin@XXXX ~/dir> python3
Python 3.7.3 (default, Dec 20 2019, 18:57:59) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from cltk.corpus.readers import get_corpus_reader
>>> latin_corpus = get_corpus_reader(corpus_name = 'latin_text_latin_library', language = 'latin')
Traceback (most recent call last):
  File "/home/kevin/venv/lib/python3.7/site-packages/cltk/tokenize/sentence.py", line 74, in __init__
    f'{self.language}_punkt.pickle'))
  File "/home/kevin/venv/lib/python3.7/site-packages/cltk/utils/file_operations.py", line 36, in open_pickle
    with open(path, 'rb') as opened_pickle:
FileNotFoundError: [Errno 2] No such file or directory: '/home/kevin/cltk_data/latin/model/latin_models_cltk/tokenizers/sentence/latin_punkt.pickle'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kevin/venv/lib/python3.7/site-packages/cltk/corpus/readers.py", line 51, in get_corpus_reader
    sentence_tokenizer = TokenizeSentence(language)
  File "/home/kevin/venv/lib/python3.7/site-packages/cltk/tokenize/sentence.py", line 125, in __init__
    super().__init__(language='latin', lang_vars=self.lang_vars)
  File "/home/kevin/venv/lib/python3.7/site-packages/cltk/tokenize/sentence.py", line 76, in __init__
    raise type(err)(BasePunktSentenceTokenizer.missing_models_message)
FileNotFoundError: BasePunktSentenceTokenizer requires a language model.
>>> 

Here’s what happens with an Old Norse corpus:

(venv) kevin@XXXX ~/dir> python3
Python 3.7.3 (default, Dec 20 2019, 18:57:59) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from cltk.corpus.readers import get_corpus_reader
>>> corpus = get_corpus_reader(corpus_name = 'old_norse_text_perseus', language = 'old_norse')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/kevin/venv/lib/python3.7/site-packages/cltk/corpus/readers.py", line 46, in get_corpus_reader
    if not os.path.exists(root) or corpus_name not in SUPPORTED_CORPORA.get(language):
TypeError: argument of type 'NoneType' is not iterable
>>> 

In both cases I had made sure to download/import these corpora.
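
The second traceback is consistent with `SUPPORTED_CORPORA.get(language)` returning `None` when the language key is not in the mapping, after which the `in` test fails. The sketch below reproduces that behavior with a stand-in dictionary; its contents and both helper names are assumptions for illustration, not CLTK's actual table or API.

```python
# Stand-in for CLTK's SUPPORTED_CORPORA; the entries here are hypothetical.
SUPPORTED_CORPORA = {"latin": ["latin_text_latin_library"]}


def is_supported_unguarded(corpus_name, language):
    """Mirror the check shown in the traceback.

    Raises TypeError when `language` is not a key, because .get() returns None
    and `in` cannot iterate over None.
    """
    return corpus_name in SUPPORTED_CORPORA.get(language)


def is_supported(corpus_name, language):
    """Guarded variant: unknown languages simply yield False."""
    # Falling back to an empty list gives `in` something iterable to search.
    return corpus_name in SUPPORTED_CORPORA.get(language, [])
```

With the guarded variant, an unknown language key could be reported with a clear message instead of surfacing as a `TypeError`.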

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 10 (6 by maintainers)

Top GitHub Comments

2 reactions
todd-cook commented, Aug 5, 2020

@krfkeith Can you confirm you imported the latin_models_cltk corpus? e.g. see: https://docs.cltk.org/en/latest/importing_corpora.html The import should download that model file the stack trace is complaining about. I’ll make a note to make a better error message out of this.
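
To check whether the model file is actually on disk before calling `get_corpus_reader`, something like the following could be used. It is a minimal sketch: the directory layout is copied from the Latin path in the traceback, the assumption that other languages follow the same `<language>_models_cltk` pattern is unverified, and both helper names are hypothetical.

```python
import os


def expected_punkt_path(language, data_dir=None):
    """Build the sentence-tokenizer model path using the layout seen in the traceback.

    Assumption: other languages follow the same pattern as the Latin path.
    """
    if data_dir is None:
        # The traceback shows the data directory under the user's home folder.
        data_dir = os.path.join(os.path.expanduser("~"), "cltk_data")
    return os.path.join(
        data_dir,
        language,
        "model",
        f"{language}_models_cltk",
        "tokenizers",
        "sentence",
        f"{language}_punkt.pickle",
    )


def model_is_present(language, data_dir=None):
    """Return True only if the expected model file exists on disk."""
    return os.path.isfile(expected_punkt_path(language, data_dir))


if __name__ == "__main__":
    if not model_is_present("latin"):
        print("Missing model file:", expected_punkt_path("latin"))
```

If the file is reported missing, the models corpus has most likely not been imported yet, which matches the `FileNotFoundError` in the first traceback.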

1 reaction
kylepjohnson commented, Aug 5, 2020

Can you install corpus_importer.import_corpus("latin_models_cltk")?
