Hints for using cltk with own latin corpus

See original GitHub issue

This is not really an issue, but a question (I cannot find any mailing list on the webpage, sorry).

Since I’m a little lost with the way cltk works, I would appreciate it if I could get some help with the following workflow: I want to work with a ‘private’ corpus in Latin (resolutions of a Catholic religious order from 1500 to 1800). My questions are:

  1. Do I need to create a git repository for that? Or is it possible to work with local files?
  2. What exactly is the relationship between import_corpus (CorpusImporter) and the corpus objects created by nltk?
  3. Since I only want to do some exploratory analysis of the corpus, what is the best way to use a corpus imported with cltk together with the methods provided by nltk (frequencies, and so on)?

Many thanks in advance (and many thanks of course for the wonderful library).

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
diyclassics commented, Jan 25, 2018

Let me add quickly to “If your data is only available on your local filesystem”: if it is a plaintext corpus, you can use NLTK’s PlaintextCorpusReader to manage the files and get out-of-the-box tokenization (paragraphs, sentences, words). Cf. https://pynlp.wordpress.com/2013/12/10/unit-5-part-ii-working-with-files-ii-the-plain-text-corpus-reader-of-nltk/. This is the basis of the Latin Library reader in cltk.corpus.latin. If it is not plaintext, you may be able to use a different NLTK reader; see http://www.nltk.org/howto/corpus.html.
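
As a rough illustration of that suggestion, here is a minimal sketch of reading local plaintext files with NLTK’s PlaintextCorpusReader and counting word frequencies with FreqDist. The directory, file names, and sample sentences are made up for the example; in practice you would point the reader at the folder that holds your own .txt files. It uses only words(), which relies on NLTK’s default regex-based word tokenizer, so no extra NLTK data should be needed for this part.

```python
import tempfile
from pathlib import Path

from nltk.corpus.reader import PlaintextCorpusReader
from nltk.probability import FreqDist

# Hypothetical stand-in for a private corpus: two small Latin text files
# written to a temporary directory. Replace corpus_dir with your own folder.
corpus_dir = Path(tempfile.mkdtemp())
(corpus_dir / "resolutio_1.txt").write_text(
    "Gallia est omnis divisa in partes tres.", encoding="utf-8"
)
(corpus_dir / "resolutio_2.txt").write_text(
    "Omnis ars et omnis doctrina.", encoding="utf-8"
)

# Read every .txt file under corpus_dir as one corpus.
reader = PlaintextCorpusReader(str(corpus_dir), r".*\.txt", encoding="utf-8")
file_ids = reader.fileids()

# Lowercase the tokens and drop punctuation before counting.
words = [w.lower() for w in reader.words() if w.isalpha()]
freq = FreqDist(words)

print(file_ids)
print(freq.most_common(3))
```

The same reader also accepts a single file id, e.g. reader.words("resolutio_1.txt"), which is handy for comparing documents. Lowercasing and the isalpha() filter are just one simple normalization choice for exploratory counts, not something the thread prescribes.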

0 reactions
todd-cook commented, May 22, 2018

Looks like the issue has been resolved. We may want to update the documentation though.
