CLTK Arabic support

This is a list of Arabic support issues and to-dos.

General Improvements

  • Make the code clean and readable (optimization, performance, quality) [WIP].
  • Remove duplicated code [WIP].

1. Romanization Systems

2. Arabic Stop words project

  • Improve the list of Arabic stop words [WIP].

3. Support remote libs

  • Added pyarabic to CLTK under the namespace cltk.corpus.utils.arabic.pyarabic instead of depending on the pip package. I suggest this approach to avoid problems with remote libraries during installation or use; most users don't have time to install extra packages [WIP].
  • Add a number function: convert between Arabic numbers and Arabic strings [WIP].
  • Add Araby_Statistics: a module to calculate different statistics on Quranic text [WIP].
  • Add an Arabic normalizer [WIP].
  • Add documentation for the pyarabic lib [WIP].
  • Add unit tests for the pyarabic lib [WIP].
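
To make the normalizer item above more concrete, here is a minimal sketch of what such a step could look like. It uses only the Python standard library and is not the pyarabic or CLTK implementation; the function name `normalize_arabic` and the `unify_alef` flag are hypothetical. Whether hamza-carrying alef forms should be unified is an open design question for Classical Arabic, so that step is off by default.

```python
import re

# Assumption: "normalization" here means removing short-vowel marks and
# elongation characters. This is an illustrative sketch, not pyarabic's API.

# Tashkeel (harakat) U+064B..U+0652 plus the superscript alef U+0670.
_TASHKEEL = re.compile("[\u064B-\u0652\u0670]")
# Tatweel (kashida), used only for visual elongation.
_TATWEEL = "\u0640"
# Alef with madda / hamza above / hamza below -> bare alef.
_ALEF_VARIANTS = str.maketrans({
    "\u0622": "\u0627",
    "\u0623": "\u0627",
    "\u0625": "\u0627",
})


def normalize_arabic(text: str, unify_alef: bool = False) -> str:
    """Strip tashkeel and tatweel; optionally map alef variants to bare alef."""
    text = _TASHKEEL.sub("", text)
    text = text.replace(_TATWEEL, "")
    if unify_alef:
        text = text.translate(_ALEF_VARIANTS)
    return text
```

Keeping `unify_alef` optional means a later stemming step can still see a leading hamza if it needs it.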

4. Arabic Tokenization

  • Add Arabic word Tokenization.
  • Add Arabic Sentence Tokenization.
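
As a rough illustration of the two tokenization items above, the following sketch splits text into sentences and words with regular expressions. It is an assumption-laden starting point, not CLTK's tokenizer: the function names are hypothetical, the word pattern simply matches runs of characters in the basic Arabic letter and diacritic range, and sentence boundaries are taken to be `.`, `!`, or the Arabic question mark followed by whitespace.

```python
import re

# Runs of Arabic letters and diacritics (U+0621..U+0652), plus the
# superscript alef U+0670 and alef wasla U+0671. Punctuation such as the
# Arabic comma (U+060C) and question mark (U+061F) falls outside this range.
_WORD = re.compile("[\u0621-\u0652\u0670\u0671]+")
# Split on whitespace that follows '.', '!' or the Arabic question mark.
_SENTENCE_END = re.compile("(?<=[.!\u061F])\\s+")


def sentence_tokenize(text: str) -> list:
    """Split text into sentences, keeping the closing punctuation."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def word_tokenize(text: str) -> list:
    """Return the Arabic-script words in text, dropping punctuation."""
    return _WORD.findall(text)
```

Classical texts often lack modern punctuation, so a rule-based sentence splitter like this would likely need corpus-specific markers added before being useful on real data.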

5. Arabic Stemming

6. Arabic IR

  • Add the Alfanous Quranic search engine lib and make it compatible with Python 3.6.
  • Make the Whoosh integration support Classical Arabic as well.

7. Arabic Corpus

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions: 6
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

2 reactions
ibrahimsharaf commented, Oct 1, 2017

Hi @LBenzahia, @kylepjohnson, I am a native Arabic speaker with an intermediate Python background, and I am willing to contribute to this. Where can I start?

1 reaction
greenat92 commented, Oct 3, 2017

Hi @ibrahimsharaf, yes you can; it would be a great contribution. Khoja's stemmer fits Classical Arabic. Please remember that we are working on Classical Arabic, not Modern Arabic, so you will have to remove some rules from the Khoja stemmer. For example, Classical Arabic frequently uses the interrogative alif instead of "hal" (هل), especially in the texts of the Holy Quran. Some borrowed patterns cannot be handled, such as the pattern فعالة for instrument nouns, which does not exist in Classical Arabic. These are some of the differences; please take them into account.

Let me know if you want to work on any of the issues above and I'll mention you there! For the stemmer implementation, I would like you to take a look at the stem module so that your work resembles the existing CLTK-style code and respects its conventions. Let me know if you have any questions. Good luck!
