Support languages that need a tokenizer (Chinese, Japanese, etc.)

See original GitHub issue

I think iepy needs a common interface for plugging in a tokenizer, so that it can support languages such as Chinese and Japanese.

There is an older IE project with a GUI, named GATE. It contains a pre-trained model and a dataset, which may be helpful: https://gate.ac.uk/sale/tao/splitch15.html#sec:misc-creole:language-plugins:chinese
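To make the request concrete, here is a minimal sketch of what a pluggable tokenizer interface could look like. It is not iepy's actual API: the names `Tokenizer`, `CJKCharTokenizer`, `register_tokenizer`, and `get_tokenizer` are hypothetical, and the character-level splitter is only a naive stand-in for a real word segmenter.

```python
import re
from typing import Dict, List, Protocol


class Tokenizer(Protocol):
    """Hypothetical interface: anything that turns text into a list of tokens."""

    def tokenize(self, text: str) -> List[str]: ...


class WhitespaceTokenizer:
    """Baseline for languages that separate words with spaces."""

    def tokenize(self, text: str) -> List[str]:
        return text.split()


class CJKCharTokenizer:
    """Naive fallback: every CJK ideograph or kana character is its own token.

    Runs of other non-space characters (Latin words, digits, punctuation)
    stay together. A real setup would wrap a proper word segmenter instead.
    """

    _pattern = re.compile(
        r"[\u4e00-\u9fff\u3040-\u30ff]"        # one ideograph or kana character
        r"|[^\s\u4e00-\u9fff\u3040-\u30ff]+"   # or a run of anything else
    )

    def tokenize(self, text: str) -> List[str]:
        return self._pattern.findall(text)


# Registry keyed by language code (assumed convention, e.g. "zh", "ja").
_REGISTRY: Dict[str, Tokenizer] = {}


def register_tokenizer(lang: str, tokenizer: Tokenizer) -> None:
    """Associate a tokenizer with a language code."""
    _REGISTRY[lang] = tokenizer


def get_tokenizer(lang: str) -> Tokenizer:
    """Return the tokenizer registered for lang, else the whitespace baseline."""
    return _REGISTRY.get(lang, WhitespaceTokenizer())
```

With this shape, a pipeline would call `get_tokenizer(lang).tokenize(text)` and never need to know which segmenter sits behind it; supporting a new language would only take one `register_tokenizer` call.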

Issue Analytics

  • State: open
  • Created: 7 years ago
  • Reactions: 1
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

1 reaction
YanWenqiang commented, Sep 25, 2017

@eromoe Right now I want to customize iepy for Chinese. Could you give me a hand?

0 reactions
hwaking commented, Dec 9, 2017

@eromoe I am doing information extraction on Chinese EMR data. Can I use iepy for entity relationship extraction?


Top Results From Across the Web

Language Analysis | Apache Solr Reference Guide 7.7
For Simplified Chinese, Solr provides support for Chinese sentence and word segmentation with the HMM Chinese Tokenizer. This component includes a large ...

New way of tokenization of Chinese - Manticore Search
The Chinese language belongs to the so-called CJK language family (Chinese, Japanese, and Korean). They are probably the most complicated ...

Chinese and Japanese Lexical Tokenization
For Chinese and Japanese, in addition to the statistical model described above, RBL includes Chinese Language Analyzer (CLA) and Japanese...

How to make scikit-learn vectorizers work with Japanese ...
How to use NLP with scikit-learn vectorizers in Japanese, Chinese (and other East Asian languages) by using a custom tokenizer.

Tokenize and Transliterate Japanese, Chinese, Korean - Reddit
Maybe some of you nice people have some ideas about the best way to go about tokenization for Korean (Mecab support Korean?) and...
