
Steps to utilize NeuroNER for other languages

It appears that BRAT at least is pretty language agnostic. The English-specific parts of NeuroNER (afaict) are the recommended glove.6B.100d word vectors and all of the spaCy-related tokenizing code, which is used to translate BRAT format into CoNLL format (correct?).

Am I correct that if I:

  1. Supply Korean word vectors in /data/word_vectors
  2. Supply CoNLL-formatted train, valid, and test data built from BRAT-labeled Korean text that I run through my own tokenizer

I will be able to train and use NeuroNER for Korean text?
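
To make step 2 concrete, here is a minimal sketch of turning character-offset annotations (the kind of spans a BRAT .ann file carries) into CoNLL-style lines with BIO tags. Everything in it is an assumption for illustration: `whitespace_tokenize` is a stand-in for your own Korean tokenizer, `to_conll` is a hypothetical helper and not part of NeuroNER, and the two-column "token tag" layout is a generic CoNLL-like format, so check the exact columns against NeuroNER's sample data before training.

```python
from typing import List, Tuple

# (start_char, end_char, label): character offsets, end exclusive.
Span = Tuple[int, int, str]


def whitespace_tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Stand-in tokenizer (assumption): split on whitespace, keep offsets.

    Replace this with your own Korean tokenizer; it only needs to return
    (token, start_char, end_char) triples.
    """
    tokens = []
    pos = 0
    for piece in text.split():
        start = text.index(piece, pos)
        end = start + len(piece)
        tokens.append((piece, start, end))
        pos = end
    return tokens


def to_conll(text: str, spans: List[Span]) -> str:
    """Hypothetical helper: emit one 'token tag' line per token, BIO-tagged."""
    lines = []
    for token, start, end in whitespace_tokenize(text):
        tag = "O"
        for span_start, span_end, label in spans:
            # A token gets a tag only if it lies fully inside the span.
            if start >= span_start and end <= span_end:
                prefix = "B-" if start == span_start else "I-"
                tag = prefix + label
                break
        lines.append(f"{token} {tag}")
    return "\n".join(lines) + "\n"


# Pre-tokenized example sentence with two annotated entities.
example_text = "김수현 은 서울 에 산다"
example_spans = [(0, 3, "PER"), (6, 8, "LOC")]
print(to_conll(example_text, example_spans))
```

Sentences would normally be separated by a blank line in the output file. Tokens that only partly overlap a span are tagged O here, which is a simplification you may want to handle differently.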

Issue Analytics

  • State: open
  • Created 6 years ago
  • Reactions: 1
  • Comments: 10 (3 by maintainers)

Top GitHub Comments

4 reactions
Gregory-Howard commented, Jul 6, 2017

Hi (I’m the guy who uses NeuroNER in French)! These two steps are correct, but you also need spaCy (or NLTK) working in Korean. To explain a bit more for spaCy: you need a spaCy Korean model, which consists of a tokenizer and a POS-tagging model. Someone asked exactly this question (https://github.com/explosion/spaCy/issues/929). Then you will have to change spacylanguage in parameters.ini. I hope that’s clear; if not, feel free to ask.
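
For the last step, a small sketch of changing the setting programmatically with Python's standard `configparser`. The only name taken from this thread is the `spacylanguage` key; the section name `example_section`, the value `ko`, and the helper `set_option` are assumptions for illustration. The helper looks for the key in every section, so it does not depend on where the key actually lives in your file.

```python
import configparser
import io


def set_option(ini_text: str, key: str, value: str) -> str:
    """Hypothetical helper: set `key` to `value` in every section defining it."""
    # interpolation=None keeps literal '%' characters in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)

    found = False
    for section in parser.sections():
        if parser.has_option(section, key):
            parser.set(section, key, value)
            found = True
    if not found:
        raise KeyError(f"option {key!r} not found in any section")

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Made-up configuration fragment; the section name is an assumption.
sample_ini = """
[example_section]
tokenizer = spacy
spacylanguage = en
"""

updated = set_option(sample_ini, "spacylanguage", "ko")
print(updated)
```

Note that `configparser` drops comments when it rewrites a file, so for a heavily commented configuration it may be simpler to edit the one line by hand.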

Steps (for spacy) language : X:

3 reactions
Franck-Dernoncourt commented, Jul 4, 2017

Correct! Note that providing word vectors is optional (it’s typically better if you have some), and that I haven’t tested NeuroNER with languages other than English. I know someone successfully used it in French (after an encoding fix PR 😃), and someone was supposed to try with Bangladeshi but I haven’t heard back from him.
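
If you do supply word vectors, one quick sanity check is how much of your corpus vocabulary they cover. The sketch below assumes the plain-text layout used by the GloVe files (one token per line followed by space-separated floats); `parse_vectors` and `coverage` are hypothetical helpers, and the sample vectors are made up.

```python
from typing import Dict, Iterable, List


def parse_vectors(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe-style text lines: 'token v1 v2 ... vn'."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def coverage(tokens: Iterable[str], vectors: Dict[str, List[float]]) -> float:
    """Fraction of token occurrences that have a pretrained vector."""
    token_list = list(tokens)
    if not token_list:
        return 0.0
    hits = sum(1 for token in token_list if token in vectors)
    return hits / len(token_list)


# Made-up two-dimensional vectors, for illustration only.
sample_lines = ["서울 0.1 0.2", "산다 0.3 0.4"]
sample_vectors = parse_vectors(sample_lines)
print(coverage(["서울", "에", "산다", "서울"], sample_vectors))
```

To run it on a real file, pass an open file handle (opened with `encoding="utf-8"`) to `parse_vectors` instead of the sample list.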

In reply to the question posted by Sooheon Kim on Jul 3, 2017, 9:49 PM.
