Pre-trained Entity Extractor for Foreign Languages


Rasa NLU version: 0.13.8

Operating system (windows, osx, …): Ubuntu 16.04

Content of model configuration file:

language: "kr"
pipeline:
- name: "component.KoreanTokenizer"
- name: "component.PreTrainedCRF"
- name: "component.DomainSpecificCRF"
- name: "ner_synonyms"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
- name: "intent_entity_featurizer_regex"

Idea:

My colleagues and I are currently developing a dialogue system for Korean.

  • I am trying to adopt a pre-trained NER component (component.PreTrainedCRF) which can extract Name, Place, Organization, Time and Date, just like ner_duckling for English.

  • My plan is to pass the user input to component.PreTrainedCRF, where general entities (Name, Place, etc.) are extracted first. The same user input is then passed to the second CRF model (component.DomainSpecificCRF), where domain-dependent entities are extracted (e.g. cuisine_type).
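For illustration, the two-pass plan above can be sketched in plain Python without any Rasa imports. Everything here is an assumption made for the example: `keyword_extractor` is a toy stand-in for the two CRF models, `extract_in_two_passes` and `overlaps` are hypothetical helper names, and the rule "keep the general entity when two spans overlap" is just one possible merge policy. The entity dictionaries use the `start`/`end`/`value`/`entity`/`extractor` keys that Rasa NLU reports for entities.

```python
from typing import Callable, Dict, List

Entity = Dict[str, object]
Extractor = Callable[[str], List[Entity]]


def keyword_extractor(lexicon: Dict[str, str], name: str) -> Extractor:
    """Toy stand-in for a CRF tagger: finds the first occurrence of each surface form."""
    def extract(text: str) -> List[Entity]:
        found = []
        for surface, label in lexicon.items():
            start = text.find(surface)
            if start >= 0:
                found.append({
                    "start": start,
                    "end": start + len(surface),
                    "value": surface,
                    "entity": label,
                    "extractor": name,
                })
        return found
    return extract


def overlaps(a: Entity, b: Entity) -> bool:
    """True when two character spans share at least one character."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def extract_in_two_passes(text: str, general: Extractor, domain: Extractor) -> List[Entity]:
    """Run the general extractor first, then add non-overlapping domain entities."""
    entities = list(general(text))
    for candidate in domain(text):
        # Assumed policy: a general entity wins over an overlapping domain entity.
        if not any(overlaps(candidate, kept) for kept in entities):
            entities.append(candidate)
    return sorted(entities, key=lambda e: e["start"])


if __name__ == "__main__":
    general = keyword_extractor({"Seoul": "Place"}, "PreTrainedCRF")
    domain = keyword_extractor({"Italian": "cuisine_type"}, "DomainSpecificCRF")
    for entity in extract_in_two_passes("Book an Italian place in Seoul", general, domain):
        print(entity)
```

Both extractors see the same raw text, so neither depends on the other's output; only the merge step has to decide what happens when their spans collide.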

Issues:

  • I have trained component.PreTrainedCRF on a large corpus, producing “pre_trained_crf_model.pkl”.

  • However, I cannot find any documentation that describes how to use the pkl file afterwards. I have read this article about adding a custom component [https://medium.com/rasa-blog/enhancing-rasa-nlu-models-with-custom-components-6f54040c4a77], but my case is different in that I would like to add another CRF model (component.PreTrainedCRF).

Please let me know how to bridge the two CRF models.

I would like to emphasize that I have two different training data sets: one is the large corpus used to train component.PreTrainedCRF, and the other is the “usual” training md file used to train component.DomainSpecificCRF.

Please note that #822 was not helpful for this issue.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

robinsongh381 commented, Mar 4, 2019

Thanks for the tips!

I will discuss sharing the tokenizer component with my colleagues who made it, and I will get back to you soon!

Cheers

akelad commented, Mar 6, 2019

@robinsongh381 let’s move this to the forum; this is more of a usage question at this point. You can change the extract_entities method, or use the one from the CRF, whichever works for you.
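As a rough sketch of how a pre-trained pkl file could be reused at inference time, here is one possible shape for such a component in plain Python. This is not Rasa's API: the class `PreTrainedExtractor` and its methods are hypothetical, the "model" is assumed to be a simple pickled dictionary of surface forms rather than a real CRF, and a real component would subclass Rasa NLU's component base class and tag tokens with the loaded model. The point is only the pattern: training is a no-op, the model is loaded from disk once, and `extract_entities` (named after the method mentioned above) applies it to each message.

```python
import pickle
from typing import Dict, List


class PreTrainedExtractor:
    """Hypothetical component shape: load a pickled model, never retrain it."""

    name = "PreTrainedCRF"

    def __init__(self, model: Dict[str, str]):
        # Assumption: the pickled object is a {surface form: entity label} dict.
        # A real pkl would hold whatever the training script saved.
        self.model = model

    @classmethod
    def load(cls, model_path: str) -> "PreTrainedExtractor":
        # Only unpickle files you produced yourself; pickle can run arbitrary code.
        with open(model_path, "rb") as handle:
            return cls(pickle.load(handle))

    def train(self, training_data=None) -> None:
        # Deliberately a no-op: the model was trained offline on the large corpus.
        return None

    def extract_entities(self, text: str) -> List[Dict[str, object]]:
        """Return entity dicts for every known surface form found in the text."""
        entities = []
        for surface, label in self.model.items():
            start = text.find(surface)
            if start >= 0:
                entities.append({
                    "start": start,
                    "end": start + len(surface),
                    "value": surface,
                    "entity": label,
                    "extractor": self.name,
                })
        return entities
```

With this shape, the domain-specific CRF can still be trained on the usual md file, while the pre-trained one ignores the training data and only contributes entities when a message is processed.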
