Pre-trained Entity Extractor for Foreign Languages

Rasa NLU version: 0.13.8

Operating system (windows, osx, …): Ubuntu 16.04

Content of model configuration file:

language: "kr"
pipeline:
- name: "component.KoreanTokenizer"
- name: "component.PreTrainedCRF"
- name: "component.DomainSpecificCRF"
- name: "ner_synonyms"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
- name: "intent_entity_featurizer_regex"

Idea:

My colleagues and I are currently developing a dialogue system for Korean.

I am trying to adopt a pre-trained NER component (component.PreTrainedCRF) which can extract Name, Place, Organization, Time and Date, just like ner_duckling does for English.
My plan is to pass the user input to component.PreTrainedCRF, where general entities (Name, Place, etc.) are extracted first, and then pass the same user input to a second CRF model (component.DomainSpecificCRF), where domain-dependent entities are extracted (e.g. cuisine_type).
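A minimal sketch of this two-stage plan, in plain Python rather than the Rasa NLU API. The two extractor functions are hypothetical keyword lookups standing in for component.PreTrainedCRF and component.DomainSpecificCRF; the point is only that both stages see the same text and their results are merged into one entity list.

```python
from typing import Callable, Dict, List

Entity = Dict[str, object]
Extractor = Callable[[str], List[Entity]]


def _lookup(text: str, vocab: Dict[str, str], extractor_name: str) -> List[Entity]:
    """Toy extractor: report the first occurrence of each known surface form."""
    found = []
    for value, label in vocab.items():
        start = text.find(value)
        if start >= 0:
            found.append({
                "start": start,
                "end": start + len(value),
                "value": value,
                "entity": label,
                "extractor": extractor_name,
            })
    return found


def general_extractor(text: str) -> List[Entity]:
    # Hypothetical stand-in for component.PreTrainedCRF (general entities).
    return _lookup(text, {"Seoul": "Place", "tomorrow": "Date"}, "PreTrainedCRF")


def domain_extractor(text: str) -> List[Entity]:
    # Hypothetical stand-in for component.DomainSpecificCRF (domain entities).
    return _lookup(text, {"bibimbap": "cuisine_type"}, "DomainSpecificCRF")


def run_pipeline(text: str, extractors: List[Extractor]) -> List[Entity]:
    """Pass the same text to every extractor and merge the results by position."""
    entities: List[Entity] = []
    for extract in extractors:
        entities.extend(extract(text))
    return sorted(entities, key=lambda e: e["start"])
```

Calling `run_pipeline("I want bibimbap in Seoul tomorrow", [general_extractor, domain_extractor])` returns the entities of both stages in one list, each tagged with the extractor that produced it.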

Issues:

I have trained component.PreTrainedCRF on a large corpus, producing “pre_trained_crf_model.pkl”.
However, I cannot find any documentation that describes how to use the pkl file afterwards. I have read this article about adding a custom component [https://medium.com/rasa-blog/enhancing-rasa-nlu-models-with-custom-components-6f54040c4a77], but my case is different in that I would like to add another CRF model (component.PreTrainedCRF).

Please let me know how to bridge the two CRF models.

I would like to emphasize that I have two different training data sets: one is the large corpus used for training component.PreTrainedCRF, and the other is the “usual” training md file used for training component.DomainSpecificCRF.
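One way to keep the two training sets apart, sketched under assumptions: the pre-trained component loads its pickled model from disk and treats training as a no-op, so the md training data only ever reaches the domain-specific CRF. The `Message` class below is a hypothetical stand-in for the NLU message object, the class does not subclass the real Rasa NLU component base class, and a plain dict is pickled in place of the actual CRF model to keep the example runnable.

```python
import os
import pickle
import tempfile


class Message:
    """Hypothetical stand-in for the NLU message: raw text plus a data dict."""

    def __init__(self, text):
        self.text = text
        self.data = {}

    def get(self, key, default=None):
        return self.data.get(key, default)

    def set(self, key, value):
        self.data[key] = value


class PreTrainedExtractor:
    """Loads an already trained model from disk and never retrains it."""

    name = "component.PreTrainedCRF"

    def __init__(self, model_path):
        # Only unpickle files you created yourself: pickle can run arbitrary code.
        with open(model_path, "rb") as f:
            self.model = pickle.load(f)

    def train(self, training_data, **kwargs):
        # Deliberately a no-op: the model was trained offline on the large
        # corpus, so the pipeline's md training data is ignored here.
        pass

    def process(self, message, **kwargs):
        found = []
        # Toy model: a dict of {surface form: label} instead of a real CRF.
        for value, label in self.model.items():
            start = message.text.find(value)
            if start >= 0:
                found.append({
                    "start": start,
                    "end": start + len(value),
                    "value": value,
                    "entity": label,
                    "extractor": self.name,
                })
        # Append to, rather than overwrite, entities found by other components.
        message.set("entities", message.get("entities", []) + found)


def demo():
    """Write a toy model file, load it, and run it on one message."""
    path = os.path.join(tempfile.mkdtemp(), "pre_trained_crf_model.pkl")
    with open(path, "wb") as f:
        pickle.dump({"Seoul": "Place"}, f)
    extractor = PreTrainedExtractor(path)
    extractor.train(training_data=None)
    message = Message("Lunch in Seoul")
    extractor.process(message)
    return message.get("entities")
```

Because `process` appends to the existing entity list, a second extractor placed later in the pipeline can add its domain-specific entities without discarding these.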

Please note that #822 was not helpful for this issue.

Issue Analytics

State:
Created 5 years ago
Comments: 9 (3 by maintainers)

Top GitHub Comments

robinsongh381 commented, Mar 4, 2019

Thanks for the tips!

I will discuss sharing the tokenizer component with my colleagues who made it, and I will give you a response soon!

Cheers

akelad commented, Mar 6, 2019

@robinsongh381 let’s move this to the forum; this is more of a usage question at this point. You can change the extract_entities method, or use the one from the CRF, whichever works for you.
