Pre-trained Entity Extractor for Foreign Languages
Rasa NLU version: 0.13.8
Operating system (windows, osx, …): Ubuntu 16.04
Content of model configuration file:
language: "kr"
pipeline:
- name: "component.KoreanTokenizer"
- name: "component.PreTrainedCRF"
- name: "component.DomainSpecificCRF"
- name: "ner_synonyms"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
- name: "intent_entity_featurizer_regex"
Idea:
My colleagues and I are currently developing a dialogue system for Korean.
- I am trying to adopt a pre-trained NER component (component.PreTrainedCRF) that can extract Name, Place, Organization, Time and Date, just like ner_duckling for English.
- My plan is to pass the user input to component.PreTrainedCRF, where general entities (Name, Place, etc.) are extracted first, and then pass the same user input to the second CRF model (component.DomainSpecificCRF), where domain-dependent entities are extracted (e.g. cuisine_type).
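The two-stage plan above can be sketched in plain Python as follows. This is only an illustration of the data flow, not Rasa NLU code: `keyword_extractor` and `extract_two_stage` are hypothetical names, the keyword lookups stand in for the two CRF models, and the rule that a domain-specific entity wins over an overlapping general one is an assumption. The entity dictionaries use the start/end/value/entity keys that Rasa NLU uses in its output.

```python
from typing import Callable, Dict, List

Entity = Dict[str, object]
Extractor = Callable[[str], List[Entity]]


def keyword_extractor(keywords: Dict[str, str]) -> Extractor:
    """Build a toy extractor from a {surface form: entity label} mapping.

    Stand-in for a trained CRF; it only finds the first occurrence of each keyword.
    """
    def extract(text: str) -> List[Entity]:
        found = []
        for word, label in keywords.items():
            start = text.find(word)
            if start != -1:
                found.append({"start": start, "end": start + len(word),
                              "value": word, "entity": label})
        return found
    return extract


def overlaps(a: Entity, b: Entity) -> bool:
    """True if the two character spans share at least one character."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def extract_two_stage(text: str, general: Extractor, domain: Extractor) -> List[Entity]:
    """Run both extractors on the same text and merge their results.

    Assumption: when spans overlap, the domain-specific entity is kept.
    """
    domain_entities = domain(text)
    merged = list(domain_entities)
    for ent in general(text):
        if not any(overlaps(ent, d) for d in domain_entities):
            merged.append(ent)
    return sorted(merged, key=lambda e: e["start"])


general = keyword_extractor({"Seoul": "Place"})
domain = keyword_extractor({"italian": "cuisine_type"})
entities = extract_two_stage("book an italian restaurant in Seoul", general, domain)
```

Both extractors see the unmodified user input, so neither depends on the other's output; only the merge step has to decide what to do with conflicting spans.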
Issues:
- I have trained component.PreTrainedCRF on a large corpus, producing “pre_trained_crf_model.pkl”.
- However, I cannot find any documentation that describes how to use the pkl file afterwards. I have read this article [https://medium.com/rasa-blog/enhancing-rasa-nlu-models-with-custom-components-6f54040c4a77] about adding a custom component, but my case is different in that I would like to add another CRF model (component.PreTrainedCRF).
Please let me know how to bridge the two CRF models.
I would like to emphasize that I have two different training data sets: one is the large corpus used to train component.PreTrainedCRF, and the other is the “usual” training md file used to train component.DomainSpecificCRF.
Please note that #822 was not helpful for this issue
Issue Analytics
- Created 5 years ago
- Comments:9 (3 by maintainers)
Top GitHub Comments
Thanks for the tips!
I will discuss sharing the tokenizer component with my colleagues who made it, and I will get back to you soon!
Cheers
@robinsongh381 let’s move this to the forum, this is more of a usage question at this point. You can change the extract_entities method, or use the one from the CRF, whichever works for you.
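As a rough illustration of that suggestion, the sketch below loads a pickled model once and reuses it for every message. It is plain Python under stated assumptions, not Rasa NLU's API: `PickledExtractor` is a hypothetical class, the message is a plain dict rather than Rasa's message object, and a token-to-label dictionary stands in for the trained CRF. In a real custom component the `extract_entities` body would call the CRF instead of doing a lookup.

```python
import os
import pickle
import tempfile


class PickledExtractor:
    """Hypothetical wrapper that loads a pickled model once and reuses it."""

    def __init__(self, model):
        self.model = model

    @classmethod
    def load(cls, path):
        # Only unpickle files you produced yourself: pickle can run arbitrary code.
        with open(path, "rb") as f:
            return cls(pickle.load(f))

    def extract_entities(self, text):
        """Label whitespace-separated tokens using the loaded model (toy lookup)."""
        entities = []
        offset = 0
        for token in text.split():
            start = text.index(token, offset)
            end = start + len(token)
            offset = end
            label = self.model.get(token)
            if label is not None:
                entities.append({"start": start, "end": end,
                                 "value": token, "entity": label})
        return entities

    def process(self, message):
        # Append to entities found by earlier pipeline steps instead of overwriting.
        found = self.extract_entities(message["text"])
        message["entities"] = message.get("entities", []) + found
        return message


def demo():
    """Write a stand-in model to disk, load it back, and process one message."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "pre_trained_crf_model.pkl")
        with open(path, "wb") as f:
            pickle.dump({"Seoul": "Place", "Monday": "Date"}, f)
        extractor = PickledExtractor.load(path)
    return extractor.process({"text": "dinner in Seoul on Monday", "entities": []})


result = demo()
```

Because `process` appends to the existing entity list, a second extractor placed later in the pipeline can add its own entities without discarding these.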