[QUESTION] Information on NER component
Describe what you would like to know about CAMeL Tools.
Hello, I wanted to know if you could provide some information regarding the NER component of the library.
In the catalog JSON file, you mention that you are using a fine-tuned AraBERT model, with the specified version being 1.0.0. So I wanted to know:
- whether the base model was indeed AraBERTv1 from this repo?
- which dataset you used?
- whether you used FARASA preprocessing for fine-tuning or your own, given that AraBERT used FARASA for pretraining?
I ask because, while doing some research, I saw that your lab has produced multiple Arabic BERT models, which:
- use the camel_tools preprocessing rather than FARASA for both pretraining and fine-tuning
- have dialect-specific variants, which may be useful in some cases
- seem to outperform AraBERTv1 on NER tasks, according to your paper
I was wondering whether you would consider making these models available for use in this library? I know you have released the code and pretrained models, and I am planning to experiment with them, but I thought it would be a nice addition.
Issue Analytics
- Created 2 years ago
- Comments:6 (3 by maintainers)
Top GitHub Comments
So for the NER component, we didn’t do any preprocessing before fine-tuning, and we used aubmindlab/bert-base-arabertv01, which did not use FARASA segmentation before pretraining. We also just released new NER models, fine-tuned from our own CAMeLBERT models, on Hugging Face’s model hub. Here’s an example of how to use the CAMeLBERT NER MSA model. Disclaimer: although in the example we use the NER component from CAMeL Tools to load the model directly from the hub, this is still a work in progress, so please use it with caution.
Hi @rom1K,
The version numbers in catalogue.json are our own internal versioning for datasets and have nothing to do with the AraBERT version used. @balhafni could tell you the exact AraBERT version we used in our current model.
We fine-tune using the ANERcorp dataset (you can read more about that in our paper), but we don’t use FARASA for preprocessing. Again, @balhafni can tell you exactly what preprocessing we perform.
We definitely plan to incorporate the new BERT models in a future release of camel-tools 😃