[QUESTION] Information on NER component

Describe what you would like to know about CAMeL Tools.

Hello, I wanted to know if you could provide some information regarding the NER component of the library.

In the catalog JSON file, you mention that you are using a fine-tuned AraBERT model, with the specified version being 1.0.0. So from here, I wanted to know:

  • whether the base model was indeed AraBERTv1 from this repo?
  • which dataset you used?
  • whether you used FARASA preprocessing for fine-tuning or your own, given that they used the former for pretraining?

I ask because, while doing some research, I saw that your lab has produced multiple Arabic BERT models, which have the benefit of:

  • having used camel_tools preprocessing rather than FARASA for both pretraining and fine-tuning
  • having dialect-specific variants, which may be interesting in some cases
  • seeming to outperform AraBERTv1 on NER tasks, according to your paper

I was wondering whether you would consider making these models available for use in this library? I know you have released the code and pretrained models, and I am planning on experimenting with them, but I thought it would be a nice addition.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

balhafni commented, Oct 3, 2021 (1 reaction)

For the NER component, we didn’t do any preprocessing before fine-tuning, and we used aubmindlab/bert-base-arabertv01, which did not use FARASA segmentation before pretraining.

We also just released new NER models on Hugging Face’s model hub, which were fine-tuned from our own CAMeLBERT models. Here’s an example of how to use the CAMeLBERT NER MSA model. Disclaimer: although in the example we use the NER component from CAMeL Tools to load the model directly from the hub, this is still a work in progress, so please use it with caution.
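The original link to the example did not survive on this page, so what follows is only a minimal sketch of what such usage might look like, not the maintainers' exact snippet. It assumes the hub ID `CAMeL-Lab/bert-base-arabic-camelbert-msa-ner`, that `NERecognizer` from `camel_tools.ner` accepts a hub ID (as the comment above describes), and that the model emits BIO-style labels such as `B-LOC` / `I-LOC` / `O`. The `group_entities` and `tag_sentence` helpers are hypothetical names introduced here for illustration.

```python
from typing import List, Tuple

# Assumed Hugging Face hub ID of the CAMeLBERT MSA NER model.
MODEL_ID = "CAMeL-Lab/bert-base-arabic-camelbert-msa-ner"


def group_entities(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge BIO-tagged tokens into (entity text, entity type) pairs.

    Hypothetical helper: assumes labels look like "B-LOC", "I-LOC" or "O".
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None
    for token, label in zip(tokens, labels):
        prefix, _, ent_type = label.partition("-")
        # An I- tag of the same type extends the entity being built.
        if prefix == "I" and current_type == ent_type and current_tokens:
            current_tokens.append(token)
            continue
        # Anything else closes the entity being built, if there is one.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))
        if prefix in ("B", "I") and ent_type:
            current_tokens, current_type = [token], ent_type
        else:
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


def tag_sentence(text: str) -> List[Tuple[str, str]]:
    """Tokenize, tag and group entities for one sentence (hypothetical helper)."""
    # Imported lazily so group_entities works without camel-tools installed.
    from camel_tools.ner import NERecognizer
    from camel_tools.tokenizers.word import simple_word_tokenize

    tokens = simple_word_tokenize(text)
    ner = NERecognizer(MODEL_ID)  # fetches the model from the hub on first use
    labels = ner.predict_sentence(tokens)
    return group_entities(tokens, labels)
```

Calling `tag_sentence` requires camel-tools, its dependencies, and network access to download the model. Building a new `NERecognizer` on every call is wasteful, so in real use the recognizer would more likely be created once and reused.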

owo commented, Oct 1, 2021 (1 reaction)

Hi @rom1K,

The version numbers in catalogue.json are our own internal versioning for datasets and have nothing to do with the AraBERT version used. @balhafni could tell you the exact AraBERT version we used in our current model.

We fine-tune using the ANERcorp dataset (you can read more about that in our paper), but we don’t use FARASA for preprocessing. Again, @balhafni can tell you exactly what preprocessing we perform.

We definitely plan to incorporate the new BERT models in a future release of camel-tools 😃
