How to use with languages other than English?

I would like to use KeyBERT with French. To do this, do I need to choose a model and pass it to KeyBERT through the model parameter? Like this:

from keybert import KeyBERT
doc = """
L'apprentissage supervisé est la tâche d'apprentissage machine qui consiste à apprendre une fonction qui associe une entrée à une sortie en se basant sur des exemples de paires entrée-sortie [1]. 
Il déduit une fonction à partir de données de formation étiquetées consistant en un ensemble d'exemples de formation.
Dans l'apprentissage supervisé, chaque exemple est une paire constituée d'un objet d'entrée  (généralement un vecteur) et une valeur de sortie souhaitée (également appelée signal de supervision). 
"""
model = KeyBERT(model='MODEL_TO_CHOOSE')
keywords = model.extract_keywords(doc)

For the French language:

  • Which model do you recommend?
  • Is xlm-r-bert-base-nli-stsb-mean-tokens a good choice?

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

3 reactions
MaartenGr commented, Feb 15, 2021

Hmmm, it might improve if you increase the keyphrase_ngram_range to (1, 3) for example. However, this is exactly what can happen with KeyBERT. It is highly dependent on the underlying embedding model. For that reason, I typically like to use embedding models that are trained mostly for similarity measures, like those used in sentence-transformers (e.g., xlm-r-bert-base-nli-stsb-mean-tokens).

All in all, it requires some experimentation with models until you find the one best suited to your use-case.
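The suggestions in this comment can be sketched as follows. This is a minimal illustration, not the maintainer's own code: the helper name extract_keywords_fr is hypothetical, the default model is simply the one named in this thread, and stop_words=None is an assumption made here because KeyBERT's default stop-word list is English and would not filter French function words. Loading the model downloads it on first use.

```python
from typing import List, Optional, Tuple


def extract_keywords_fr(
    doc: str,
    model_name: str = "xlm-r-bert-base-nli-stsb-mean-tokens",
    ngram_range: Tuple[int, int] = (1, 3),
    stop_words: Optional[List[str]] = None,
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: extract keyphrases from a non-English document.

    model_name is the multilingual sentence-transformers model mentioned in
    the thread; swap in another model to compare results.
    """
    # Imported lazily so that defining the helper does not require keybert
    # to be installed; calling it does.
    from keybert import KeyBERT

    kw_model = KeyBERT(model=model_name)
    # keyphrase_ngram_range=(1, 3) allows phrases of one to three words,
    # as suggested in the comment above.
    return kw_model.extract_keywords(
        doc,
        keyphrase_ngram_range=ngram_range,
        stop_words=stop_words,  # assumption: pass a French list here if you have one
        top_n=top_n,
    )
```

Calling extract_keywords_fr(doc) with the French text from the question returns a list of (keyphrase, score) pairs. Running it with a few different model_name values is one way to do the experimentation recommended above.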

0 reactions
LeMoussel commented, Feb 15, 2021

Here is a comparison of keyword extraction results for French text from TF-IDF, YAKE, and KeyBERT: https://gist.github.com/LeMoussel/aa5f7c46c6cb09473e97b20eb2e13cc4. The quality of the keywords from KeyBERT seems “strange” to me. 🤔
