How to use KeyBERT with languages other than English?
I would like to use KeyBERT with the French language. To do this, must I select a model and pass it to KeyBERT via the `model` parameter? Like this:
```python
from keybert import KeyBERT

doc = """
L'apprentissage supervisé est la tâche d'apprentissage machine qui consiste à apprendre une fonction qui associe une entrée à une sortie en se basant sur des exemples de paires entrée-sortie [1].
Il déduit une fonction à partir de données de formation étiquetées consistant en un ensemble d'exemples de formation.
Dans l'apprentissage supervisé, chaque exemple est une paire constituée d'un objet d'entrée (généralement un vecteur) et une valeur de sortie souhaitée (également appelée signal de supervision).
"""

model = KeyBERT(model='MODEL_TO_CHOOSE')
keywords = model.extract_keywords(doc)
```
For the French language:
- Which model do you recommend?
- Is `xlm-r-bert-base-nli-stsb-mean-tokens` a good choice?
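For context on why the model choice matters: KeyBERT ranks candidate words and phrases by the cosine similarity of their embeddings to the document's embedding, so a multilingual model is what makes French work at all. A minimal sketch of that ranking step, using made-up toy vectors rather than real model output (this is an illustration, not KeyBERT's actual code):

```python
# Toy sketch of KeyBERT's ranking idea: embed the document and each
# candidate keyword, then rank candidates by cosine similarity to the
# document embedding. The 3-dimensional vectors below are invented
# for illustration; real sentence-transformers embeddings are much larger.
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

doc_vec = [0.9, 0.1, 0.3]                 # toy embedding of the document
candidates = {
    "apprentissage": [0.8, 0.2, 0.4],     # toy candidate embeddings
    "fonction":      [0.1, 0.9, 0.2],
    "exemple":       [0.5, 0.5, 0.5],
}

# Candidates sorted from most to least similar to the document.
ranked = sorted(candidates, key=lambda w: cosine(candidates[w], doc_vec),
                reverse=True)
print(ranked)  # "apprentissage" comes out on top
```

Because the ranking is nothing but similarity in the embedding space, a model trained on one language (or trained for a task other than similarity) can produce oddly ranked keywords, which is the issue discussed below.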
Issue Analytics
- Created: 3 years ago
- Comments: 7 (4 by maintainers)

Hmmm, it might improve if you increase the `keyphrase_ngram_range` to `(1, 3)`, for example. However, this is exactly what can happen with KeyBERT: it is highly dependent on the underlying embedding model. For that reason, I typically like to use embedding models that are trained mostly for similarity measures, like those used in sentence-transformers (e.g., `xlm-r-bert-base-nli-stsb-mean-tokens`). All in all, it requires some experimentation with models until you find the one best suited to your use case.
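For readers unfamiliar with the parameter: `keyphrase_ngram_range=(1, 3)` makes KeyBERT consider word n-grams of length 1 through 3 as candidate keyphrases, instead of single words only. A plain-Python sketch of that candidate generation (an illustration of the concept, not KeyBERT's internal implementation, which delegates to scikit-learn's `CountVectorizer`):

```python
# Generate all word n-grams of length lo..hi from a text, mimicking
# what an ngram_range of (1, 3) means for candidate keyphrases.
# Tokenization here is a naive whitespace split, purely for illustration.
def ngram_candidates(text, ngram_range=(1, 3)):
    words = text.lower().split()
    lo, hi = ngram_range
    return [
        " ".join(words[i:i + n])
        for n in range(lo, hi + 1)
        for i in range(len(words) - n + 1)
    ]

cands = ngram_candidates("apprentissage supervisé de fonction", (1, 3))
# 4 unigrams + 3 bigrams + 2 trigrams = 9 candidates, e.g.
# "apprentissage", "apprentissage supervisé", "apprentissage supervisé de", ...
print(cands)
```

Widening the range lets multi-word French phrases like "apprentissage supervisé" compete as candidates, which is often where the quality improvement comes from.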
Compare keyword extraction results in French from TF-IDF, Yake, and KeyBERT: https://gist.github.com/LeMoussel/aa5f7c46c6cb09473e97b20eb2e13cc4. The quality of the keywords from KeyBERT seems “strange” to me. 🤔