Detect source language with langdetect package
langdetect has worked well for me in the past for language detection problems. How would you feel about allowing users to pass 'auto' as an option for source? I could see some pros and cons:
Pros
- Users don’t need to be able to recognize a language to translate
- Eliminates pre-classification of languages if your dataset contains multiple languages
Cons
- Adds another dependency
- langdetect detects only 55 languages
I’m a little new to open source but I would love to contribute 🙂 Of course, if you feel this doesn’t fit this package’s mission that’s totally understandable.
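A minimal sketch of what the proposed 'auto' option could look like, assuming langdetect is installed. The names `resolve_source` and `_langdetect_detector` are hypothetical and not part of this package; only `detect` and `DetectorFactory` come from langdetect. Mapping the detected ISO 639-1 code to whatever the translation model expects is left out.

```python
from typing import Callable, Optional


def _langdetect_detector(text: str) -> str:
    # Imported lazily so langdetect stays an optional dependency.
    from langdetect import DetectorFactory, detect

    # langdetect is non-deterministic unless a seed is set.
    DetectorFactory.seed = 0
    # Returns a code such as "fr"; raises LangDetectException
    # when the text has no usable features (e.g. an empty string).
    return detect(text)


def resolve_source(
    text: str,
    source: str = "auto",
    detector: Optional[Callable[[str], str]] = None,
) -> str:
    """Return `source` unchanged unless it is 'auto'; then detect it from `text`."""
    if source != "auto":
        return source
    detector = detector or _langdetect_detector
    return detector(text)
```

Keeping the detector injectable means an explicit source never triggers detection, and the langdetect dependency is only imported when 'auto' is actually used.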
Issue Analytics
- State:
- Created 2 years ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
Those are some good points, I agree it would be confusing to have the library detect a language but not translate it. I’ll take a look into writing something that could potentially be put into the user guide.
@banyous Feel free to contribute a section in the user guide about using language detection, and from there, if we feel a wrapper around fasttext would make life easier, then I’m happy to welcome a PR to add language detection to dlt.utils or dlt.lang.
I think this is a decent starting point: https://fasttext.cc/docs/en/language-identification.html
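As a rough idea of what such a wrapper might look like, here is a sketch assuming the fasttext Python package and one of the language identification models from the page above (lid.176.bin or lid.176.ftz) downloaded locally. The function name `detect_language` is hypothetical and is not an existing part of dlt.

```python
from typing import Tuple


def detect_language(model, text: str) -> Tuple[str, float]:
    """Return (language code, probability) for the top fasttext prediction."""
    # fasttext's predict() handles one line at a time, so newlines are removed.
    labels, probs = model.predict(text.replace("\n", " "), k=1)
    # Labels look like "__label__fr"; strip the prefix to get the code.
    prefix = "__label__"
    label = labels[0]
    code = label[len(prefix):] if label.startswith(prefix) else label
    return code, float(probs[0])


# Usage, assuming the model file sits in the working directory:
# import fasttext
# model = fasttext.load_model("lid.176.ftz")
# code, prob = detect_language(model, "Bonjour tout le monde")
```

Returning the probability alongside the code lets the caller decide what to do with low-confidence detections instead of translating from a wrongly guessed language.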