[ Performance and Memory Analysis for Large Dataset ] very slow for large numbers of Hits
See original GitHub issue. I am trying to run language detection using this script:
final LanguageDetector detector = LanguageDetectorBuilder
        .fromLanguages(ENGLISH, FRENCH, GERMAN, SPANISH, JAPANESE, CHINESE, ITALIAN,
                       PORTUGUESE, ARABIC, RUSSIAN, DUTCH, KOREAN, SWEDISH, HINDI, POLISH)
        .build();
long start = System.currentTimeMillis();
final Language detectedLanguage = detector.detectLanguageOf("Zum Vergleich kann es auch nützlich sein, diese Rankings neben einigen etwas älteren Forschungsergebnissen zu sehen. Im Jahr 2013, Common Sense Advisory zur Verfügung gestellt , eine empirische Studie basiert auf einer Wallet World Online (WOW) - definiert als ‚die gesamte wirtschaftliche Chance, sowohl online als auch offline, berechnet durch einen Anteil eines Landes BIP zu allen wichtigen Blöcken dieser Gesellschaft assoziieren. ' Hier ist, was uns ihre Studie gezeigt hat.");
// System.out.println(detectedLanguage.toString());
long end = System.currentTimeMillis();
System.out.println("Time: " + (end - start));
It is taking 700 milliseconds, which is very slow and cannot be used for 10000+ files… Is there any approach to get results within 1-10 milliseconds?
Or is there any function like isEnglish(), which will return true only for English…
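A helper like the isEnglish() asked about above can be layered on top of Lingua's detectLanguageOf(), which returns a Language enum value. Below is a minimal sketch, assuming the standard Java API of the library (package com.github.pemistahl.lingua.api); the isEnglish() helper, the class name and the sample texts are hypothetical. The important point is that the detector should be built once and reused for all files, because constructing it and loading the models is the expensive part:

import static com.github.pemistahl.lingua.api.Language.*;

import com.github.pemistahl.lingua.api.LanguageDetector;
import com.github.pemistahl.lingua.api.LanguageDetectorBuilder;

public class EnglishCheck {

    // Build the detector once and reuse it; rebuilding it per file
    // would repeat the expensive model loading.
    private static final LanguageDetector DETECTOR =
            LanguageDetectorBuilder.fromLanguages(ENGLISH, GERMAN, FRENCH, SPANISH).build();

    // Hypothetical isEnglish() helper: true only when English is detected.
    static boolean isEnglish(String text) {
        return DETECTOR.detectLanguageOf(text) == ENGLISH;
    }

    public static void main(String[] args) {
        // Warm-up call so the lazy, one-time model loading is not measured.
        isEnglish("warm up");

        long start = System.currentTimeMillis();
        boolean english = isEnglish("This sentence is clearly written in English.");
        long end = System.currentTimeMillis();

        System.out.println("isEnglish: " + english + ", time: " + (end - start) + " ms");
    }
}

Much of the 700 ms measured above is likely the lazy, one-time loading of the language models during the first detectLanguageOf() call; once the detector is warm, subsequent calls are typically far faster, which is consistent with the lower times reported later in this thread.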
Issue Analytics
- State:
- Created 2 years ago
- Comments: 8 (4 by maintainers)
Top Results From Across the Web

What to Do When Your Data Is Too Big for Your Memory?
Another way to handle large datasets is by chunking them, that is, cutting a large dataset into smaller chunks and then processing those...

Are You Still Using Pandas to Process Big Data in 2021? Here ...
This intrigued me to do a practical experiment with Dask and Vaex and try to process a bigger-than-memory dataset. The dataset...

Memory Management for Large Data Sets - NI
To do so, break large data sets into smaller sets when transporting data from one place to another - a strategy known as...

Handling large data sets in R - AWS
The problem with large data sets in R: R reads the entire data set into RAM all at once. Other programs can read...

SolrPerformanceProblems - Solr - Apache Software Foundation
Even if the number of actual hits is very low, the fact that the client requests a huge number of rows will cause...
Thanks, it got reduced from 600 milliseconds to 70 milliseconds.
If I recall correctly, the Korean, Chinese and Japanese language models are quite large. So if you know beforehand that your input is in none of those languages, you can save quite a lot of memory by excluding them.
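A minimal sketch of such a reduced setup, assuming the same builder API as in the script above; the language list is only an example and should match whatever languages can actually occur in your data. withPreloadedLanguageModels() is optional and, where the installed Lingua version supports it, shifts the model loading cost from the first detection call to construction time:

import static com.github.pemistahl.lingua.api.Language.*;

import com.github.pemistahl.lingua.api.LanguageDetector;
import com.github.pemistahl.lingua.api.LanguageDetectorBuilder;

public class ReducedDetector {

    public static void main(String[] args) {
        // Excluding the large CJK models (KOREAN, CHINESE, JAPANESE) keeps
        // memory usage noticeably lower; only list languages that can
        // realistically occur in the input.
        LanguageDetector detector = LanguageDetectorBuilder
                .fromLanguages(ENGLISH, FRENCH, GERMAN, SPANISH, ITALIAN,
                               PORTUGUESE, DUTCH, SWEDISH, POLISH, RUSSIAN)
                .withPreloadedLanguageModels() // pay the loading cost up front, once
                .build();

        // Reuse this single detector instance for all files.
        System.out.println(detector.detectLanguageOf("Ceci est une phrase française."));
    }
}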
On the other hand, if your input text is in a language which you have not included, or which Lingua does not support, and which is similar to English, Lingua could erroneously claim that the text is in English.
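One way to reduce such false positives is to make the detector return UNKNOWN when the decision is not clear enough. The following sketch uses the builder's withMinimumRelativeDistance() option for that purpose; the 0.2 threshold is purely illustrative and would need tuning on real data:

import static com.github.pemistahl.lingua.api.Language.*;

import com.github.pemistahl.lingua.api.Language;
import com.github.pemistahl.lingua.api.LanguageDetector;
import com.github.pemistahl.lingua.api.LanguageDetectorBuilder;

public class CautiousDetection {

    public static void main(String[] args) {
        // Require a minimum distance between the best and second-best
        // candidate before committing to a language; otherwise UNKNOWN
        // is returned instead of a shaky guess such as ENGLISH.
        LanguageDetector detector = LanguageDetectorBuilder
                .fromLanguages(ENGLISH, FRENCH, GERMAN, SPANISH)
                .withMinimumRelativeDistance(0.2) // illustrative threshold, tune per corpus
                .build();

        Language result = detector.detectLanguageOf("qwertzuiopü asdfghjkl");
        if (result == Language.UNKNOWN) {
            System.out.println("No language detected with sufficient confidence.");
        } else {
            System.out.println("Detected: " + result);
        }
    }
}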