Inconsistent example output
I installed 6.3.7 from pip as instructed on both the latest macOS and Ubuntu 16.04, and downloaded frequency_dictionary_en_82_765.txt from the official GitHub repository.
I ran the examples in the README.md and got inconsistent output on both platforms, as follows:
Sample usage (lookup and lookup_compound)
The last number (log_prob_sum) is 11 instead of 10.
members, 226656153, 1
where is to love he had dated for much of the past who couldn't read in six grade and inspired him, 300000, 11
Sample usage (word_segmentation)
The first word, "the", is segmented as "t" and "he", which is a fairly obvious error. Also, "overt he" should be "over the".
I noticed the last two numbers differ from "8 -34.491167981910635".
t he quick brown fox jumps overt he lazy dog, 10, -52.10066239535173
Next, I tried to segment the test string itwasthebestoftimesitwastheworstoftimesitwastheageofwisdomitwastheageoffoolishness from the official site, and the output shows the same error pattern of "the" being segmented as "t he".
Any ideas?
Issue Analytics
- Created 5 years ago
- Comments:5 (3 by maintainers)
I see, I might have saved it as UTF-8 when I was debugging the program and uploaded it as that. load_dictionary allows you to choose the encoding, so you could use that as well.

I finally figured out the cause, which has nothing to do with the installation: an invisible character in the frequency dictionary that I downloaded from the original author wolfgarbe’s repo, frequency_dictionary_en_82_765.txt.
Running diff to compare the files shows an invisible difference in line 1, which coincidentally is the entry for "the".
I tried :set list in vim to show the invisible characters, but nothing looked different. I then printed the first line, and it turned out that the Unicode Byte Order Mark (BOM) was causing the issue.
One workaround would be to set the encoding argument when loading the dictionary.
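
The exact snippet for the workaround did not survive in this thread, so here is a minimal sketch of the idea using only the Python standard library. It assumes the dictionary file starts with a UTF-8 BOM. The file name sample_dictionary.txt, the counts in it, and the helpers first_term and has_utf8_bom are made up for illustration. Reading with the plain "utf-8" codec leaves the BOM glued to the first term, so the key becomes "\ufeffthe" rather than "the", while the "utf-8-sig" codec strips it.

```python
import codecs
import os
import tempfile


def first_term(path, encoding):
    """Return the first whitespace-separated term on line 1 of the file."""
    with open(path, encoding=encoding) as handle:
        return handle.readline().split()[0]


def has_utf8_bom(path):
    """Check whether the file starts with the UTF-8 byte order mark."""
    with open(path, "rb") as handle:
        return handle.read(3) == codecs.BOM_UTF8


# Build a tiny stand-in for the frequency dictionary, saved with a BOM.
# The counts are placeholder values, not the real frequencies.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample_dictionary.txt")
with open(sample_path, "wb") as handle:
    handle.write(codecs.BOM_UTF8 + b"the 100\nof 50\n")

print(has_utf8_bom(sample_path))                   # True
print(repr(first_term(sample_path, "utf-8")))      # '\ufeffthe'
print(repr(first_term(sample_path, "utf-8-sig")))  # 'the'
```

If load_dictionary forwards its encoding argument to the file open call, as the maintainer's comment suggests, then passing encoding="utf-8-sig" there should have the same effect. That is an assumption on my part and worth checking against the installed version.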