adding new terms and typical workflow
Hi,
I am trying to add new terms by:
initial_capacity = 83000
# maximum edit distance per dictionary precalculation
max_edit_distance_dictionary = 0
prefix_length = 7
sym_spell = SymSpell(initial_capacity, max_edit_distance_dictionary, prefix_length)
if not sym_spell.load_dictionary(dictionary_path, term_index, count_index):
    print("Dictionary file not found")
    return
sym_spell.create_dictionary_entry("steama", 4)
sym_spell.create_dictionary_entry("steamb", 6)
sym_spell.create_dictionary_entry("steamc", 2)
result = sym_spell.lookup("streama", 2)
print(result)
I am getting an empty []. What am I missing?
Additionally, could you provide skeleton code showing how to feed it a text file so that it creates a new column of corrected text? This would help massively with my text analysis.
Much appreciated
Top GitHub Comments
You initialized SymSpell with max_edit_distance_dictionary=0, which means it’s only looking for exact matches.
In your lookup("streama", 2), the 2 is read as the verbosity argument instead of max_edit_distance; I assume that’s what you were trying to do.
For the snippet you provided, the following options work:
Expected output:
Or, choose a smaller max_edit_distance than what is defined during object creation:
Expected output:
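To make the edit-distance point concrete without depending on symspellpy itself, here is a minimal pure-Python sketch. It is not SymSpell's algorithm: edit_distance and naive_lookup are hypothetical helpers that brute-force plain Levenshtein distance over the three terms from the question (SymSpell's own distance measure may differ), only to show what a bound of 0, 1, or 2 lets through.

```python
from typing import Dict, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def naive_lookup(phrase: str, dictionary: Dict[str, int],
                 max_edit_distance: int) -> List[Tuple[str, int, int]]:
    """Hypothetical stand-in for a lookup: return (term, distance, count)
    for every term within the bound, closest first, then most frequent."""
    hits = []
    for term, count in dictionary.items():
        distance = edit_distance(phrase, term)
        if distance <= max_edit_distance:
            hits.append((term, distance, count))
    return sorted(hits, key=lambda h: (h[1], -h[2]))


# The three entries added in the question.
dictionary = {"steama": 4, "steamb": 6, "steamc": 2}

print(naive_lookup("streama", dictionary, 0))  # [] : no exact match exists
print(naive_lookup("streama", dictionary, 1))  # [('steama', 1, 4)]
print(naive_lookup("streama", dictionary, 2))
```

The bound of 0 plays the role of max_edit_distance_dictionary=0 in the question: "streama" is one deletion away from "steama", so nothing can come back until the bound is at least 1.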
Code for correcting words in a text file:
Input text file input_words.txt, which contains a misspelled word that is correctable, a properly spelled word, and a misspelled word that is uncorrectable:
Expected output output_words.txt:
Hope that helps.
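A minimal sketch of the text-file workflow described above, under stated assumptions: one word per line in the input, and a two-column, tab-separated output (original word, corrected word). The function correct_file, the suggest callback, the vocabulary, and the sample words are all invented for illustration. The demo corrector uses the standard library's difflib so the sketch runs without symspellpy; a symspellpy-based corrector could be passed in instead, as the comment suggests.

```python
import difflib
import tempfile
from pathlib import Path
from typing import Callable, Optional


def correct_file(input_path: Path, output_path: Path,
                 suggest: Callable[[str], Optional[str]]) -> None:
    """Write each input word next to its correction, tab-separated.
    Words with no suggestion are copied through unchanged."""
    with open(input_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            word = line.strip()
            if not word:
                continue  # skip blank lines
            best = suggest(word)
            dst.write(f"{word}\t{best if best is not None else word}\n")


# Stand-in corrector built on difflib (NOT symspellpy).
# With symspellpy, `suggest` could presumably wrap something like
#   hits = sym_spell.lookup(word, Verbosity.CLOSEST, max_edit_distance=2)
#   return hits[0].term if hits else None
# which is an assumption to check against the installed version's docs.
VOCABULARY = ["apple", "banana"]  # hypothetical word list


def suggest(word: str) -> Optional[str]:
    matches = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else None


with tempfile.TemporaryDirectory() as tmp:
    in_path = Path(tmp) / "input_words.txt"
    out_path = Path(tmp) / "output_words.txt"
    # A correctable misspelling, a correct word, an uncorrectable one.
    in_path.write_text("aple\nbanana\nqwxzv\n", encoding="utf-8")
    correct_file(in_path, out_path, suggest)
    output_text = out_path.read_text(encoding="utf-8")

print(output_text)
```

Passing the corrector in as a callback keeps the file handling separate from the spell checker, so swapping difflib for a SymSpell lookup should not require touching correct_file.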
Thanks. I learnt quite a lot in this thread. In fact, I just implemented my spell checker today. Can't thank you enough. Please go ahead and close this.
If possible, we can add the following sections to the home “README.md”:
- Adding new terms: you can add them either in the frequency_dict.txt file or with sym_spell.create_dictionary_entry()
- Spellchecking in a txt file
This will help people who are just starting with this.
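For the "adding new terms" suggestion, here is a rough, library-free sketch of the two routes. It assumes the frequency file holds one space-separated "term count" pair per line. parse_frequency_lines is a hypothetical helper and the counts for "steam" and "stream" are made up; the term_index and count_index parameters only mirror the names used in the question's load_dictionary call.

```python
import io
from typing import Dict, Iterable


def parse_frequency_lines(lines: Iterable[str], term_index: int = 0,
                          count_index: int = 1,
                          separator: str = " ") -> Dict[str, int]:
    """Parse 'term count' lines into {term: count}, skipping bad lines."""
    words: Dict[str, int] = {}
    for line in lines:
        parts = line.strip().split(separator)
        if len(parts) <= max(term_index, count_index):
            continue  # not enough columns
        try:
            count = int(parts[count_index])
        except ValueError:
            continue  # count column is not an integer
        term = parts[term_index]
        words[term] = words.get(term, 0) + count
    return words


# Route 1: add the new term as an extra line in the frequency file
# (an in-memory file here; the counts are invented).
file_text = "steam 120\nstream 95\n" + "steama 4\n"
words = parse_frequency_lines(io.StringIO(file_text))

# Route 2: add an entry programmatically after loading, analogous in
# spirit to calling sym_spell.create_dictionary_entry("steamb", 6).
words["steamb"] = words.get("steamb", 0) + 6

print(words)
```

Either route ends with the same in-memory mapping; editing the file keeps the addition across runs, while the programmatic route is handy for terms discovered at run time.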