adding new terms and typical workflow

Hi,

I am trying to add new terms by:

initial_capacity = 83000
# maximum edit distance per dictionary precalculation
max_edit_distance_dictionary = 0
prefix_length = 7

sym_spell = SymSpell(initial_capacity, max_edit_distance_dictionary, prefix_length)
if not sym_spell.load_dictionary(dictionary_path, term_index, count_index):
    print("Dictionary file not found")
    return

sym_spell.create_dictionary_entry("steama", 4)
sym_spell.create_dictionary_entry("steamb", 6)
sym_spell.create_dictionary_entry("steamc", 2)

result = sym_spell.lookup("streama", 2)
print(result)

I am getting an empty []. What am I missing?

Additionally, could you please provide skeleton code showing how to feed it a text file and create a new column of corrected text? This will help massively in my text analysis.

Much appreciated

Top GitHub Comments

mammothb commented, Feb 7, 2019

You initialized SymSpell with max_edit_distance_dictionary=0, which means it only looks for exact matches.

In your lookup("streama", 2), the 2 is read as the verbosity argument rather than max_edit_distance, which I assume is what you intended.

For the snippet you provided, the following options work:

import os.path
import sys
from symspellpy import SymSpell, Verbosity

initial_capacity = 83000
# maximum edit distance per dictionary precalculation
max_edit_distance_dictionary = 2
prefix_length = 7

sym_spell = SymSpell(initial_capacity, max_edit_distance_dictionary, prefix_length)

# replace with the actual location of the dictionary file
dictionary_path = "path/to/frequency_dictionary_en_82_765.txt"
term_index = 0
count_index = 1
if not sym_spell.load_dictionary(dictionary_path, term_index, count_index):
    print("Dictionary file not found")

sym_spell.create_dictionary_entry("steama", 4)
sym_spell.create_dictionary_entry("steamb", 6)
sym_spell.create_dictionary_entry("steamc", 2)

result = sym_spell.lookup("streama", Verbosity.ALL)
for r in result:
    print(r)

Expected output:

stream, 1, 38592422
streams, 1, 8882706
steama, 1, 4
steam, 2, 11141309
scream, 2, 4310000
streak, 2, 3268695
strata, 2, 1590600
screams, 2, 1286631
streaks, 2, 731843
steamy, 2, 602955
streamed, 2, 505032
streamer, 2, 432893
streaky, 2, 110522
steams, 2, 87057
strega, 2, 55454
steamb, 2, 6
steamc, 2, 2

Or, choose a smaller max_edit_distance than what is defined during object creation:

import os.path
import sys
from symspellpy import SymSpell, Verbosity

initial_capacity = 83000
# maximum edit distance per dictionary precalculation
max_edit_distance_dictionary = 2
prefix_length = 7

sym_spell = SymSpell(initial_capacity, max_edit_distance_dictionary, prefix_length)

# replace with the actual location of the dictionary file
dictionary_path = "path/to/frequency_dictionary_en_82_765.txt"
term_index = 0
count_index = 1
if not sym_spell.load_dictionary(dictionary_path, term_index, count_index):
    print("Dictionary file not found")

sym_spell.create_dictionary_entry("steama", 4)
sym_spell.create_dictionary_entry("steamb", 6)
sym_spell.create_dictionary_entry("steamc", 2)

result = sym_spell.lookup("streama", Verbosity.ALL, max_edit_distance=1)
for r in result:
    print(r)

Expected output:

stream, 1, 38592422
streams, 1, 8882706
steama, 1, 4
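
To see why only these three terms survive at max_edit_distance=1, the distances in the second column can be checked by hand. The following is a minimal sketch of an edit distance that counts insertions, deletions, substitutions, and adjacent transpositions; the function names osa_distance and within_distance are made up for illustration, and this is not symspellpy's own implementation.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance where insertions, deletions, substitutions,
    and adjacent transpositions each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                # adjacent transposition
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def within_distance(query, terms, max_edit_distance):
    """Keep (term, distance) pairs no further than max_edit_distance."""
    scored = [(term, osa_distance(query, term)) for term in terms]
    return [(term, dist) for term, dist in scored if dist <= max_edit_distance]


if __name__ == "__main__":
    terms = ["stream", "streams", "steama", "steam", "steamb", "steamc"]
    for term, dist in within_distance("streama", terms, 1):
        print(term, dist)
```

"steamb" and "steamc" each need two edits from "streama" (drop the "r", then change the final letter), so they only appear once the lookup distance is raised to 2.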

Code for correcting words in a text file:

import os.path
import sys
from symspellpy import SymSpell, Verbosity

initial_capacity = 83000
# maximum edit distance per dictionary precalculation
max_edit_distance_dictionary = 2
prefix_length = 7

sym_spell = SymSpell(initial_capacity, max_edit_distance_dictionary, prefix_length)

# replace with the actual location of the dictionary file
dictionary_path = "path/to/frequency_dictionary_en_82_765.txt"
term_index = 0
count_index = 1
if not sym_spell.load_dictionary(dictionary_path, term_index, count_index):
    print("Dictionary file not found")

sym_spell.create_dictionary_entry("steama", 4)
sym_spell.create_dictionary_entry("steamb", 6)
sym_spell.create_dictionary_entry("steamc", 2)

corrected_words = []
cwd = os.path.realpath(os.path.dirname(sys.argv[0]))
with open(os.path.join(cwd, "input_words.txt"), "r") as infile:
    for word in infile:
        word = word.rstrip()
        results = sym_spell.lookup(word, Verbosity.TOP)
        if not results:
            corrected_words.append((word, word))
        else:
            corrected_words.append((word, results[0].term))

with open(os.path.join(cwd, "output_words.txt"), "w") as outfile:
    for (original_word, corrected_word) in corrected_words:
        outfile.write("{} {}\n".format(original_word, corrected_word))

result = sym_spell.lookup("nopossiblereplacement", Verbosity.ALL)
for r in result:
    print(r)

Input text file input_words.txt, which contains a correctable misspelled word, a properly spelled word, and an uncorrectable misspelled word:

streama
steama
nopossiblereplacement

Expected output output_words.txt:

streama stream
steama steama
nopossiblereplacement nopossiblereplacement
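
For the "new column of corrected text" part of the question, the same per-word loop can be applied to one column of a CSV file. The sketch below uses only the standard csv module; the function name add_corrected_column, the "corrected_" column prefix, and the correct_word callable are assumptions for illustration, not part of symspellpy. The callable is expected to return a replacement for a single word, for example by wrapping the Verbosity.TOP lookup from the script above.

```python
import csv
from typing import Callable


def add_corrected_column(in_path: str, out_path: str, text_column: str,
                         correct_word: Callable[[str], str]) -> None:
    """Copy a CSV file, appending a 'corrected_<text_column>' column.

    correct_word is any function mapping one word to its correction,
    e.g. (hypothetical wrapper around the lookup shown earlier):
        def correct_word(word):
            results = sym_spell.lookup(word, Verbosity.TOP)
            return results[0].term if results else word
    """
    new_column = "corrected_" + text_column
    with open(in_path, newline="", encoding="utf-8") as infile, \
            open(out_path, "w", newline="", encoding="utf-8") as outfile:
        reader = csv.DictReader(infile)
        fieldnames = list(reader.fieldnames or []) + [new_column]
        writer = csv.DictWriter(outfile, fieldnames=fieldnames)
        writer.writeheader()
        for row in reader:
            # Split on whitespace and correct word by word;
            # punctuation handling is deliberately left out.
            words = row[text_column].split()
            row[new_column] = " ".join(correct_word(w) for w in words)
            writer.writerow(row)
```

Passing the corrector in as a function keeps the file handling separate from the spell checker, so the loop can be tried with a small lookup table before the full dictionary is loaded.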

Hope that helps.

fahadshery commented, Feb 8, 2019

Thanks. I learnt quite a lot in this thread. In fact, I just implemented my spell checker today. Can't thank you enough. Please go ahead and close this.

If possible, we can add the following sections to the home “README.md”:

Adding new terms: you can add them either in the frequency_dict.txt file or by calling sym_spell.create_dictionary_entry()

Spellchecking in a txt file

This will help people who are just starting with this.
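
As a rough sketch of the first option, new terms can be appended to the frequency file before it is loaded. This assumes one "term count" pair per line separated by a space, which is what term_index = 0 and count_index = 1 imply in the scripts in this thread; the helper name append_terms is made up for illustration.

```python
import os


def append_terms(dictionary_path: str, new_terms: dict) -> None:
    """Append 'term count' lines to a frequency dictionary file."""
    needs_newline = False
    if os.path.exists(dictionary_path) and os.path.getsize(dictionary_path) > 0:
        # Make sure the first new term does not get glued onto the
        # last existing line if the file lacks a trailing newline.
        with open(dictionary_path, "rb") as existing:
            existing.seek(-1, os.SEEK_END)
            needs_newline = existing.read(1) != b"\n"
    with open(dictionary_path, "a", encoding="utf-8") as outfile:
        if needs_newline:
            outfile.write("\n")
        for term, count in new_terms.items():
            outfile.write(f"{term} {count}\n")
```

For example, append_terms("frequency_dict.txt", {"steama": 4, "steamb": 6, "steamc": 2}) adds the three terms from this thread; the file then has to be loaded (or reloaded) with load_dictionary for them to take effect.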