adding new terms and typical workflow

Hi,

I am trying to add new terms by:

initial_capacity = 83000
# maximum edit distance per dictionary precalculation
max_edit_distance_dictionary = 0
prefix_length = 7

sym_spell = SymSpell(initial_capacity, max_edit_distance_dictionary, prefix_length)
if not sym_spell.load_dictionary(dictionary_path, term_index, count_index):
    print("Dictionary file not found")
    return

sym_spell.create_dictionary_entry("steama", 4)
sym_spell.create_dictionary_entry("steamb", 6)
sym_spell.create_dictionary_entry("steamc", 2)

result = sym_spell.lookup("streama", 2)
print(result)

I am getting an empty []. What am I missing?

Additionally, could you please provide skeleton code showing how to feed it a text file and have it create a new column of corrected text? This will help massively in my text analysis.

Much appreciated

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

mammothb commented, Feb 7, 2019

You initialized SymSpell with max_edit_distance_dictionary=0, which means it’s only looking for exact matches.

In your lookup("streama", 2) call, the 2 is read as the verbosity argument rather than as max_edit_distance, which I assume is what you intended.
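To make the argument mix-up concrete, here is a minimal sketch using a hypothetical stub, not the real symspellpy function. It only mirrors the parameter order described above (phrase, then verbosity, then max_edit_distance); the stand-in Verbosity enum and its numeric values are illustrative assumptions.

```python
from enum import Enum


class Verbosity(Enum):
    # Stand-in enum for illustration; the numeric values are assumptions.
    TOP = 0
    CLOSEST = 1
    ALL = 2


def lookup(phrase, verbosity, max_edit_distance=None):
    """Hypothetical stub: reports how the arguments were bound, nothing more."""
    return {
        "phrase": phrase,
        "verbosity": verbosity,
        "max_edit_distance": max_edit_distance,
    }


# Positional call from the question: the 2 lands in `verbosity`,
# and max_edit_distance is left at its default.
positional = lookup("streama", 2)

# Keyword call: the intent is spelled out, so nothing can be misread.
explicit = lookup("streama", Verbosity.ALL, max_edit_distance=2)

print(positional)
print(explicit)
```

Passing max_edit_distance by keyword avoids this kind of silent misbinding.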

For the snippet you provided, the following options work:

import os.path
import sys
from symspellpy import SymSpell, Verbosity

initial_capacity = 83000
# maximum edit distance per dictionary precalculation
max_edit_distance_dictionary = 2
prefix_length = 7

sym_spell = SymSpell(initial_capacity, max_edit_distance_dictionary, prefix_length)

dictionary_path = "path/to/frequency_dictionary_en_82_765.txt"
term_index = 0
count_index = 1
if not sym_spell.load_dictionary(dictionary_path, term_index, count_index):
    print("Dictionary file not found")

sym_spell.create_dictionary_entry("steama", 4)
sym_spell.create_dictionary_entry("steamb", 6)
sym_spell.create_dictionary_entry("steamc", 2)

result = sym_spell.lookup("streama", Verbosity.ALL)
for r in result:
    print(r)

Expected output:

stream, 1, 38592422
streams, 1, 8882706
steama, 1, 4
steam, 2, 11141309
scream, 2, 4310000
streak, 2, 3268695
strata, 2, 1590600
screams, 2, 1286631
streaks, 2, 731843
steamy, 2, 602955
streamed, 2, 505032
streamer, 2, 432893
streaky, 2, 110522
steams, 2, 87057
strega, 2, 55454
steamb, 2, 6
steamc, 2, 2
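The middle column in that output is the edit distance from "streama" to each suggestion. As a rough check, the sketch below recomputes a few of those distances with a plain Levenshtein function. This is a simplification: as far as I know symspellpy defaults to a Damerau-Levenshtein variant that also counts transpositions, but none of these word pairs involve one, so the numbers should agree. The helper name levenshtein is my own.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance (insert, delete, substitute), row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + substitution_cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


query = "streama"
terms = ["stream", "streams", "steama", "steam", "steamb", "steamc"]
distances = {term: levenshtein(query, term) for term in terms}
print(distances)
```

None of these terms is at distance 0 from "streama", which is why a dictionary built with max_edit_distance_dictionary=0 returned an empty list.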

Or, choose a smaller max_edit_distance than what is defined during object creation:

import os.path
import sys
from symspellpy import SymSpell, Verbosity

initial_capacity = 83000
# maximum edit distance per dictionary precalculation
max_edit_distance_dictionary = 2
prefix_length = 7

sym_spell = SymSpell(initial_capacity, max_edit_distance_dictionary, prefix_length)

dictionary_path = "path/to/frequency_dictionary_en_82_765.txt"
term_index = 0
count_index = 1
if not sym_spell.load_dictionary(dictionary_path, term_index, count_index):
    print("Dictionary file not found")

sym_spell.create_dictionary_entry("steama", 4)
sym_spell.create_dictionary_entry("steamb", 6)
sym_spell.create_dictionary_entry("steamc", 2)

result = sym_spell.lookup("streama", Verbosity.ALL, max_edit_distance=1)
for r in result:
    print(r)

Expected output:

stream, 1, 38592422
streams, 1, 8882706
steama, 1, 4

Code for correcting words in a text file:

import os.path
import sys
from symspellpy import SymSpell, Verbosity

initial_capacity = 83000
# maximum edit distance per dictionary precalculation
max_edit_distance_dictionary = 2
prefix_length = 7

sym_spell = SymSpell(initial_capacity, max_edit_distance_dictionary, prefix_length)

dictionary_path = "path/to/frequency_dictionary_en_82_765.txt"
term_index = 0
count_index = 1
if not sym_spell.load_dictionary(dictionary_path, term_index, count_index):
    print("Dictionary file not found")

sym_spell.create_dictionary_entry("steama", 4)
sym_spell.create_dictionary_entry("steamb", 6)
sym_spell.create_dictionary_entry("steamc", 2)

corrected_words = []
cwd = os.path.realpath(os.path.dirname(sys.argv[0]))
with open(os.path.join(cwd, "input_words.txt"), "r") as infile:
    for word in infile:
        word = word.rstrip()
        results = sym_spell.lookup(word, Verbosity.TOP)
        if not results:
            corrected_words.append((word, word))
        else:
            corrected_words.append((word, results[0].term))

with open(os.path.join(cwd, "output_words.txt"), "w") as outfile:
    for (original_word, corrected_word) in corrected_words:
        outfile.write("{} {}\n".format(original_word, corrected_word))

result = sym_spell.lookup("nopossiblereplacement", Verbosity.ALL)
for r in result:
    print(r)

Input text file input_words.txt, which contains a correctable misspelled word, a properly spelled word, and an uncorrectable misspelled word:

streama
steama
nopossiblereplacement

Expected output in output_words.txt:

streama stream
steama steama
nopossiblereplacement nopossiblereplacement
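Since the question asked for a new column of corrected text, here is one possible sketch of that shape using only the standard csv module. The corrector is a hypothetical stand-in (toy_correct, backed by a small dict) so the example runs without a dictionary file; in real use it would presumably wrap the Verbosity.TOP lookup from the snippet above, returning results[0].term or the original word when nothing is found. The column names and sample rows are made up for illustration.

```python
import csv
import io


def add_corrected_column(rows, correct_word):
    """Yield [original_text, corrected_text], correcting each token separately."""
    for row in rows:
        if not row:  # skip blank lines
            continue
        text = row[0]
        corrected = " ".join(correct_word(token) for token in text.split())
        yield [text, corrected]


# Hypothetical stand-in for a spell-checker lookup.
KNOWN_FIXES = {"streama": "stream"}


def toy_correct(word):
    return KNOWN_FIXES.get(word, word)


# In-memory files keep the sketch self-contained; real code would use open().
source = io.StringIO("text\nstreama of data\nsteama\n")
reader = csv.reader(source)
header = next(reader)

output = io.StringIO()
writer = csv.writer(output, lineterminator="\n")
writer.writerow(header + ["corrected"])
writer.writerows(add_corrected_column(reader, toy_correct))

print(output.getvalue())
```

Correcting token by token keeps the per-word lookup pattern from the answer; words the corrector does not know are passed through unchanged.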

Hope that helps.

fahadshery commented, Feb 8, 2019

Thanks. I learnt quite a lot in this thread. In fact, I just implemented my spell checker today. Can’t thank you enough. Please go ahead and close this.

If possible, we could add the following sections to the home “README.md”:

  • Adding new terms: you can add them either in the frequency_dict.txt file or with sym_spell.create_dictionary_entry()
  • Spellchecking in a txt file

This will help people who are just starting with this.
