Found why best_match has low performance with Levenshtein distance comparison
See original GitHub issue

@gunthercox, @vkosuri, @mymusise
I see you are concerned about the performance of getting a response for a statement, so what I did to improve it may be helpful. The response time improved from 1.9s to 96.8ms with the following changes to LevenshteinDistance:
Move
import sys
from difflib import SequenceMatcher
to the top of the file, before the class.
Comment out the try … except … block for the library import.
# import sys
#
# # Use python-Levenshtein if available
# try:
#     from Levenshtein.StringMatcher import StringMatcher as SequenceMatcher
# except ImportError:
#     from difflib import SequenceMatcher
# PYTHON = sys.version_info[0]
# Return 0 if either statement has a falsy text value
# if not statement.text or not other_statement.text:
#     return 0
#
# # Get the lowercase version of both strings
# if PYTHON < 3:
#     statement_text = unicode(statement.text.lower()) # NOQA
#     other_statement_text = unicode(other_statement.text.lower()) # NOQA
# else:
#     statement_text = str(statement.text.lower())
#     other_statement_text = str(other_statement.text.lower())
statement_text = str(statement.text.lower())
other_statement_text = str(other_statement.text.lower())
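A likely reason for the speedup, offered as an assumption rather than something confirmed in this issue: when python-Levenshtein is not installed, a try/except import inside the comparison method fails on every call, and Python does not cache failed imports, so the module search is repeated each time. The sketch below contrasts the two layouts. The function names and the always-missing module name `_no_such_levenshtein_module` are hypothetical and only stand in for the real code.

```python
import timeit
from difflib import SequenceMatcher  # resolved once, when this file is imported


def ratio_import_per_call(a, b):
    """Retries the optional import on every call (the slow layout)."""
    try:
        # Hypothetical module name, chosen so that the import always fails
        from _no_such_levenshtein_module import StringMatcher as Matcher
    except ImportError:
        from difflib import SequenceMatcher as Matcher
    return Matcher(None, a.lower(), b.lower()).ratio()


def ratio_module_level(a, b):
    """Uses the matcher imported at the top of the file."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


if __name__ == "__main__":
    args = ("How are you?", "How are you doing?")
    for fn in (ratio_import_per_call, ratio_module_level):
        seconds = timeit.timeit(lambda: fn(*args), number=2000)
        print(f"{fn.__name__}: {seconds:.3f}s for 2000 calls")
```

Both functions return the same similarity value, so only the timing differs. The actual numbers depend on the machine and on how many entries are on `sys.path`.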
Good luck!
Issue Analytics
- State:
- Created 6 years ago
- Comments:8
Top GitHub Comments
Yeah, Levenshtein.StringMatcher.StringMatcher and difflib.SequenceMatcher come from two different libraries. It may be faster, but I think this try/except is there because ChatterBot supports both Python 2.7 and 3.

@vkosuri I’ll create a PR after testing it fully.
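If the optional python-Levenshtein import is worth keeping, one possible middle ground is to resolve it once at module level rather than dropping it. This is only a sketch under that assumption, not the change that was merged. The helper name `similarity` is hypothetical, and the Python 2 `unicode` handling is left out, so it assumes Python 3.

```python
# Pick the matcher once, when the module is imported.
try:
    # Optional matcher from python-Levenshtein, used if it is installed
    from Levenshtein.StringMatcher import StringMatcher as SequenceMatcher
except ImportError:
    # Standard-library fallback
    from difflib import SequenceMatcher


def similarity(text, other_text):
    """Hypothetical helper: similarity ratio of two strings."""
    # Return 0 if either value is falsy, as in the original check
    if not text or not other_text:
        return 0
    matcher = SequenceMatcher(None, text.lower(), other_text.lower())
    return matcher.ratio()
```

With this layout the import cost is paid once, whether or not the optional library is present. Note that the two matchers can return different ratios for the same pair of strings.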