Add Levenshtein Distance for dictionaries

It would be nice if the dictionary matcher could use Levenshtein distance (LD) calculations to match passwords that are close to, but not exactly, a dictionary entry. For example, with the dictionary {someDictionary: ['Herbert', 'Dorothea']}, the “password” Hebert would match the first entry. The downside is that this would decrease the performance of the library.
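To make the request concrete, here is a minimal sketch of what such a match could look like. It is not the library's implementation: `levenshtein`, `findFuzzyMatch`, `FuzzyMatch`, and the `maxDistance` default of 1 are hypothetical names and values chosen for illustration only.

```typescript
// Dynamic-programming Levenshtein distance that keeps only two rows in memory.
function levenshtein(a: string, b: string): number {
  let previous: number[] = []
  for (let j = 0; j <= b.length; j++) previous.push(j)

  for (let i = 1; i <= a.length; i++) {
    const current: number[] = [i]
    for (let j = 1; j <= b.length; j++) {
      const substitutionCost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1
      current.push(
        Math.min(
          previous[j] + 1, // deletion
          current[j - 1] + 1, // insertion
          previous[j - 1] + substitutionCost // substitution
        )
      )
    }
    previous = current
  }
  return previous[b.length]
}

// Hypothetical result shape, not part of any real library API.
interface FuzzyMatch {
  dictionaryName: string
  entry: string
  distance: number
}

// Hypothetical helper: returns the closest entry within maxDistance, or null.
function findFuzzyMatch(
  password: string,
  dictionaries: Record<string, string[]>,
  maxDistance = 1
): FuzzyMatch | null {
  const needle = password.toLowerCase()
  let best: FuzzyMatch | null = null
  for (const dictionaryName of Object.keys(dictionaries)) {
    for (const entry of dictionaries[dictionaryName]) {
      const distance = levenshtein(needle, entry.toLowerCase())
      if (distance <= maxDistance && (best === null || distance < best.distance)) {
        best = { dictionaryName, entry, distance }
      }
    }
  }
  return best
}

// 'Hebert' is one insertion away from 'Herbert', so the first entry matches.
const match = findFuzzyMatch('Hebert', { someDictionary: ['Herbert', 'Dorothea'] })
```

Note that this naive version computes a distance for every dictionary entry, which is exactly the performance cost the issue mentions.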

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

Tostino commented, May 5, 2021 (1 reaction)

So I optimized the LD pretty well for the common case in my implementation. For the vast majority of passwords, we never even reach that code path. Check all the cases in the DictionaryMatcher where I short-circuit the LD code path because the password isn’t a good candidate for it. I only do the LD calculation on the whole password rather than on each individual part of the password, which cuts down the amount of work required immensely.
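As a rough illustration of that idea, the sketch below runs the fuzzy pass once on the whole password and bails out early where the calculation cannot help. The specific short-circuit conditions here (a minimum length, skipping exact hits, and a length-difference filter) are assumptions for illustration and are not taken from nbvcxz's DictionaryMatcher; `fuzzyWholePasswordMatch`, `FuzzyResult`, and the default thresholds are hypothetical.

```typescript
// Two-row dynamic-programming Levenshtein distance.
function levenshtein(a: string, b: string): number {
  let previous: number[] = []
  for (let j = 0; j <= b.length; j++) previous.push(j)
  for (let i = 1; i <= a.length; i++) {
    const current: number[] = [i]
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1
      current.push(Math.min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
    }
    previous = current
  }
  return previous[b.length]
}

// Hypothetical result shape; distanceCalls is only there to show the work saved.
interface FuzzyResult {
  entry: string | null
  distanceCalls: number
}

// Hypothetical fuzzy pass over the WHOLE password (never its substrings).
function fuzzyWholePasswordMatch(
  password: string,
  dictionary: string[],
  maxDistance = 2,
  minLength = 5
): FuzzyResult {
  const needle = password.toLowerCase()
  const result: FuzzyResult = { entry: null, distanceCalls: 0 }

  // Short circuit (assumed): very short passwords are close to too many words.
  if (needle.length < minLength) return result

  const words = dictionary.map((word) => word.toLowerCase())

  // Short circuit (assumed): an exact hit is left to the normal dictionary match.
  if (words.indexOf(needle) !== -1) return result

  for (let k = 0; k < words.length; k++) {
    // The distance can never be smaller than the difference in length,
    // so such entries are skipped without computing anything.
    if (Math.abs(words[k].length - needle.length) > maxDistance) continue
    result.distanceCalls++
    if (levenshtein(needle, words[k]) <= maxDistance) {
      result.entry = dictionary[k]
      return result
    }
  }
  return result
}
```

The length-difference filter is safe because every insertion or deletion changes the length by one, so two strings whose lengths differ by more than `maxDistance` cannot be within `maxDistance` edits of each other.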

But your mention of how this is usually tied directly to a field makes that less of a difference. Since my Java library runs server side, I generally just get updates from the input box it’s tied to every so often, and then send my estimate back to the client to update the “password strength meter” the calculation is tied to. As a user is typing, I don’t necessarily get an update on every keystroke to recalculate things with my web framework, so it doesn’t cause issues with how I use it.

I like the idea of having different code paths, one with very tight timing that will give some reasonably accurate result as the user is typing, and then a slower code path that is able to get things “right” once they have stopped mashing the keyboard.
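One way such a split could be wired up is sketched below: the fast estimate runs on every update, and the slower one runs only after input has been quiet for a while. All names here (`createTwoStageEstimator`, `Estimator`, `Schedule`, `timerSchedule`) and the 300 ms default are hypothetical and not part of either library; the scheduler is passed in as a parameter so the timing can be swapped out.

```typescript
type Estimator = (password: string) => number
type Cancel = () => void
type Schedule = (callback: () => void, delayMs: number) => Cancel

// Default scheduler built on the standard timer functions.
const timerSchedule: Schedule = (callback, delayMs) => {
  const handle = setTimeout(callback, delayMs)
  return () => clearTimeout(handle)
}

// Hypothetical wrapper: reports a fast score immediately and a thorough
// score once no new input has arrived for delayMs.
function createTwoStageEstimator(
  fast: Estimator,
  thorough: Estimator,
  onResult: (score: number, stage: 'fast' | 'thorough') => void,
  delayMs = 300,
  schedule: Schedule = timerSchedule
): (password: string) => void {
  let cancelPending: Cancel | null = null

  return (password: string) => {
    // A new keystroke invalidates any thorough run that has not started yet.
    if (cancelPending !== null) cancelPending()

    // Tight-timing path: cheap estimate, for example without the LD pass.
    onResult(fast(password), 'fast')

    // Slow path: runs only after the user has stopped typing.
    cancelPending = schedule(() => {
      cancelPending = null
      onResult(thorough(password), 'thorough')
    }, delayMs)
  }
}
```

In a form, the returned function would be called from the input handler, with `fast` being an estimate that skips the LD pass and `thorough` one that includes it.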

Tostino commented, May 5, 2021 (1 reaction)

I’d take a look at what I did for the Java “port” I maintain: https://github.com/GoSimpleLLC/nbvcxz

LD calculations were extremely useful, but my god they slow things down. In nbvcxz, whether the dictionary matching algorithm uses an LD pass is a configuration option.

Here is the meat of it: https://github.com/GoSimpleLLC/nbvcxz/blob/master/src/main/java/me/gosimple/nbvcxz/matching/DictionaryMatcher.java

