Is your feature request related to a problem? Please describe.
Sometimes when chatting with a chatbot application, users end up typing gibberish such as "asdasd sadasd" or "fgfsad dasdsa". Is there a way to detect gibberish?

Describe the solution you’d like
Detect gibberish input using NLP.

Describe alternatives you’ve considered
https://github.com/rrenaud/Gibberish-Detector (Python based)

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 13 (5 by maintainers)

Top GitHub Comments

jesus-seijas-sp commented, Dec 28, 2020 (1 reaction)

Hello, I updated the algorithm. I was using 4 measures to detect gibberish:

  • vowel frequency
  • consonant frequency
  • unique character frequency
  • word over characters frequency

I replaced vowel frequency and consonant frequency with the vowel over consonant ratio. More changes:

  • y is considered a vowel
  • Words with fewer than 6 characters are not considered gibberish
  • The good range for words over characters is now (0.15, 0.3)
  • The good range for unique characters is now (0.4, 1)
  • The good range for vowel over consonant is (0.5, 1.5)

The unit tests:

    test('Should return false for good sentences', () => {
      expect(isGibberish('This sentence is totally valid.')).toBeFalsy();
      expect(isGibberish('This is not gibberish')).toBeFalsy();
      expect(isGibberish('Esta frase es totalmente correcta')).toBeFalsy();
      expect(isGibberish('goodbye')).toBeFalsy();
      expect(isGibberish('sure')).toBeFalsy();
      expect(isGibberish('very much')).toBeFalsy();
    });
    test('Should return true for gibberish sentences', () => {
      expect(isGibberish('zxcvwerjasc')).toBeTruthy();
      expect(isGibberish('ertrjiloifdfyyoiu')).toBeTruthy();
      expect(isGibberish('ajgñsgj ajdskfig jskf')).toBeTruthy();
      expect(
        isGibberish('euzbfdhuifdgiuhdsiudvbdjibgdfijbfdsiuddsfhjibfsdifdhbfd')
      ).toBeTruthy();
      expect(isGibberish('nmnjcviburili,<>')).toBeTruthy();
      expect(isGibberish('ubkddhepwxfzmpc')).toBeTruthy();
      expect(isGibberish('kwinsghocyevlzep')).toBeTruthy();
      expect(isGibberish('ertrjiloifdfyyoiu')).toBeTruthy();
      expect(isGibberish('asddsa adsdsa asdadsasd')).toBeTruthy();
    });
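To make the measures listed above concrete, here is a minimal sketch of how they could be computed. It is not the @nlpjs/utils source: the helper names (`measures`, `outOfRange`), the exact definition of each ratio, the inclusive treatment of the range bounds, and applying the 6-character rule to the whole input are all assumptions made for illustration.

```javascript
// Illustrative sketch only; this is NOT the @nlpjs/utils implementation.
// Helper names and the exact measure definitions are assumptions.

const VOWELS = new Set('aeiouy'); // "y" is treated as a vowel

// "Good" ranges quoted in the comment above (treated as inclusive here).
const GOOD_RANGES = {
  wordsOverChars: [0.15, 0.3],
  uniqueChars: [0.4, 1],
  vowelsOverConsonants: [0.5, 1.5],
};

// Compute the three measures for a piece of text.
function measures(text) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  // Keep only Unicode letters so punctuation and digits do not skew the ratios.
  const letters = Array.from(words.join('')).filter((ch) => /\p{L}/u.test(ch));
  const vowels = letters.filter((ch) => VOWELS.has(ch)).length;
  const consonants = letters.length - vowels;
  return {
    letterCount: letters.length,
    wordsOverChars: words.length / letters.length,
    uniqueChars: new Set(letters).size / letters.length,
    vowelsOverConsonants: vowels / consonants,
  };
}

// List the measures that fall outside their "good" range.
function outOfRange(text) {
  const m = measures(text);
  // Assumption: the "fewer than 6 characters" rule applies to the whole input.
  if (m.letterCount < 6) return [];
  return Object.entries(GOOD_RANGES)
    .filter(([name, [min, max]]) => m[name] < min || m[name] > max)
    .map(([name]) => name);
}

console.log(measures('This is not gibberish'));
console.log(outOfRange('zxcvwerjasc'));
```

A single out-of-range measure does not by itself mean gibberish in the library: under these definitions 'goodbye' has a words over characters value of 1/7, just below 0.15, yet the tests above expect isGibberish('goodbye') to be false. How the measures are combined into a final score is not described in this thread.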
jesus-seijas-sp commented, Dec 27, 2020 (1 reaction)

Hello, I just published a new version that includes gibberish detection:

    const { isGibberish } = require('@nlpjs/utils');

    console.log(isGibberish('This sentence is totally valid')); // false
    console.log(isGibberish('Esta frase es totalmente correcta')); // false
    console.log(isGibberish('ertrjiloifdfyyoiu')); // true
    console.log(isGibberish('ajgñsgj ajdskfig jskf')); // true

There is also a function that returns the score, if you prefer a probability:

    const { gibberishScore } = require('@nlpjs/utils');

    console.log(gibberishScore('This sentence is totally valid')); // 0
    console.log(gibberishScore('Esta frase es totalmente correcta')); // 0.12
    console.log(gibberishScore('ertrjiloifdfyyoiu')); // 0.57
    console.log(gibberishScore('ajgñsgj ajdskfig jskf')); // 0.62
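If the score is preferred, a caller could apply its own cutoff instead of relying on the boolean. The sketch below assumes a hypothetical helper, `makeGibberishFilter`, and a cutoff of 0.5; neither is part of @nlpjs/utils, and the cutoff the library uses internally is not stated in this thread. The scoring function is passed in as a parameter, so `gibberishScore` can be supplied in real use; here a stub replays the scores quoted above so the example runs without installing the package.

```javascript
// Hypothetical helper, not part of @nlpjs/utils.
// scoreFn: any function returning a gibberish score (assumed to lie in 0..1),
// for example gibberishScore from @nlpjs/utils.
function makeGibberishFilter(scoreFn, threshold = 0.5) {
  return (text) => scoreFn(text) >= threshold;
}

// Stand-in scores copied from the output quoted above, so the sketch runs
// without the package installed.
const quotedScores = new Map([
  ['This sentence is totally valid', 0],
  ['Esta frase es totalmente correcta', 0.12],
  ['ertrjiloifdfyyoiu', 0.57],
  ['ajgñsgj ajdskfig jskf', 0.62],
]);
const stubScore = (text) => quotedScores.get(text) ?? 0;

const looksLikeGibberish = makeGibberishFilter(stubScore, 0.5);
console.log(looksLikeGibberish('ertrjiloifdfyyoiu')); // true
console.log(looksLikeGibberish('Esta frase es totalmente correcta')); // false
```

A chatbot could use such a filter to answer with a fallback prompt when the input looks like gibberish; a lower cutoff flags more input at the cost of more false positives.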

