Hallucinations and VAD [BLANK_AUDIO] Generations

Tested with both small and tiny model sizes.

I'm using the Streaming example with VAD turned on. I've tried different settings, and tried using a prompt, to eliminate hallucinations and sound effects and to get VAD working properly, but to no avail. I might be missing something: it treats the hallucinated sounds like words, so VAD struggles to trigger. Examples of the output are below:

When I’m not talking and the background noise is low, the following gets transcribed:

[BLANK_AUDIO] [BLANK_AUDIO] [BLANK_AUDIO]

Ideally, it would run inference in the background and only detect incoming audio when I’m talking.

Most of the time, the tiny model hallucinates sound effects from silence or low background noise, for example: (wind blowing), (clicking), (barking)

Are there any settings I can try that would help eliminate hallucinations from silence or static, or get VAD working correctly?
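
As a stopgap while the VAD itself is being tuned, one option is to drop these annotations after transcription. The sketch below is a hypothetical post-processing helper (the name `strip_annotations` is not part of any library mentioned here). It assumes every non-speech tag arrives wrapped in square brackets or parentheses, as in the examples above.

```python
import re

# Assumption: non-speech tags look like "[BLANK_AUDIO]" or "(wind blowing)",
# i.e. a single bracketed or parenthesized span with no nesting.
_ANNOTATION = re.compile(r"\[[^\]]*\]|\([^)]*\)")


def strip_annotations(text):
    """Remove bracketed/parenthesized annotations and tidy whitespace."""
    cleaned = _ANNOTATION.sub(" ", text)
    # Collapse the gaps left behind by removed annotations.
    return " ".join(cleaned.split())


if __name__ == "__main__":
    print(repr(strip_annotations("[BLANK_AUDIO] [BLANK_AUDIO]")))
    print(repr(strip_annotations("(wind blowing) hello there (clicking)")))
```

A segment that comes back empty after stripping can be treated as silence. This would also remove any genuine parenthetical text, so it is a workaround rather than a fix for the underlying hallucinations.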

Great project, excited for any future features or updates.

Issue Analytics

  • State: open
  • Created: 2 months ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
Macoron commented, Jul 31, 2023

Another thing I noticed is that the mic has to be pretty good for VAD to work. When I used my webcam mic the VAD struggled to stop the recording but when I used a jack mic near my face it worked relatively well within ~1 sec. after I was done talking.

The VAD implementation is very basic. The original author of whisper.cpp recommends using something more robust, like @BaMarcy suggested. But that would be an extra dependency, which is outside this project's scope.

Still, you can try playing with the Vad Thd and Vad Freq Thd parameters.
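
To give a feel for what these two thresholds typically control, here is a minimal Python sketch of a simple energy-based check, loosely modelled on the kind of basic VAD described above. It is an assumption that Vad Thd acts as an energy ratio and Vad Freq Thd as a high-pass cutoff in Hz. The function names, defaults, and filter are illustrative and are not the project's actual code.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate):
    """First-order high-pass filter; attenuates content below cutoff_hz."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


def speech_has_ended(samples, sample_rate, last_ms=1000,
                     vad_thd=0.6, freq_thd=100.0):
    """Return True if the last `last_ms` of audio is quiet relative to
    the whole buffer (hypothetical stand-in for a basic VAD check)."""
    n_last = sample_rate * last_ms // 1000
    if n_last <= 0 or n_last >= len(samples):
        return False  # not enough audio to compare against
    if freq_thd > 0:
        # Drop low-frequency rumble before measuring energy.
        samples = high_pass(samples, freq_thd, sample_rate)
    energy_all = sum(abs(s) for s in samples) / len(samples)
    energy_last = sum(abs(s) for s in samples[-n_last:]) / n_last
    return energy_last <= vad_thd * energy_all


if __name__ == "__main__":
    sr = 16000
    tone = [0.5 * math.sin(2 * math.pi * 440 * i / sr) for i in range(2 * sr)]
    print(speech_has_ended(tone + [0.0] * sr, sr))  # tone, then 1 s silence
```

In this sketch the comparison is relative, so steady background noise keeps the ratio close to 1 and the check never reports that speech has ended. That would be consistent with a noisy mic making the VAD slow to stop. Lowering the ratio makes the check stricter about declaring the end of speech, and raising the cutoff discards more low-frequency noise before the comparison.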

1 reaction
BaMarcy commented, Jul 30, 2023

I can recommend the Silero VAD model, which has an ONNX version. It's state of the art and open source, BTW.

https://github.com/snakers4/silero-vad/tree/master/examples/cpp
