Building My Own Alexa / Google Home: Detecting the Wake-up Keyword
- date added 27 Jul 2024
- In today's video I start a project where I create my own Google Home / Alexa style device that can be voice-controlled. In this video I implement the initial wake-up keyword ("OK, GPT") detection.
GitHub: github.com/unconv/ok-gpt
Support: buymeacoffee.com/unconv
Consultations: www.buymeacoffee.com/unconv/e...
Memberships: www.buymeacoffee.com/unconv/m...
00:00 Intro
01:10 Trying PocketSphinx keyphrase detection
04:19 Getting past PocketSphinx recognition errors
22:49 Detecting multiple keyphrases with multiprocessing
33:58 Separating initialization and recognition
37:22 Using Queues to perform tasks when keyphrase is detected
44:55 It works!
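The multiprocessing-and-Queue approach from the chapter list can be sketched roughly like this. This is a hypothetical illustration, not the actual repo code: `listen_for` is a stand-in for the PocketSphinx recognition loop, and the keyphrases are made up.

```python
# Sketch of the pattern: one process per keyphrase, all pushing
# detections onto a shared Queue that the main process reacts to.
from multiprocessing import Process, Queue


def listen_for(keyphrase: str, detections: Queue) -> None:
    # In the real project this loop would run PocketSphinx keyphrase
    # detection on microphone audio; here we just simulate one detection.
    detections.put(keyphrase)


def main() -> list:
    detections: Queue = Queue()
    keyphrases = ["ok gpt", "hey gpt"]

    # One recognizer process per keyphrase.
    workers = [
        Process(target=listen_for, args=(k, detections)) for k in keyphrases
    ]
    for w in workers:
        w.start()

    # The main process blocks on the Queue and performs a task
    # (here, just collecting) whenever any keyphrase is detected.
    results = [detections.get() for _ in keyphrases]

    for w in workers:
        w.join()
    return sorted(results)


if __name__ == "__main__":
    print(main())
```

The Queue decouples detection from reaction: the recognizer processes stay in their tight audio loops while the main process handles whatever the wake word should trigger.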
I have looked into creating a centralized hub for bots, with options for real-time listening or voice activation, plus text-to-speech / speech-to-text, where you can name each bot and call on them one at a time, have a hierarchy of responses in a real-time conversation with all of them, or even let them talk amongst themselves. I was considering Android Studio. The only setback is that I have only a self-taught understanding of how code works and no experience actually coding. But by feeding code back and forth from bot to bot they can polish and clean up code quite well, and for a different project with GPT-4 and MemGPT I successfully created an XML page. I know how computers operate internally, but I don't have much hands-on experience. This will be a fun video to watch.
One thing that has me thinking: when you record your voice using just a microphone onto magnetic tape, or nowadays some form of digital storage, what you say gets recorded exactly as you said it. However, computers don't seem to be able to pick out exactly what you said. PocketSphinx gets it so wrong. Of course, there are things like Google speech recognition that do a pretty good job nowadays. But it's just amazing that a simple microphone used to record your words doesn't get it wrong, yet a computer does.
Imagine leaving a voice message for someone where you said "See you later" and the microphone got it as "See you never". Bleh...
Great video, please continue with the project. I love the way you allow errors to happen and then solve them as you go in real time.
Thanks! I will continue (2nd video is out already!)
cool, but this went totally over my head haha
That Sphinx seems like a good way to run scientific experiments with elevated stress levels. 🧔 What speech recognition does YouTube use to generate subtitles?
🕳️👀 It seems to be their secret. That was an interesting experiment with a local "OK Google". Maybe OpenAI Whisper would give better results than Sphinx.
Yeah, I thought about using Whisper, but I don't want to send all the audio to OpenAI. There might be a local version of Whisper, but it seemed like a hassle to get working, especially on a Raspberry Pi.
@unconv If you have a newer model and a 64-bit OS, you could try out whisper.cpp with the tiny model, which seems to work on the Pi (at least the Pi 4). Whisper is really great and much better than any other solution out there.
Thanks! I'll try out Whisper locally. I was just intimidated by the AI-ness of it haha
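For anyone curious about the whisper.cpp route mentioned above, the setup looks roughly like the following. These are build-and-run steps from memory; exact script and binary names have changed between whisper.cpp versions, so check the project's README.

```shell
# Sketch: build whisper.cpp and transcribe a WAV file with the tiny
# English model. Script/binary names may differ in newer releases.
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make
bash ./models/download-ggml-model.sh tiny.en
./main -m models/ggml-tiny.en.bin -f samples/jfk.wav
```

The tiny model is the smallest and fastest, which is why it is the one usually suggested for a Raspberry Pi.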