Using OpenAI Whisper LOCALLY to Recognize "Ok, Google" Keyphrase
- Added 27. 07. 2024
- In today's video I convert my "Ok, GPT" project to use OpenAI Whisper instead of PocketSphinx, and fail to use Mozilla DeepSpeech.
GitHub: github.com/unconv/ok-gpt
Support: buymeacoffee.com/unconv
Consultations: www.buymeacoffee.com/unconv/e...
Memberships: www.buymeacoffee.com/unconv/m...
00:00 Intro & Recap
04:55 Installing Whisper locally
06:48 Trying to install DeepSpeech
09:11 How to record audio with Python
11:49 Transcribing with Whisper
14:27 Detecting speech from volume levels
24:23 Detecting wakeup keyphrase with Whisper
29:46 Comparing transcription with wakeup keyphrase
40:14 Failing to use DeepSpeech
48:59 Converting PocketSphinx to Whisper
56:59 Final test - Science & Technology
Whisper actually takes a prompt that you can use to steer its style, and it also works pretty reliably for getting it to detect phrases it otherwise wouldn't (so you can use that to detect "OK GPT" more reliably).
Oh, cool. I guess I should read the docs first 😂
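For anyone who wants to try the prompt idea from the comment above, here is a minimal sketch, assuming the open-source openai-whisper Python package, whose transcribe() method accepts an initial_prompt argument. The helper names, the prompt wording, and the "base" model choice are made up for illustration and are not taken from the video or the ok-gpt repo.

```python
def build_transcribe_options(keyphrase):
    """Build keyword arguments that nudge Whisper toward a keyphrase.

    Hypothetical helper: the prompt wording is an assumption, not a
    recommended or official format.
    """
    return {
        # initial_prompt gives the decoder context for the first window,
        # which can make unusual words like "GPT" more likely to appear.
        "initial_prompt": f"The wake-up phrase is: {keyphrase}.",
        "language": "en",
        # fp16=False avoids the half-precision warning on CPU-only machines.
        "fp16": False,
    }


def transcribe_with_hint(audio_path, keyphrase, model_name="base"):
    """Transcribe an audio file with the keyphrase hint applied."""
    # Imported lazily so the options helper works without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **build_transcribe_options(keyphrase))
    return result["text"].strip()


if __name__ == "__main__":
    print(build_transcribe_options("OK GPT")["initial_prompt"])
```

Calling transcribe_with_hint("recording.wav", "OK GPT") would then return the transcription as a string; "recording.wav" is a placeholder file name. How much the prompt helps will depend on the model size and the audio.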
Omg! Thanks for putting this together. I spent 5 hrs doing this last month and almost gave up. You just made it look easy…
Cool! Thanks :)
Excellent tutorial!! Please make another tutorial that builds upon this one. Start by demonstrating the process of transcribing speech using relevant software or tools. Then, show how to translate the transcribed content into different languages, emphasizing the use of efficient translation tools or services. Finally, enhance the tutorial by integrating persona voices generated by Eleven Labs, showcasing how to apply these unique voices to the translated content for a more engaging and personalized experience. This advanced tutorial would combine transcription, translation, and custom voice synthesis into a multifaceted educational guide.
Thank you for your efforts in making these kinds of videos. They are very helpful, especially to me as a student.
Thanks! Good to hear
Hello, thank you very much for all these very useful explanations. One small question: on what type of hardware did you run the demo? I tried Whisper on my Raspberry Pi 5 (8GB), and it's very slow...
bad typo with recording ^^. but thanks for the video
should have used rust haha
That's inspiring. I might use something like this for Node,
to build my own API for transcribing using Whisper...
Looking around, I found whisper-node, which should work for the API part.
Also node-record-lpcm16 and node vad for voice detection and recording, to send files to the API for transcription.
I guess my old Raspberry Pi 3 won't do. Finally have a reason to get a new one. Did you test the performance on a Raspberry Pi yet? I'm hoping the transcription response is quick for the base models.
Finally I would have a free, local transcription solution, which from the results seems to transcribe pretty well. I wonder what other useful models are out there. But this is already a win.
Loved the video, very informative! This version from the video does not match the git at the moment.
Thanks!
I made some changes to the code before pushing it to git, but all the functionality should be there
You should have 100 times more subscribers. Thank you for another great video. I'm a noob, and really appreciate seeing the unedited coding (and struggles) in real time. How's Whisper performance on the rasp pi 4!?
Thank you! I haven't tried it with the Pi yet
I cloned the ok-gpt git repository. When I try to run recognize.py, I get the following error:
C:\Users\Edwin\ok-gpt>python recognize.py
Detecting ambient noise...
Listening...
Traceback (most recent call last):
  File "C:\Users\Incre\ok-gpt\recognize.py", line 38, in <module>
    if detect_wakeup(message, wakeup_words):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Incre\ok-gpt\recognize.py", line 20, in detect_wakeup
    command = re.sub(r"[,\.!?]", "", command.lower())
              ^^
NameError: name 're' is not defined
I'm running this on Windows 10.
Who can help? How about @unconv HELP!
Seems like I forgot to import the regex library in the code. You can add "import re" to the top of recognize.py to make it work. I'll fix it in the repo at some point.
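To show the fix in context, here is a rough sketch of what a detect_wakeup function could look like with the missing import in place. Only the re.sub line and the function signature come from the traceback above; the matching logic (a plain substring check against each wake-up phrase) is an assumption and may differ from the actual recognize.py in the repo.

```python
import re  # the import missing from recognize.py, per the NameError above


def detect_wakeup(command, wakeup_words):
    """Return True if any wake-up phrase appears in the transcribed command."""
    # This line is the one shown in the traceback: lowercase the text
    # and strip commas, periods, exclamation marks and question marks.
    command = re.sub(r"[,\.!?]", "", command.lower())

    # Assumption: wake-up phrases are stored without punctuation
    # (e.g. "ok gpt") and a simple substring match is enough.
    return any(phrase.lower() in command for phrase in wakeup_words)


if __name__ == "__main__":
    print(detect_wakeup("Ok, GPT! What time is it?", ["ok gpt"]))  # True
    print(detect_wakeup("Hello there.", ["ok gpt"]))               # False
```

Without the import, Python raises the NameError only when the re.sub line first runs, which is why the script gets as far as "Listening..." before crashing.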