Using OpenAI Whisper LOCALLY to Recognize "Ok, Google" Keyphrase

Unconventional Coding

2,277 views

Published 27 July 2024
In today's video I convert my "Ok, GPT" project to use OpenAI Whisper instead of PocketSphinx, and fail to use Mozilla DeepSpeech.
GitHub: github.com/unconv/ok-gpt
Support: buymeacoffee.com/unconv
Consultations: www.buymeacoffee.com/unconv/e...
Memberships: www.buymeacoffee.com/unconv/m...
00:00 Intro & Recap
04:55 Installing Whisper locally
06:48 Trying to install DeepSpeech
09:11 How to record audio with Python
11:49 Transcribing with Whisper
14:27 Detecting speech from volume levels
24:23 Detecting wakeup keyphrase with Whisper
29:46 Comparing transcription with wakeup keyphrase
40:14 Failing to use DeepSpeech
48:59 Converting PocketSphinx to Whisper
56:59 Final test
Science & Technology

Comments • 22

@BurkhardReffeling 7 months ago ⁺¹
Whisper actually takes a system prompt that you can use to steer its style, but it also works pretty reliably to detect phrases it otherwise wouldn't (so you can use that to detect "OK GPT" more reliably).
@unconv 7 months ago
Oh, cool. I guess I should read the docs first 😂
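
A minimal sketch of the idea in the comment above, assuming the open-source `openai-whisper` Python package, where the prompt is exposed as the `initial_prompt` argument of `transcribe`. The helper names `build_wakeup_prompt` and `transcribe_with_hint`, the prompt wording, and the `"base"` model choice are hypothetical and not taken from the ok-gpt repository.

```python
def build_wakeup_prompt(wakeup_words):
    """Join the wake-up phrases into one short hint sentence (hypothetical wording)."""
    return "Wake-up phrases: " + ", ".join(wakeup_words) + "."


def transcribe_with_hint(audio_path, wakeup_words, model_name="base"):
    """Transcribe a recording while hinting Whisper towards the wake-up phrases."""
    # Imported lazily so build_wakeup_prompt works without Whisper installed.
    import whisper  # the `openai-whisper` package

    model = whisper.load_model(model_name)
    # initial_prompt is given to the decoder as context preceding the audio,
    # which nudges it towards the spelling and vocabulary used in the hint.
    result = model.transcribe(
        audio_path,
        initial_prompt=build_wakeup_prompt(wakeup_words),
    )
    return result["text"]
```

Calling `transcribe_with_hint("recording.wav", ["Ok, GPT"])` would return the transcribed text; whether the hint actually improves wake-up detection would need to be checked on real recordings.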
@Canna_Science_and_Technology 8 months ago
Omg! Thanks for putting this together. I spent 5 hrs doing this last month and almost gave up. You just made it look easy…
@unconv 8 months ago
Cool! Thanks :)
@ThaiNeuralNerd 8 months ago
Excellent tutorial!! Please make another by creating a tutorial that builds upon the previous one, start by demonstrating the process of transcribing speech or text using relevant software or tools. Then, show how to translate the transcribed content into different languages, emphasizing the use of efficient translation tools or services. Finally, enhance the tutorial by integrating persona voices generated by Eleven Labs, showcasing how to apply these unique voices to the translated content for a more engaging and personalized experience. This advanced tutorial will combine transcription, translation, and custom voice synthesis to create a multifaceted educational guide.
@user-fv4um9iv2l 8 months ago
Thank you for your efforts in making these kinds of videos. They are very helpful, especially to me as a student.
@unconv 8 months ago
Thanks! Good to hear
@user-cx6sj2zr3r 3 months ago
Hello, thank you very much for all these very useful explanations. One small question: on what type of hardware did you run the demo? I tried Whisper on my Raspberry Pi 5 (8 GB) and it's very slow...
@otbot8925 8 months ago
bad typo with recording ^^. but thanks for the video
@unconv 8 months ago ⁺¹
should have used rust haha
@MedyGames 8 months ago
That's inspiring. I might use something like this for Node,
to build my own API for transcribing using Whisper...
Looking around I found whisper-node, which should work for the API part,
and also node-record-lpcm16 and node-vad for voice detection and for recording files to send to the API for transcription.
I guess my old Raspberry Pi 3 won't do. Finally I have a reason to get a new one. Did you test the performance on a Raspberry Pi yet? I'm hoping the transcription response is quick for the base models.
Finally I would have a free, local transcription solution, and from the results it seems to transcribe pretty well. I wonder what other useful models are out there. But this is already a win.
@fuba44 8 months ago
Loved the video, very informative! This version from the video does not match the git at the moment.
@unconv 8 months ago
Thanks!
@unconv 8 months ago
I made some changes to the code before pushing it to git, but all the functionality should be there
@thenoblerot 8 months ago
You should have 100 times more subscribers. Thank you for another great video. I'm a noob, and really appreciate seeing the unedited coding (and struggles) in real time. How's Whisper performance on the rasp pi 4!?
@unconv 8 months ago
Thank you! I haven't tried it with the Pi yet
@mikebledig7208 7 months ago
I cloned the ok-gpt git repository. When I tried to run recognize.py, I got the following error:
C:\Users\Edwin\ok-gpt>python recognize.py
Detecting ambient noise...
Listening...
Traceback (most recent call last):
  File "C:\Users\Incre\ok-gpt\recognize.py", line 38, in <module>
    if detect_wakeup(message, wakeup_words):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Incre\ok-gpt\recognize.py", line 20, in detect_wakeup
    command = re.sub(r"[,\.!?]", "", command.lower())
              ^^
NameError: name 're' is not defined
I'm running this on Windows 10.
Who can help? How about @unconv HELP!
@unconv 7 months ago ⁺¹
Seems like I forgot to import the regex library in the code. You can add "import re" to the top of recognize.py to make it work. I'll fix it in the repo at some point.
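
As a rough illustration of that fix, here is a sketch of a wake-up check with the missing import in place. Only the `re.sub` line and the `detect_wakeup(message, wakeup_words)` call signature come from the traceback above; the `normalize` helper and the substring comparison are assumptions, and the actual body of `detect_wakeup` in the ok-gpt repository may differ.

```python
import re  # the import that was missing from recognize.py


def normalize(text):
    """Lowercase the text and strip the punctuation a transcription may add."""
    return re.sub(r"[,\.!?]", "", text.lower()).strip()


def detect_wakeup(command, wakeup_words):
    """Return True if any wake-up phrase occurs in the transcribed command.

    The substring match is an assumed comparison, not the repository's logic.
    """
    command = normalize(command)
    return any(normalize(word) in command for word in wakeup_words)
```

For example, `detect_wakeup("Ok, GPT! What's the weather?", ["ok gpt"])` evaluates to `True`, while `detect_wakeup("Hello there.", ["ok gpt"])` evaluates to `False`.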
