Using OpenAI Whisper LOCALLY to Recognize "Ok, Google" Keyphrase

  • Added on 27 July 2024
  • In today's video I convert my "Ok, GPT" project to use OpenAI Whisper instead of PocketSphinx, and fail to use Mozilla DeepSpeech.
    GitHub: github.com/unconv/ok-gpt
    Support: buymeacoffee.com/unconv
    Consultations: www.buymeacoffee.com/unconv/e...
    Memberships: www.buymeacoffee.com/unconv/m...
    00:00 Intro & Recap
    04:55 Installing Whisper locally
    06:48 Trying to install DeepSpeech
    09:11 How to record audio with Python
    11:49 Transcribing with Whisper
    14:27 Detecting speech from volume levels
    24:23 Detecting wakeup keyphrase with Whisper
    29:46 Comparing transcription with wakeup keyphrase
    40:14 Failing to use DeepSpeech
    48:59 Converting PocketSphinx to Whisper
    56:59 Final test
  • Science & Technology
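
The chapter list above includes "Detecting speech from volume levels", but the code from that chapter is not reproduced on this page. The following is only a minimal sketch of the general idea, not the video's implementation: it assumes 16-bit signed mono PCM chunks in the machine's native byte order, and the function names and the `margin` parameter are hypothetical.

```python
import array
import math


def rms(chunk: bytes) -> float:
    """Root-mean-square amplitude of a chunk of 16-bit signed PCM samples."""
    samples = array.array("h")  # "h" = signed 16-bit, native byte order
    samples.frombytes(chunk)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate_threshold(ambient_chunks, margin=2.0):
    """Assumed rule: speech must be `margin` times louder than the average ambient level."""
    levels = [rms(c) for c in ambient_chunks]
    if not levels:
        return 0.0
    return margin * sum(levels) / len(levels)


def is_speech(chunk: bytes, threshold: float) -> bool:
    """Treat a chunk as speech when its RMS level exceeds the calibrated threshold."""
    return rms(chunk) > threshold
```

In a recording loop, one would typically calibrate on a second or so of silence first, then start buffering audio when `is_speech` turns true and stop after a run of quiet chunks.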

Comments • 22

  • @BurkhardReffeling 7 months ago +1

    Whisper actually accepts a prompt that you can use to steer its style, and it also works pretty reliably for getting it to detect phrases it otherwise wouldn't (so you can use that to detect "OK GPT" more reliably).

    • @unconv 7 months ago

      Oh, cool. I guess I should read the docs first 😂
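
For readers who want to try the suggestion above: in the open-source `whisper` Python package, the prompt is passed as the `initial_prompt` argument of `transcribe()`. The sketch below is an assumption about how one might wire it up, not code from the video or the repo; the helper names, the prompt wording, and the default model name are hypothetical choices.

```python
def build_transcribe_options(keyphrase: str) -> dict:
    """Keyword arguments for whisper's transcribe(); initial_prompt biases decoding toward the keyphrase."""
    return {
        "initial_prompt": f"{keyphrase}.",  # hint text; the exact wording is an assumption
        "language": "en",                   # skip language detection for short clips
        "fp16": False,                      # use FP32, which is what CPU-only machines need
    }


def transcribe_with_hint(audio_path: str, keyphrase: str = "OK GPT", model_name: str = "base") -> str:
    """Transcribe a file with the keyphrase supplied as a prompt hint."""
    import whisper  # openai-whisper; imported lazily so the helper above works without it

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **build_transcribe_options(keyphrase))
    return result["text"].strip()
```

The prompt only nudges the decoder toward that spelling, so the transcription should still be compared against the wakeup phrase afterwards.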

  • @Canna_Science_and_Technology

    Omg! Thanks for putting this together. I spent 5 hrs doing this last month and almost gave up. You just made it look easy…

    • @unconv 8 months ago

      Cool! Thanks :)

  • @ThaiNeuralNerd 8 months ago

    Excellent tutorial! Please make a follow-up that builds on this one. Start by demonstrating how to transcribe speech with the relevant tools, then show how to translate the transcribed content into different languages using efficient translation tools or services. Finally, integrate persona voices generated by ElevenLabs and show how to apply them to the translated content for a more engaging, personalized experience. Such an advanced tutorial would combine transcription, translation, and custom voice synthesis into a multifaceted educational guide.

  • @user-fv4um9iv2l 8 months ago

    Thank you for your efforts in making these kinds of videos. They are very helpful, especially to me as a student.

    • @unconv 8 months ago

      Thanks! Good to hear

  • @user-cx6sj2zr3r 3 months ago

    Hello, thank you very much for all these very useful explanations. One small question: what type of hardware did you run the demo on? I tried Whisper on my Raspberry Pi 5 (8 GB), and it's very slow...

  • @otbot8925 8 months ago

    bad typo with recording ^^. but thanks for the video

    • @unconv 8 months ago +1

      should have used rust haha

  • @MedyGames 8 months ago

    That's inspiring. I might use something like this for Node,
    to build my own API for transcribing with Whisper.
    Looking around, I found whisper-node, which should work for the API part,
    plus node-record-lpcm16 and node-vad for voice detection and recording, to send files to the API for transcription.
    I guess my old Raspberry Pi 3 won't do, so I finally have a reason to get a new one. Did you test the performance on a Raspberry Pi yet? I'm hoping the transcription response is quick for the base models.
    Finally I would have a free, local transcription solution, which from the results seems to transcribe pretty well. I wonder what other useful models are out there, but this is already a win.

  • @fuba44 8 months ago

    Loved the video, very informative! The version from the video does not match the Git repo at the moment.

    • @unconv 8 months ago

      Thanks!

    • @unconv 8 months ago

      I made some changes to the code before pushing it to git, but all the functionality should be there

  • @thenoblerot 8 months ago

    You should have 100 times more subscribers. Thank you for another great video. I'm a noob, and really appreciate seeing the unedited coding (and struggles) in real time. How's Whisper's performance on the Raspberry Pi 4?

    • @unconv 8 months ago

      Thank you! I haven't tried it with the Pi yet

  • @mikebledig7208 7 months ago

    I cloned the ok-gpt Git repository. When I try to run recognize.py, I get the following error:
    C:\Users\Edwin\ok-gpt>python recognize.py
    Detecting ambient noise...
    Listening...
    Traceback (most recent call last):
      File "C:\Users\Incre\ok-gpt\recognize.py", line 38, in <module>
        if detect_wakeup(message, wakeup_words):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "C:\Users\Incre\ok-gpt\recognize.py", line 20, in detect_wakeup
        command = re.sub(r"[,\.!?]", "", command.lower())
                  ^^
    NameError: name 're' is not defined
    I'm running this on Windows 10.
    Who can help? How about @unconv HELP!

    • @unconv 7 months ago +1

      Seems like I forgot to import the regex library in the code. You can add "import re" to the top of recognize.py to make it work. I'll fix it in the repo at some point.
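
To illustrate the fix described above, here is a minimal sketch of what the patched function could look like. Only the `re.sub` line and the `detect_wakeup` name come from the traceback in this thread; the parameter names, the type hints, and the matching rule on the last line are assumptions, and the actual code in the repo may differ.

```python
import re  # the missing import behind the NameError


def detect_wakeup(command: str, wakeup_words: list[str]) -> bool:
    # Same normalisation as the line shown in the traceback:
    # lowercase the text and strip commas, periods, exclamation and question marks.
    command = re.sub(r"[,\.!?]", "", command.lower())
    # Assumed matching rule: any wakeup phrase appears in the cleaned text.
    return any(word in command for word in wakeup_words)
```

With this rule, a transcription such as "Ok, GPT!" is reduced to "ok gpt" before it is compared with the wakeup phrases.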