Real-Time Speech Recognition With Your Microphone [Beginner Tutorial With Full Code]

  • Added on 16 July 2024
  • Build a real-time local speech recognition system that uses your microphone with Python and Jupyter. This will run on your own computer, without the need for a cloud service or a GPU.
    By the end, you'll have a fully working Jupyter notebook that can record microphone audio, transcribe it, and display it. You'll also have ideas for how you can extend it.
    The full code and a project overview are here - github.com/dataquestio/projec... .
    Chapters
    00:00 Project overview
    02:14 Creating Jupyter widgets to start and stop recording
    11:13 Recording from your microphone with pyaudio
    20:08 Recognizing live speech with vosk
    29:51 Project overview and use cases
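The chapters above follow a record-then-recognize pipeline. Below is a minimal sketch of that flow, assuming the `pyaudio` and `vosk` packages are installed. The function names `extract_text` and `transcribe_microphone`, the model name, the sample rate, and the chunk size are illustrative assumptions, not values confirmed by the video.

```python
import json


def extract_text(result_json: str) -> str:
    """Pull the transcript out of the JSON string a vosk recognizer returns."""
    return json.loads(result_json).get("text", "")


def transcribe_microphone(seconds: float = 10.0,
                          model_name: str = "vosk-model-en-us-0.22",  # assumed model
                          rate: int = 16000,                          # assumed rate
                          chunk: int = 4000) -> list[str]:
    """Record from the default microphone and return the recognized phrases."""
    # Imported lazily so extract_text() works without the audio libraries.
    import pyaudio
    from vosk import Model, KaldiRecognizer

    model = Model(model_name=model_name)  # downloaded on first use
    recognizer = KaldiRecognizer(model, rate)

    audio = pyaudio.PyAudio()
    stream = audio.open(format=pyaudio.paInt16, channels=1, rate=rate,
                        input=True, frames_per_buffer=chunk)
    phrases = []
    try:
        for _ in range(int(rate / chunk * seconds)):
            data = stream.read(chunk, exception_on_overflow=False)
            # AcceptWaveform returns True once a complete utterance is ready.
            if recognizer.AcceptWaveform(data):
                text = extract_text(recognizer.Result())
                if text:
                    phrases.append(text)
        # Flush whatever audio is still buffered in the recognizer.
        tail = extract_text(recognizer.FinalResult())
        if tail:
            phrases.append(tail)
    finally:
        stream.stop_stream()
        stream.close()
        audio.terminate()
    return phrases
```

Calling `transcribe_microphone(seconds=5)` in a notebook cell would block for about five seconds and then return a list of strings. The notebook in the video runs recording and recognition in separate threads instead, so treat this as a simplified single-threaded variant.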

Comments • 60

  • @SashaBaych 8 months ago

    Amazing! Thank you so much for this thorough and clear tutorial!

  • @aparnnaperi 1 year ago

    Thank you for this, it worked for me. The explanation was also very clear in the tutorial, keep up the good work.

  • @wachsenmitaktien3593 2 years ago +2

    Sounds really interesting. Is this the level of project you work on with a Dataquest subscription, or is the members area more of a prequel to what you learn on the YT channel?

    • @Dataquestio 2 years ago

      Hi Wachsen - on Dataquest, we have courses that teach you data concepts, as well as projects to help you apply your skills. We have both guided projects, which have more guidance than these projects, and portfolio projects, which are similar to the YouTube projects (with some added instructions, etc).
      So Dataquest both helps you learn all of the data skills, and has projects to pull it all together.

  • @michaelcamangeg1199 1 year ago

    Thank you!

  • @aparnnaperi 1 year ago

    Hi,
    As mentioned in the video, the output does take a long time to get to the screen. Is there a tutorial on how to use the recasepunc model directly in the same notebook? Any help would be greatly appreciated. Thanks.

  • @anybcd 2 years ago +3

    Thanks for this, I never knew widgets could be created in a Jupyter notebook.

    • @PressF5 1 year ago

      Can someone help me? Everything works fine, but the two widget buttons (Record and Stop) are not displaying in my JupyterLab.

  • @shyjukoppayilthiruvoth6568

    great tutorial

  • @hopelesssuprem1867 2 years ago +3

    Thank you so much for this tutorial. Please create the same with vosk for speaker identification.

    • @Dataquestio 2 years ago +2

      Thanks for the idea! -Vik

    • @hopelesssuprem1867 2 years ago

      @Dataquestio Thanks for your answer. I will watch it with great pleasure)

  • @meditationandrelaxationmus7158

    Thank you it was a great tutorial. Can you please create the same with vosk about Speaker identification.

  • @user-jb4kt6vu9s 7 months ago

    Hi, can you tell me how to run more than two language models at the same time? A video tutorial on this would be a great help.

  • @caseykauf6615 1 year ago +2

    Can you use PyCharm instead of Jupyter?

  • @olivercarmignani9082 1 year ago +1

    Really nice explanation video! I tried to use it in Visual Studio Code in a while loop, but without success. Which changes have to be applied?

  • @hssp1534 1 year ago

    How do I load the model if I have already downloaded it? After I entered the model name and ran the block of code, it started downloading the model again. Please advise how to load a model that is already downloaded.

  • @sj_life_and_science 10 months ago

    How would you modify this to work as a web app? Or on a similar client side, like a bot that joins Zoom calls? Or a browser plug-in that you can turn on and off to transcribe live?
    Curious as I want to implement something like this.

    • @sj_life_and_science 10 months ago

      I know you said you can’t run on the cloud. But what if you create an endpoint that receives a boolean via a button or programmatic call that then launches this code to start transcribing by accessing the local microphone from the cloud? Is this kind of thing possible?

  • @hautboisjc 1 year ago

    Hi Vik, thanks for doing this. However, it doesn't work for me :(
    When I hit the record button, it doesn't transcribe whatever I say. Instead, it shows "WARNING: reverting to cpu as cuda is not available"

    • @Dataquestio 1 year ago

      Hey there - it's hard to diagnose the issue remotely, but I wouldn't worry about the warning. CPU inference can work with vosk. The most likely issues are that there's no function connected to the on_click event for the record button, or the thread hasn't started for recording. Adding print statements in the code can help you find which pieces are working/aren't.
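The print-statement approach from the reply above can be sketched as follows. This is a stand-in using only the standard library, not the notebook's actual code: `trace`, `record_loop`, `start_recording`, and `stop_recording` are hypothetical names, and the `time.sleep` call is a placeholder for reading an audio chunk.

```python
import threading
import time

stop_event = threading.Event()
log = []  # messages are also kept here so they can be inspected after a run


def trace(message: str) -> None:
    """Print a progress message and remember it."""
    log.append(message)
    print(message)


def record_loop() -> None:
    """Stand-in for the recording loop."""
    trace("record thread started")
    while not stop_event.is_set():
        time.sleep(0.01)  # placeholder for stream.read(...)
    trace("record thread stopped")


def start_recording(button=None) -> threading.Thread:
    """Callback taking the single argument a widget button passes on click."""
    trace("on_click fired")
    stop_event.clear()
    thread = threading.Thread(target=record_loop, daemon=True)
    thread.start()
    return thread


def stop_recording(button=None) -> None:
    trace("stop clicked")
    stop_event.set()


# In the notebook the callbacks would be attached with the ipywidgets API:
#   record_button.on_click(start_recording)
#   stop_button.on_click(stop_recording)
```

If "on_click fired" never shows up, the callback is probably not attached to the button. If it shows up but "record thread started" does not, the thread was never started. In JupyterLab, text printed from a widget callback may land in the log console rather than under the cell, which is why the sketch also stores messages in `log`.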

  • @SIR_Studios786 2 months ago

    Where is the model downloaded?

  • @user-sg9kf8tu3d 3 months ago

    Sir, I am getting an error that a subprocess returned non-zero exit status 1. Please help me solve the problem. I need to present this project at my college in 2 days.

  • @REALVIBESTV 1 year ago

    I need something like this that can work in Unreal Engine 5.1

  • @franekpodlach7217 1 year ago

    Nice

  • @moses5407 1 year ago

    On-device translation, even with a delay, would be great, too!

    • @Dataquestio 1 year ago

      You can actually do this - github.com/mozilla/translate .

  • @quinman16 1 year ago

    🙏does this work with CircuitPython?🙏

  • @pfuhad3760 1 year ago

    Is Vosk the best open-source speech-to-text library? If there are others with better accuracy that don't need a GPU, can you please tell me?

    • @Dataquestio 1 year ago

      As of now, I would use whisper instead of vosk. There is a version of whisper that runs on CPU.

  • @pylou7064 1 year ago

    Hey, nice video ^^. So it's not real time, right? Would a 1-second delay work? I have another question: is it possible to get other information, like the time at which each word is pronounced? Thanks.

    • @Dataquestio 1 year ago +1

      Yes, there is a short delay to process. I'll probably make another video at some point showing how to do this without vosk (so you can get more info, and do true real time).

    • @narayanasaicharan2217 1 year ago

      @Dataquestio Eagerly waiting for the video😋

    • @sertacince6571 1 year ago

      @Dataquestio As far as I can see, the video has not been published and I need it urgently. Could you explain roughly how to do it from here?

  • @rohitdarshan_dtu3520 1 year ago +1

    Sir, I am getting an issue with the last lines of code. I would be very happy if you could help me resolve it; please look at my problem as soon as possible.
    model = Model(model_name="vosk-model-en-us-0.22")

    • @nishaldevadiga6766 9 months ago +2

      Same.
      Edit: I found the solution.
      Your model is stored in the .cache folder, which you can find in your user folder at C:\Users\..\.cache\vosk\
      Delete all the vosk model folders you have (literally delete all folders/files inside it).
      Then run that segment of the program again.
      Hope it helped you!
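To check what is in that cache before deleting anything, a small sketch like the one below may help. It assumes the cache lives in `.cache/vosk` under the user's home directory, as the reply above describes, and that model folders start with `vosk-model`; `find_cached_models` and `load_cached_model` are hypothetical helper names.

```python
from pathlib import Path


def find_cached_models(cache_dir: Path) -> list[Path]:
    """Return the model folders (names starting with 'vosk-model') in cache_dir."""
    if not cache_dir.is_dir():
        return []
    return sorted(p for p in cache_dir.iterdir()
                  if p.is_dir() and p.name.startswith("vosk-model"))


def load_cached_model(model_dir: Path):
    """Load a model from a folder on disk instead of requesting it by name."""
    from vosk import Model  # lazy import: only needed when actually loading
    return Model(model_path=str(model_dir))


if __name__ == "__main__":
    # Assumed location, taken from the comment above.
    default_cache = Path.home() / ".cache" / "vosk"
    for folder in find_cached_models(default_cache):
        print(folder)
```

Passing a folder to `Model(model_path=...)` loads the files already on disk. A half-downloaded or corrupted folder would still fail to load, which is consistent with the delete-and-retry fix described above.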

  • @ohassairi 5 months ago

    Can you add offline translation?

  • @alexalbert3026 5 days ago

    Following from the first line... I got...Error displaying widget

  • @ohassairi 5 months ago

    I tried it. It works, but some words get lost! I can't find out why.

  • @hssp1534 1 year ago

    I ran the code provided in the link, but it gives "OSError: [Errno -9998] Invalid number of channels". How do I resolve it? Please advise on a solution.

    • @dredmaster9343 8 months ago

      I am getting the same error. Have you resolved it? If yes, please tell me how.

    • @hssp1534 8 months ago

      @dredmaster9343 Nope, not yet. I gave up trying. I haven't revisited the code in a long time.

  • @sharliduravlog2 2 months ago

    Is it offline or online?

  • @stephenyipck 1 year ago

    This is great but what if I want to use this code in a .py file?

    • @Dataquestio 1 year ago

      Hi Stephen - you can write the same code in a .py file. JupyterLab also allows you to export notebooks as .py files that you can run.

    • @kilovolt2494 1 year ago

      @Dataquestio Actually, that was part of my question. I got rid of the widgets (for now) and made a .py file. It works perfectly fine, except for one little detail: when the main function stops, it doesn't terminate the script, so it hangs forever after printing "Stopped." I already tried joining the threads and even explicitly calling 'quit' at the end, but it still hangs. What could be the reason for that? Is it caused by vosk?
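One possible cause, offered as an assumption rather than a diagnosis of that script: a non-daemon worker thread stays blocked on `queue.get()`, so the interpreter keeps waiting for it at exit. The sketch below shows a pattern that avoids this with a sentinel value, a timeout, and a daemon thread. The names `recordings`, `STOP`, and `recognize_loop` are hypothetical, and the `len(frames)` line is a placeholder for the recognition step.

```python
import queue
import threading

recordings = queue.Queue()
STOP = object()  # sentinel: tells the worker there is no more audio
results = []


def recognize_loop() -> None:
    while True:
        try:
            # The timeout keeps the thread from blocking forever on an empty queue.
            frames = recordings.get(timeout=0.5)
        except queue.Empty:
            continue
        if frames is STOP:
            break
        results.append(len(frames))  # placeholder for the recognition step


# daemon=True lets the interpreter exit even if the worker is still running.
worker = threading.Thread(target=recognize_loop, daemon=True)
worker.start()

recordings.put(b"\x00" * 320)  # one fake chunk of audio
recordings.put(STOP)           # ask the worker to finish
worker.join(timeout=2)
print("Stopped. Worker alive:", worker.is_alive())
```

If the script still hangs with a pattern like this, an audio stream that was never stopped and closed is another thing worth checking.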

  • @NickolayShmyrev 1 year ago

    This tutorial is wrong. Vosk does not recommend using pyaudio due to latency issues. Our demos use sounddevice.

    • @Dataquestio 1 year ago +2

      I wouldn't call the tutorial wrong. It works fine, and latency was not an issue from what I could tell. Both sounddevice and pyaudio are wrappers over portaudio, so they shouldn't function extremely differently (aside from the Python API being different).
      I couldn't find any references to sounddevice on the vosk documentation site - if that is indeed the recommended way to use vosk, I would advertise that fact somewhere.

  • @SuperShank76 1 year ago +1

    I took the trouble to watch the entire video but when I hit "Start Recording", there is no transcribing happening. There is no error either. Thumbs down.

  • @SARMADALHAFIDH 1 year ago

    Is it free, or do I have to pay, please?

  • @ivorpratap1479 1 year ago

    Can anybody help? It says AttributeError. FYI:
    p = pyaudio.Pyaudio() ,
    ----> 3 p = pyaudio.Pyaudio()
    4 for i in range(p.get_device_count()):
    5 print(p.get_device_info_index(i))
    AttributeError: module 'pyaudio' has no attribute 'Pyaudio'

    • @JonzieBoy 4 months ago

      You have to be very careful with capitalization: Pyaudio is not the same as PyAudio.
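For reference, a sketch of the device-listing snippet with the names PyAudio actually exposes: the class is `pyaudio.PyAudio`, and the lookup method is `get_device_info_by_index` (the traceback above also shows `get_device_info_index`, which PyAudio does not provide). The helper names `list_devices` and `main` are hypothetical.

```python
def list_devices(pa) -> list[tuple[int, str, int]]:
    """Return (index, name, max input channels) for every audio device."""
    devices = []
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        devices.append((i, info["name"], int(info["maxInputChannels"])))
    return devices


def main() -> None:
    import pyaudio  # the module name is all lowercase

    p = pyaudio.PyAudio()  # the class name has a capital P and a capital A
    try:
        for index, name, channels in list_devices(p):
            print(index, name, channels)
    finally:
        p.terminate()
```

Calling `main()` prints one line per device. A device reporting 0 input channels cannot be used for recording.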

  • @doritos7372 1 year ago

    OSError: [Errno -9998] Invalid number of channels.
    I was getting this.

    • @hssp1534 1 year ago

      I fixed it by using the right sound device. I was using my cell phone mic earlier, but then I switched to a headset and the error never appeared again.
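The fix above suggests the error depends on which input device is opened. A minimal sketch of choosing a device that actually supports recording, assuming device dictionaries shaped like the ones PyAudio's `get_device_info_by_index` returns (with `index` and `maxInputChannels` keys); `pick_input_device` is a hypothetical helper name.

```python
def pick_input_device(devices, wanted_channels=1):
    """Return (device index, channel count) for the first device that can record.

    The channel count is capped at what the device reports, since asking for
    more channels than a device supports is one way to trigger this error.
    Returns None when no device has any input channels.
    """
    for info in devices:
        available = int(info["maxInputChannels"])
        if available >= 1:
            return int(info["index"]), min(wanted_channels, available)
    return None


# Usage with PyAudio (not executed here):
#   p = pyaudio.PyAudio()
#   infos = [p.get_device_info_by_index(i) for i in range(p.get_device_count())]
#   index, channels = pick_input_device(infos)
#   stream = p.open(format=pyaudio.paInt16, channels=channels, rate=16000,
#                   input=True, input_device_index=index, frames_per_buffer=4000)
```

The sample rate and buffer size in the usage comment are assumed values; some devices accept only certain rates.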