Real-Time Speech Recognition With Your Microphone [Beginner Tutorial With Full Code]

Dataquest

52,506 views

Added on 16 July 2024
Build a real-time local speech recognition system that uses your microphone with Python and Jupyter. This will run on your own computer, without the need for a cloud service or a GPU.
By the end, you'll have a fully working Jupyter notebook that can record microphone audio, transcribe it, and display it. You'll also have ideas for how you can extend it.
The full code and a project overview are here - github.com/dataquestio/projec... .
Chapters
00:00 Project overview
02:14 Creating Jupyter widgets to start and stop recording
11:13 Recording from your microphone with pyaudio
20:08 Recognizing live speech with vosk
29:51 Project overview and use cases
------------------------------
Join 1M+ Dataquest learners today!
Master data skills and change your life.
Sign up for free: bit.ly/3O8MDef

Comments • 60

@SashaBaych 8 months ago
Amazing! Thank you so much for this thorough and clear tutorial!
@benyusu8045 8 months ago
in thread Thread problem
@aparnnaperi 1 year ago
Thank you for this, it worked for me. The explanation was also very clear in the tutorial, keep up the good work.
@wachsenmitaktien3593 2 years ago ⁺²
Sounds really interesting. Is this the level of project you learn in the Dataquest subscription, or is the members area more of a prequel to what you learn on the YT channel?
@Dataquestio 2 years ago
Hi Wachsen - on Dataquest, we have courses that teach you data concepts, as well as projects to help you apply your skills. We have both guided projects, which have more guidance than these projects, and portfolio projects, which are similar to the YouTube projects (with some added instructions, etc).
So Dataquest both helps you learn all of the data skills, and has projects to pull it all together.
@michaelcamangeg1199 1 year ago
Thank you!
@aparnnaperi 1 year ago
Hi,
As it is mentioned in the video, the output does take a long time to get to the screen. Is there a tutorial on how to use the recasepunc model directly in the same notebook? Any help would be greatly appreciated. Thanks.
@anybcd 2 years ago ⁺³
Thanks for this, I never knew widgets could be created in a Jupyter notebook.
@PressF5 1 year ago
Can someone help me? Everything works fine, but the two buttons (record and stop) are not displaying in my JupyterLab.
@shyjukoppayilthiruvoth6568 1 year ago
great tutorial
@hopelesssuprem1867 2 years ago ⁺³
Thank you so much for this tutorial. Please create the same with vosk about speaker identification.
@Dataquestio 2 years ago ⁺²
Thanks for the idea! -Vik
@hopelesssuprem1867 2 years ago
@@Dataquestio Thanks for your answer. I will watch this with great pleasure)
@meditationandrelaxationmus7158 9 months ago
Thank you it was a great tutorial. Can you please create the same with vosk about Speaker identification.
@user-jb4kt6vu9s 7 months ago
Hi can you tell me how to run more than two language models at the same time... A video tutorial for the same would be a great help...
@caseykauf6615 1 year ago ⁺²
Can you use pycharm instead of jupyter?
@olivercarmignani9082 1 year ago ⁺¹
Really nice explanation video! I tried to use it in Visual Studio Code in a while loop, but without success. Which changes have to be applied?
@hssp1534 1 year ago
How do I load the model if I have already downloaded it? After I entered the model name and ran the block of code, it started downloading the model separately. Please advise how to load a model that's already downloaded.
@sj_life_and_science 10 months ago
How would you modify this to work as a web app? Or on a similar client side like a bot that joins a zoom calls? Or a browser plug-in that you can turn on and off and transcribe live?
Curious as I want to implement something like this.
@sj_life_and_science 10 months ago
I know you said you can’t run on the cloud. But what if you create an endpoint that receives a boolean via a button or programmatic call that then launches this code to start transcribing by accessing the local microphone from the cloud? Is this kind of thing possible?
@hautboisjc 1 year ago
Hi Vik, thanks for doing this. However, it doesn't work for me :(
When i hit the record button, it doesn't transcribe whatever I say. Instead, it shows "WARNING: reverting to cpu as cuda is not available"
@Dataquestio 1 year ago
Hey there - it's hard to diagnose the issue remotely, but I wouldn't worry about the warning. CPU inference can work with vosk. The most likely issues are that there's no function connected to the on_click event for the record button, or the thread hasn't started for recording. Adding print statements in the code can help you find which pieces are working/aren't.
@SIR_Studios786 2 months ago
Where is the model downloaded?
@user-sg9kf8tu3d 3 months ago
Sir, I am getting an error that a subprocess returned non-zero exit status 1. Please help me solve the problem. I need to show this project at my college in 2 days.
@REALVIBESTV 1 year ago
I need something like this that can work in Unreal Engine 5.1
@franekpodlach7217 1 year ago
Nice
@moses5407 1 year ago
On-device translation, even with a delay, would be great, too!
@Dataquestio 1 year ago
You can actually do this - github.com/mozilla/translate .
@quinman16 1 year ago
🙏does this work with CircuitPython?🙏
@pfuhad3760 1 year ago
Is Vosk the best open-source speech-to-text library? If there are others with better accuracy that don't need a GPU, can you please tell me?
@Dataquestio 1 year ago
As of now, I would use whisper instead of vosk. There is a version of whisper that runs on CPU.
@pylou7064 1 year ago
Hey, nice video ^^. So it's not real time, right? Does a 1-second delay work? I have another question: is it possible to get other information, like the time at which each word is pronounced? Thanks.
@Dataquestio 1 year ago ⁺¹
Yes, there is a short delay to process. I'll probably make another video at some point showing how to do this without vosk (so you can get more info, and do true real time).
@narayanasaicharan2217 1 year ago
@@Dataquestio Eagerly waiting for the video😋
@sertacince6571 1 year ago
@@Dataquestio As far as I can see, the video has not been published and I need it urgently. Could you explain roughly how to do it from here?
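On the word-timing question in this thread: as far as I know, Vosk's `KaldiRecognizer` has a `SetWords(True)` switch, after which the JSON returned by `Result()` carries a `result` list with a `start` and `end` time in seconds for each word. The sketch below only parses a string of that shape. The helper name `word_timings` is hypothetical and the sample payload is hand-written to illustrate the layout; it is not real recognizer output.

```python
import json


def word_timings(result_json):
    """Return (word, start, end) tuples from a Vosk-style result string."""
    result = json.loads(result_json)
    # Without per-word output enabled there is no "result" key, so default to empty.
    return [(w["word"], w["start"], w["end"]) for w in result.get("result", [])]


# Illustrative payload only, written by hand to mimic the expected shape.
sample = (
    '{"result": ['
    '{"conf": 1.0, "start": 0.0, "end": 0.42, "word": "hello"}, '
    '{"conf": 0.9, "start": 0.42, "end": 0.9, "word": "world"}], '
    '"text": "hello world"}'
)

for word, start, end in word_timings(sample):
    print(f"{start:5.2f}-{end:5.2f}  {word}")
```

This does not remove the processing delay mentioned above; it only shows where the per-word times would be read from.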
@rohitdarshan_dtu3520 1 year ago ⁺¹
Sir, I am getting an issue with the last line of code. I would be very happy if you could clear it up; please look at my concern as soon as possible.
model = Model(model_name="vosk-model-en-us-0.22")
@nishaldevadiga6766 9 months ago ⁺²
same
Edit: I found the solution
Your model is stored in the .cache folder, which you can find in your user folder at C:\Users\..\.cache\vosk\
Delete all the vosk model folders you have (literally delete all folders/files inside it)
Then run that segment of program
Hope it helped you!
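Before deleting anything, it can help to see what is actually in that cache folder. The sketch below assumes the location given in this comment (`.cache/vosk` under the user's home directory); the helper name `cached_vosk_models` is hypothetical, and other install setups may use a different folder.

```python
from pathlib import Path


def cached_vosk_models(cache_dir=None):
    """Return the model folders found in the cache directory, sorted by name."""
    # Default location is an assumption taken from the comment above.
    cache_dir = Path(cache_dir) if cache_dir else Path.home() / ".cache" / "vosk"
    if not cache_dir.is_dir():
        return []
    # Only directories count as models; stray files such as partial downloads are skipped.
    return sorted(p for p in cache_dir.iterdir() if p.is_dir())


for model_dir in cached_vosk_models():
    print(model_dir)
```

If a complete model folder is listed, my understanding is that its path can be passed to vosk directly, for example `Model(model_path=str(model_dir))`, instead of `model_name`, which avoids a new download. Treat that call as an assumption to check against the vosk version you have installed.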
@ohassairi 5 months ago
Can you add offline translation?
@alexalbert3026 5 days ago
Following from the first line... I got...Error displaying widget
@ohassairi 5 months ago
I tried it. It works, but some words get lost! I can't find out why.
@hssp1534 1 year ago
I ran the code provided in the link, but it gives "OSError: [Errno -9998] Invalid number of channels". How do I resolve it? Please advise.
@dredmaster9343 8 months ago
I am getting the same error. Have you resolved it? If yes, please tell me.
@hssp1534 8 months ago
@@dredmaster9343 Nope, not yet. I stopped trying and haven't revisited the code in a long time.
@sharliduravlog2 2 months ago
Is it offline or online?
@stephenyipck 1 year ago
This is great but what if I want to use this code in a .py file?
@Dataquestio 1 year ago
Hi Stephen - you can write the same code in a .py file. JupyterLab also allows you to export notebooks as .py files that you can run.
@kilovolt2494 1 year ago
@@Dataquestio Actually, that was part of my question. I got rid of widgets (for now) and made a .py file. It works perfectly fine, except for one little detail: when the main function stops, it doesn't terminate the script, so it hangs forever after printing "Stopped." I already tried joining threads and even explicitly saying 'quit' at the end, it still hangs. What can be the reason of that? Is that caused by vosk?
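One common cause of this kind of hang, offered as a possibility and not a diagnosis of this script: Python waits for every non-daemon thread before exiting, and a worker that is blocked in `queue.get()` (or in a blocking audio read) never returns, so joining it hangs as well. The sketch below shows one way out, a sentinel value that tells the worker to stop. The names `STOP` and `transcribe_worker` are hypothetical, and the "recognition" step is a placeholder.

```python
import queue
import threading

STOP = object()  # sentinel that tells the worker to exit


def transcribe_worker(chunks, out):
    """Consume chunks until the sentinel arrives, then return so the thread ends."""
    while True:
        chunk = chunks.get()  # blocks until something is put on the queue
        if chunk is STOP:
            break
        out.append(len(chunk))  # placeholder for real recognition work


chunks = queue.Queue()
processed = []
worker = threading.Thread(target=transcribe_worker, args=(chunks, processed))
worker.start()

chunks.put(b"abc")
chunks.put(b"de")
chunks.put(STOP)        # without this, get() would block forever and the script would not exit
worker.join(timeout=2)
print("Stopped.", "worker alive:", worker.is_alive())
```

Creating the thread with `daemon=True` is an alternative, since daemon threads do not keep the interpreter alive, at the cost of the thread being cut off mid-work when the script ends.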
@NickolayShmyrev 1 year ago
This tutorial is wrong. Vosk does not recommend using pyaudio due to latency issues. Our demos use sounddevice.
@Dataquestio 1 year ago ⁺²
I wouldn't call the tutorial wrong. It works fine, and latency was not an issue from what I could tell. Both sounddevice and pyaudio are wrappers over portaudio, so they shouldn't function extremely differently (aside from the Python API being different).
I couldn't find any references to sounddevice on the vosk documentation site - if that is indeed the recommended way to use vosk, I would advertise that fact somewhere.
@SuperShank76 1 year ago ⁺¹
I took the trouble to watch the entire video but when I hit "Start Recording", there is no transcribing happening. There is no error either. Thumbs down.
@ramwarner5541 5 months ago
Sameee
@SARMADALHAFIDH 1 year ago
Is it free, or do you have to pay, please?
@nishaldevadiga6766 9 months ago
free
@ivorpratap1479 1 year ago
Can anybody help? It says AttributeError. FYI:
p = pyaudio.Pyaudio() ,
----> 3 p = pyaudio.Pyaudio()
4 for i in range(p.get_device_count()):
5 print(p.get_device_info_index(i))
AttributeError: module 'pyaudio' has no attribute 'Pyaudio'
@JonzieBoy 4 months ago
You have to be very careful with capitalization, Pyaudio is not the same as PyAudio
@doritos7372 1 year ago
OSError: [Errno -9998] Invalid number of channels
I was getting this.
@hssp1534 1 year ago
I fixed it by using the right sound device. I was using my cell phone mic earlier, but then I switched to a headset and the error never appeared again.
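The fix described here (switching to a device that can actually record) can be checked in code by listing only the devices that report at least one input channel. The sketch below takes any object with PyAudio's `get_device_count()` and `get_device_info_by_index()` methods; the `FakeAudio` class and its two devices are hypothetical stand-ins so the example runs without a sound card.

```python
def input_devices(pa):
    """Return (index, name, max_input_channels) for devices that can record."""
    devices = []
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        # Output-only devices report 0 input channels and cannot be opened for recording.
        if info.get("maxInputChannels", 0) > 0:
            devices.append((i, info["name"], info["maxInputChannels"]))
    return devices


class FakeAudio:
    """Hypothetical stand-in for a PyAudio instance, used only for this sketch."""

    _devices = [
        {"name": "Speakers", "maxInputChannels": 0},
        {"name": "Headset Mic", "maxInputChannels": 1},
    ]

    def get_device_count(self):
        return len(self._devices)

    def get_device_info_by_index(self, index):
        return self._devices[index]


for index, name, channels in input_devices(FakeAudio()):
    print(index, name, channels)
```

With the real library the call would be `input_devices(pyaudio.PyAudio())` (capital P and capital A). A plausible next step is to pass one of the returned indexes as `input_device_index` when opening the stream and to request no more channels than that device reports, which is the condition the "Invalid number of channels" error appears to complain about.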
