Real-Time Speech Recognition With Your Microphone [Beginner Tutorial With Full Code]
- Added 16. 07. 2024
- Build a real-time local speech recognition system that uses your microphone with Python and Jupyter. This will run on your own computer, without the need for a cloud service or a GPU.
By the end, you'll have a fully working Jupyter notebook that can record microphone audio, transcribe it, and display it. You'll also have ideas for how you can extend it.
The full code and a project overview are here - github.com/dataquestio/projec... .
Chapters
00:00 Project overview
02:14 Creating Jupyter widgets to start and stop recording
11:13 Recording from your microphone with pyaudio
20:08 Recognizing live speech with vosk
29:51 Project overview and use cases
------------------------------
Join 1M+ Dataquest learners today!
Master data skills and change your life.
Sign up for free: bit.ly/3O8MDef
Amazing! Thank you so much for this thorough and clear tutorial!
in thread Thread problem
Thank you for this, it worked for me. The explanation was also very clear in the tutorial, keep up the good work.
Sounds really interesting. Is this the level of project you learn with the Dataquest subscription, or is the members area more of a prequel to what you learn on the YT channel?
Hi Wachsen - on Dataquest, we have courses that teach you data concepts, as well as projects to help you apply your skills. We have both guided projects, which have more guidance than these projects, and portfolio projects, which are similar to the YouTube projects (with some added instructions, etc).
So Dataquest both helps you learn all of the data skills, and has projects to pull it all together.
Thank you!
Hi,
As it is mentioned in the video, the output does take a long time to get to the screen. Is there a tutorial on how to use the recasepunc model directly in the same notebook? Any help would be greatly appreciated. Thanks.
Thanks for this, I never knew widgets could be created in a Jupyter notebook. Thanks for this.
Can someone help me? Everything works fine, but the two widget buttons (record and stop) are not displaying in my JupyterLab.
great tutorial
Thank you so much for this tutorial. Please create the same with vosk about speaker identification.
Thanks for the idea! -Vik
@@Dataquestio Thanks for your answer. I will watch this with great pleasure)
Thank you it was a great tutorial. Can you please create the same with vosk about Speaker identification.
Hi can you tell me how to run more than two language models at the same time... A video tutorial for the same would be a great help...
Can you use pycharm instead of jupyter?
Really nice explanation video! I tried to use it in Visual Studio Code in a while loop, but I haven't had any success. Which changes have to be applied?
How do I load the model if I have already downloaded it? After I entered the model name and ran the block of code, it started downloading the model separately. Please advise how to load the model if it's already downloaded.
How would you modify this to work as a web app? Or on a similar client side, like a bot that joins a Zoom call? Or a browser plug-in that you can turn on and off and transcribe live?
Curious as I want to implement something like this.
I know you said you can’t run on the cloud. But what if you create an endpoint that receives a boolean via a button or programmatic call that then launches this code to start transcribing by accessing the local microphone from the cloud? Is this kind of thing possible?
Hi Vik, thanks for doing this. However, it doesn't work for me :(
When I hit the record button, it doesn't transcribe whatever I say. Instead, it shows "WARNING: reverting to cpu as cuda is not available"
Hey there - it's hard to diagnose the issue remotely, but I wouldn't worry about the warning. CPU inference can work with vosk. The most likely issues are that there's no function connected to the on_click event for the record button, or the thread hasn't started for recording. Adding print statements in the code can help you find which pieces are working/aren't.
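The print-statement approach suggested in the reply above can be sketched roughly as follows. This is a minimal, simplified stand-in rather than the tutorial's actual notebook code: the names `messages`, `state`, `record_microphone`, and `start_recording` are hypothetical, and the microphone read is replaced by a placeholder so the control flow can be checked without any audio hardware.

```python
import queue
import threading

# Hypothetical names for illustration; the tutorial's notebook may differ.
messages = queue.Queue()      # audio chunks handed from recorder to transcriber
state = {"recording": False}  # flag that a stop button would set back to False

def record_microphone(chunks=3):
    """Stand-in for the recording loop; prints show whether the thread runs."""
    print("record thread: started")
    for i in range(chunks):
        if not state["recording"]:
            break
        messages.put(f"chunk-{i}")  # placeholder for reading a block of audio
    print("record thread: finished,", messages.qsize(), "chunks queued")

def start_recording(button=None):
    """Callback meant to be attached to the record button's click event."""
    print("on_click handler: called")
    state["recording"] = True
    thread = threading.Thread(target=record_microphone)
    thread.start()
    print("on_click handler: thread started")
    return thread
```

In a notebook, the callback would be attached with something like `record_button.on_click(start_recording)`, where `record_button` is assumed to be an `ipywidgets.Button`. If "on_click handler: called" never prints, the handler is probably not connected; if it prints but "record thread: started" does not, the thread is the place to look.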
where is the model downloaded ?
Sir, I am getting an error that says subprocess returned non-zero exit status 1. Please help me solve the problem. I need to show this project at my college in 2 days.
I need something like this that can work in Unreal Engine 5.1
Nice
On-device translation, even with a delay, would be great, too!
You can actually do this - github.com/mozilla/translate .
🙏does this work with CircuitPython?🙏
Is Vosk the best open-source speech-to-text library? If there are others with better accuracy that don't need a GPU, can you please tell me?
As of now, I would use whisper instead of vosk. There is a version of whisper that runs on CPU.
Hey, nice video ^^ So, it's not real time, right? Does a 1-second delay work? And I have another question: is it possible to get other information, like the time when each word is pronounced? Thanks.
Yes, there is a short delay to process. I'll probably make another video at some point showing how to do this without vosk (so you can get more info, and do true real time).
@@Dataquestio Eagerly waiting for the video😋
@@Dataquestio As far as I can see, the video has not been published and I need it urgently. Could you explain roughly how to do it from here?
Sir, I am getting an issue with the last lines of code. I would be very happy if you could help me resolve it. Please look at my concern as soon as possible.
model = Model(model_name="vosk-model-en-us-0.22")
same
Edit: I found the solution
Your model is stored in the .cache folder, which you can find in your user folder: C:\Users\..\.cache\vosk\
Delete all the vosk model folders you have (literally delete all folders/files inside it).
Then run that segment of the program again.
Hope it helped you!
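To check what is in that cache folder before deleting anything, a small sketch like the one below may help. It assumes the location given in the comment above (a `.cache/vosk` folder inside the user's home directory); the helper names `vosk_cache_dir` and `list_cached_models` are made up for illustration and are not part of vosk.

```python
from pathlib import Path

def vosk_cache_dir(home=None):
    """Return <home>/.cache/vosk, the location reported in the comment above.

    This path is an assumption taken from that comment, not from vosk's docs.
    """
    base = Path(home) if home is not None else Path.home()
    return base / ".cache" / "vosk"

def list_cached_models(cache_dir):
    """Return the sorted names of sub-folders in the cache (one per model)."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []  # nothing has been downloaded to this location
    return sorted(p.name for p in cache_dir.iterdir() if p.is_dir())
```

Calling `list_cached_models(vosk_cache_dir())` shows which model folders exist. Deleting them, as the comment suggests, is left as a manual step so that nothing is removed by accident.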
can you add translation offline ?
Following from the first line... I got...Error displaying widget
I tried it. It works, but some words get lost! I can't figure out why.
I ran the code provided in the link, but it gives "OSError: [Errno -9998] Invalid number of channels". How do I resolve it? Please advise.
I am getting the same error. Have you resolved it? If yes, please tell me.
@@dredmaster9343 Nope, not yet. I gave up trying. I haven't revisited the code in a long time.
is it offline or online?
This is great but what if I want to use this code in a .py file?
Hi Stephen - you can write the same code in a .py file. JupyterLab also allows you to export notebooks as .py files that you can run.
@@Dataquestio Actually, that was part of my question. I got rid of widgets (for now) and made a .py file. It works perfectly fine, except for one little detail: when the main function stops, it doesn't terminate the script, so it hangs forever after printing "Stopped." I already tried joining threads and even explicitly saying 'quit' at the end, it still hangs. What can be the reason of that? Is that caused by vosk?
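One common reason a script keeps running after its main function returns is a non-daemon worker thread that is still blocked, for example on a queue read with no timeout. Whether that is the cause here cannot be confirmed from the comment, so the following is only a sketch of one pattern that avoids it: a stop event, a timed `get`, and a daemon thread. The names `worker` and `run_once` are hypothetical.

```python
import queue
import threading
import time

def worker(q, stop_event, out):
    """Consume items until asked to stop; the timeout keeps get() from blocking forever."""
    while not stop_event.is_set():
        try:
            item = q.get(timeout=0.1)
        except queue.Empty:
            continue  # nothing to process yet; re-check the stop flag
        out.append(item)

def run_once(items):
    """Start a worker, feed it items, then shut it down cleanly."""
    q = queue.Queue()
    stop_event = threading.Event()
    out = []
    # daemon=True means the interpreter will not wait for this thread at exit
    t = threading.Thread(target=worker, args=(q, stop_event, out), daemon=True)
    t.start()
    for item in items:
        q.put(item)
    while not q.empty():      # wait until the worker has taken every item
        time.sleep(0.01)
    stop_event.set()          # ask the worker loop to exit
    t.join(timeout=2)         # bounded wait instead of hanging indefinitely
    return out, t.is_alive()
```

If a script still hangs with this pattern in place, open audio streams or other library-owned threads would be the next thing to check.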
This tutorial is wrong, Vosk do not recommend using pyaudio due to latency issues. Our demos use sounddevice.
I wouldn't call the tutorial wrong. It works fine, and latency was not an issue from what I could tell. Both sounddevice and pyaudio are wrappers over portaudio, so they shouldn't function extremely differently (aside from the Python API being different).
I couldn't find any references to sounddevice on the vosk documentation site - if that is indeed the recommended way to use vosk, I would advertise that fact somewhere.
I took the trouble to watch the entire video but when I hit "Start Recording", there is no transcribing happening. There is no error either. Thumbs down.
Sameee
Is it free or must pay, please?
free
Can anybody help? It says AttributeError, FYI:
p = pyaudio.Pyaudio() ,
----> 3 p = pyaudio.Pyaudio()
4 for i in range(p.get_device_count()):
5 print(p.get_device_info_index(i))
AttributeError: module 'pyaudio' has no attribute 'Pyaudio'
You have to be very careful with capitalization: Pyaudio is not the same as PyAudio.
OSError: [Errno -9998] Invalid number of channels .
I was getting this.
I fixed it by using the right sound device. I was using my cell phone mic earlier, but then I switched to a headset and the error never appeared again.
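To see which devices can actually record, the device list can be filtered by input channel count. The sketch below assumes the `pyaudio` package is installed for the listing step; the helper names `input_capable` and `list_pyaudio_devices` are made up for illustration. Note the `PyAudio` capitalization mentioned in the reply above.

```python
def input_capable(devices):
    """Keep only device-info dicts that report at least one input channel."""
    return [d for d in devices if d.get("maxInputChannels", 0) > 0]

def list_pyaudio_devices():
    """Return PyAudio's info dict for every audio device on this machine."""
    import pyaudio  # imported here so input_capable() works without pyaudio

    p = pyaudio.PyAudio()  # capital P and capital A
    try:
        return [p.get_device_info_by_index(i)
                for i in range(p.get_device_count())]
    finally:
        p.terminate()  # release PortAudio resources
```

Running `input_capable(list_pyaudio_devices())` prints candidates for the recording device. A device with zero input channels is one plausible source of the "Invalid number of channels" error, though other causes are possible.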