13. Speech Recognition with Convolutional Neural Networks in Keras/TensorFlow

  • Published 28 Aug 2024

Comments • 78

  • @MyerNore
    @MyerNore 2 years ago +3

    Love the casual presentation of this material, so sophisticated and yet improvisatory…

  • @MaestroBeats
    @MaestroBeats 4 years ago +82

    I was setting a voice recognition password for my phone and a nearby dog barked and ran away. Now I'm still looking for that dog to unlock my phone....

  • @shanejohnpaul
    @shanejohnpaul 5 years ago +26

    In the end, instead of trying the LSTM network, you ran the Dense network by mistake!
    Please check on it.

  • @bags534
    @bags534 4 years ago +6

    Watching a Jupyter notebook being executed live evokes a different level of interest than watching someone just go through the notebook

  • @mattymallz4207
    @mattymallz4207 4 years ago +1

    I am 20 seconds into this video; I had to pause it and write a comment. I can tell this is gonna be AMAZING.

  • @ShaunJW1
    @ShaunJW1 3 years ago +5

    I'm going to develop voice recognition software, thanks this is great, subscribed.

    • @shashithadithya9744
      @shashithadithya9744 3 years ago

      I would like to know about your voice recognition software. So how can I contact you?

    • @waterspray5743
      @waterspray5743 2 years ago

      Hello, how's your progress?

  • @TecGFS
    @TecGFS 3 years ago +4

    Could you guys do a series where you make your own AI assistant?

  • @kevinsasso1405
    @kevinsasso1405 4 years ago +1

    I got excited when I clicked the video because I thought you were covering 1D CNNs. Please move on to 1D CNNs on raw audio.

  • @slazerlombardi
    @slazerlombardi 4 years ago +3

    That hairstyle adds 2.5 intelligence to his avatar.

  • @ar-visions
    @ar-visions 3 years ago +2

    Great resource. Instantly subscribed

  • @shobhitbishop
    @shobhitbishop 4 years ago +2

    Thank you for sharing this informative video. Can you share some information related to speaker diarization in Python?

  • @mrsilver8151
    @mrsilver8151 4 months ago

    nice and informative video

  • @JS19190
    @JS19190 5 years ago

    A great and informative video, thank you!

  • @taptaplit1081
    @taptaplit1081 3 years ago +1

    @Weights & Biases where is the link to download more files?

  • @Pnr231
    @Pnr231 2 years ago +1

    Hi sir, my professor gave me a mini-project topic, [Improving speech recognition using bionic wavelet feature], and said to do it as a Python program. Please help me do it.

  • @sidvlognlifestyle
    @sidvlognlifestyle a year ago

    Is this the same as choosing the topic "Speech spoofing detection"?

  • @rhinoara7119
    @rhinoara7119 3 years ago +1

    I want to convert speech to text offline, at least for a limited amount of words. Can anybody help?

  • @sreyamathew327
    @sreyamathew327 10 months ago

    Can you please explain SER using CNN for a beginner?

  • @aquafina3708
    @aquafina3708 2 years ago

    Thanks for the video, but I have a question: I don't know what feature descriptors are in animal sound recognition. Can you answer my question? My English is not good; I hope you understand me.

  • @MS-fk8ec
    @MS-fk8ec 4 years ago

    What are the callbacks when fitting the model? You didn't scroll there.

  • @aliarslan6904
    @aliarslan6904 4 years ago +1

    Where can the dataset be obtained? What's the original link?

  • @alikavari351
    @alikavari351 4 years ago

    Hi,
    How can this type of network be used when we are looking for a specific word in the input sound? For example, we are looking for the word "hello", so the first label is "hello" and the second label is anything other than "hello".

  • @shangethrajaa
    @shangethrajaa 5 years ago +4

    How is this speech recognition? It's just spoken-word classification.

  • @hygjob
    @hygjob 5 years ago +1

    Thank you for sharing your good work.

  • @_mehmet
    @_mehmet 4 years ago +1

    Thank you for the source code ❤️

  • @inamullahshah7074
    @inamullahshah7074 4 years ago +1

    Sir, how can we label our audio files dataset?

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 3 years ago

    Great video

  • @zacharyblundell6994
    @zacharyblundell6994 4 years ago +1

    Looking to start a voice recognition company but not tech savvy. If any tech gurus are interested, please let me know. Thanks, Zach

  • @azrflourish9032
    @azrflourish9032 2 years ago +1

    Where can we download the data that's used here?

    • @WeightsBiases
      @WeightsBiases  2 years ago

      You can follow along the code and get the data here!
      github.com/lukas/ml-class/tree/master/videos/cnn-audio

  • @phamthanhnhan9409
    @phamthanhnhan9409 3 years ago

    Is it QCNN??

  • @pricesmith1793
    @pricesmith1793 2 years ago

    New to ML here, very very much not new to audio. I have a specific use case, with lots of data, that I want to experiment with involving six channels of low-sample-rate data rather than one. How would I go about separating each channel in the area where you opted to keep it at one?

  • @zaphbeeblebrox5333
    @zaphbeeblebrox5333 3 years ago

    Great video! Thank you!!

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 3 years ago

    Why not PyTorch?

  • @luisfernandoriveroslozano2859

    Hi, I was trying out the project but I get an error when I run audio.ipynb. I'd appreciate it if somebody could help me with this error. Thank you.
    Using TensorFlow backend.
    ---------------------------------------------------------------------------
    ModuleNotFoundError Traceback (most recent call last)
    ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
    ---------------------------------------------------------------------------
    ImportError Traceback (most recent call last)
    ImportError: numpy.core.multiarray failed to import
    The above exception was the direct cause of the following exception:
    SystemError Traceback (most recent call last)
    ~\Anaconda3\lib\importlib\_bootstrap.py in _find_and_load(name, import_)
    SystemError: returned a result with an error set
    ---------------------------------------------------------------------------
    ImportError Traceback (most recent call last)
    ImportError: numpy.core._multiarray_umath failed to import
    ---------------------------------------------------------------------------
    ImportError Traceback (most recent call last)
    ImportError: numpy.core.umath failed to import

    • @cabbagenguyen801
      @cabbagenguyen801 4 years ago

      Those are library import errors, so I think you need to check your numpy installation. Or you can try the project in Google Colab first.

    • @chrisvanpelt1677
      @chrisvanpelt1677 4 years ago

      Hey Luis, this is fixed now if you pull the changes from git.

  • @rudrakshshukla765
    @rudrakshshukla765 4 years ago

    Hello, I have an issue with prediction. Can you please guide me on how to predict with this?

  • @kopalsoni4780
    @kopalsoni4780 4 years ago

    Why do we have to use and specify buckets?

    • @ayushthakur3880
      @ayushthakur3880 4 years ago +1

      For the MFCC transformation, the signal is first converted to the frequency domain using an FFT. This needs to be applied to small windows of the whole signal; the bucket specifies the length of those windows.
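
A minimal NumPy sketch of that windowing step (the window and hop sizes below are illustrative assumptions, not the video's exact settings; a full MFCC pipeline would additionally apply a mel filterbank and a DCT):

```python
import numpy as np

sr = 16000                    # assumed sample rate (Hz), typical for speech data
signal = np.random.randn(sr)  # one second of fake audio standing in for a WAV file

win_len = 400                 # 25 ms window (the "bucket"), a common MFCC choice
hop = 160                     # 10 ms hop between successive windows

# Slice the signal into overlapping windows and FFT each one.
n_frames = 1 + (len(signal) - win_len) // hop
frames = np.stack([signal[i * hop : i * hop + win_len] for i in range(n_frames)])
spectra = np.abs(np.fft.rfft(frames * np.hanning(win_len), axis=1))

print(spectra.shape)  # (98, 201): one magnitude spectrum per 25 ms window
```

A longer bucket gives finer frequency resolution but coarser time resolution, which is the trade-off the bucket parameter controls.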

  • @mysteriousartiest542
    @mysteriousartiest542 3 years ago

    Can we use the same code to make a model to identify if an audio is fake or real?

  • @yasminebelhadj9359
    @yasminebelhadj9359 5 years ago +2

    Hi, can you please explain how you converted the audio files into useful data?

    • @cabbagenguyen801
      @cabbagenguyen801 5 years ago +1

      yasmine belhadj you can use techniques like MFCC. I'm using it for my project.

    • @yasminebelhadj9359
      @yasminebelhadj9359 4 years ago +1

      @@cabbagenguyen801 Thank you, I got it :D

    • @cabbagenguyen801
      @cabbagenguyen801 4 years ago

      @@yasminebelhadj9359 You're welcome ^^

    • @zohaibramzan6381
      @zohaibramzan6381 4 years ago

      @@cabbagenguyen801 What does MFCC do? Explain briefly. Also, explain how he converts the audio into useful data.

    • @cabbagenguyen801
      @cabbagenguyen801 4 years ago

      @@zohaibramzan6381 you can Google it with keyword "speech feature extraction with mfcc"

  • @souha5188
    @souha5188 3 years ago

    How do you create a confusion matrix for this tutorial?

    • @WeightsBiases
      @WeightsBiases  3 years ago

      Hey Souha!
      We can make and log a confusion matrix for you, given the ground truth and the model predictions, with wandb.sklearn.plot_confusion_matrix. As the name implies, we use sklearn to generate the matrix, so head there if you want to calculate and plot the CM without logging it.
      See some examples of confusion matrix calculation, and our other scikit integrations, here: docs.wandb.com/library/integrations/scikit
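
For readers who want to see what that matrix contains before logging it, here is a minimal plain-NumPy stand-in (the W&B integration named above wraps the same computation via scikit-learn; the three word classes and predictions below are made up for illustration):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count how often true class i (rows) was predicted as class j (columns)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy example: three spoken-word classes (e.g. "bed", "cat", "dog").
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred, 3)
print(cm)
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]
```

The diagonal counts correct predictions; off-diagonal cells show which words the model confuses with which.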

    • @souha5188
      @souha5188 3 years ago

      ​@@WeightsBiases thank you

  • @karenhdez7735
    @karenhdez7735 3 years ago

    The video is amazing and it has helped me solve one of my projects. However, when I run the last part, validating the model, I get this error:
    AttributeError: 'NoneType' object has no attribute 'item'
    Could you help me, please?

  • @science.20246
    @science.20246 4 years ago

    Is there an example with recurrent techniques like LSTM?

    • @kishpawar
      @kishpawar 4 years ago

      czcams.com/video/u9FPqkuoEJ8/video.html Hope this helps.

  • @pablinsky2006
    @pablinsky2006 3 years ago

    Do you know where to find WAV files like the ones that you used?

    • @Dr.Funknstein
      @Dr.Funknstein a year ago

      Idk if you're still looking, but try Google's Speech Commands dataset.

  • @michaelfekadu6116
    @michaelfekadu6116 5 years ago

    Where is the data?

    • @WeightsBiases
      @WeightsBiases  5 years ago +2

      +Michael Fekadu can you elaborate?

    • @michaelfekadu6116
      @michaelfekadu6116 5 years ago +3

      ​@@WeightsBiases Sorry, I was not following along with the linked GitHub repository because I wanted to apply the knowledge from this video onto a different dataset. So, I did not realize that the save_data_to_array() and get_data_train_test() functions are inside of the preprocess.py file. Furthermore, the data is loaded from librosa via the librosa.load() call. In other words, I was watching the video out of context of the first video that suggests following along after setting up a local copy of the provided Git repository, which I had done previously and should have checked there before commenting.
      Thank you for checking in!
      Love the videos!

    • @WeightsBiases
      @WeightsBiases  5 years ago

      @@michaelfekadu6116 No problem, what are you applying this to?

    • @michaelfekadu6116
      @michaelfekadu6116 5 years ago +1

      Weights & Biases I plan to apply it to the DARPA TIMIT dataset that I found here:
      www.kaggle.com/mfekadu/darpa-timit-acousticphonetic-continuous-speech
      First I’ll need to write some python code that splits the data into just the words from the sentences using the time-aligned orthographic annotation files.
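
The splitting step described in that last comment can be sketched as follows. The waveform and word boundaries here are fabricated stand-ins, but TIMIT's .WRD annotation files do provide per-word start/end offsets in samples in this shape:

```python
import numpy as np

sr = 16000  # TIMIT audio is sampled at 16 kHz

# Fake one-second waveform standing in for a loaded TIMIT sentence.
sentence = np.arange(sr, dtype=np.float32)

# Each .WRD line gives (start_sample, end_sample, word); these particular
# values are hypothetical, chosen only to illustrate the slicing.
word_boundaries = [(0, 4000, "she"), (4000, 9000, "had"), (9000, 16000, "your")]

# Slice the sentence waveform into one array per word.
words = {w: sentence[start:end] for start, end, w in word_boundaries}
print({w: len(x) for w, x in words.items()})
# {'she': 4000, 'had': 5000, 'your': 7000}
```

Each slice can then be padded or truncated to a fixed length and fed through the same MFCC-plus-CNN pipeline the video uses for single-word clips.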