13. Speech Recognition with Convolutional Neural Networks in Keras/TensorFlow

  • Published 28 Aug 2024

Comments • 78

  • @MyerNore
    @MyerNore 2 years ago +3

    Love the casual presentation of this material, so sophisticated and yet improvisatory…

  • @MaestroBeats
    @MaestroBeats 4 years ago +82

    I was setting a voice recognition password for my phone and a nearby dog barked and ran away. Now I'm still looking for that dog to unlock my phone....

  • @shanejohnpaul
    @shanejohnpaul 5 years ago +26

    In the end, instead of trying the LSTM network, you ran the Dense network by mistake!
    Please check on it.

  • @bags534
    @bags534 4 years ago +6

    Watching a Jupyter notebook being executed live evokes a different level of interest than watching someone just go through the notebook

  • @mattymallz4207
    @mattymallz4207 4 years ago +1

    I am 20 seconds into this video; I had to pause it and write a comment. I can tell this is gonna be AMAZING.

  • @ShaunJW1
    @ShaunJW1 3 years ago +5

    I'm going to develop voice recognition software, thanks this is great, subscribed.

    • @shashithadithya9744
      @shashithadithya9744 3 years ago

      I would like to know about your voice recognition software. So how can I contact you?

    • @waterspray5743
      @waterspray5743 2 years ago

      Hello, how's your progress?

  • @TecGFS
    @TecGFS 3 years ago +4

    Could you guys do a series where you make your own AI assistant?

  • @kevinsasso1405
    @kevinsasso1405 4 years ago +1

    I got excited when I clicked the video because I thought you were covering 1D CNNs. Please move on to 1D CNNs on raw audio.

  • @slazerlombardi
    @slazerlombardi 4 years ago +3

    That hairstyle adds 2.5 intelligence to his avatar.

  • @ar-visions
    @ar-visions 3 years ago +2

    Great resource. Instantly subscribed

  • @shobhitbishop
    @shobhitbishop 4 years ago +2

    Thank you for sharing this informative video. Can you share some information related to speaker diarization in Python?

  • @mrsilver8151
    @mrsilver8151 4 months ago

    nice and informative video

  • @JS19190
    @JS19190 5 years ago

    A great and informative video, thank you!

  • @taptaplit1081
    @taptaplit1081 3 years ago +1

    @Weights & Biases where is the link to download more files?

  • @Pnr231
    @Pnr231 2 years ago +1

    Hi sir, my professor gave me a mini-project topic, [Improving speech recognition using bionic wavelet feature], and said to do it as a Python program. Please help me do it.

  • @sidvlognlifestyle
    @sidvlognlifestyle a year ago

    Is this the same as choosing the topic "Speech spoofing detection"?

  • @rhinoara7119
    @rhinoara7119 3 years ago +1

    I want to convert speech to text offline, at least for a limited amount of words. Can anybody help?

  • @sreyamathew327
    @sreyamathew327 10 months ago

    Can you please explain SER using CNN for a beginner?

  • @aquafina3708
    @aquafina3708 2 years ago

    Thanks for the video, but I have a question: I don't know what feature descriptors are in animal sound recognition. Can you answer my question? My English is not good; I hope you understand me.

  • @MS-fk8ec
    @MS-fk8ec 4 years ago

    What are the callbacks when fitting the model? You didn't scroll there.

  • @aliarslan6904
    @aliarslan6904 4 years ago +1

    Where can the dataset be obtained? What's the original link?

  • @alikavari351
    @alikavari351 4 years ago

    Hi,
    How can this type of network be used when we are looking for a specific word in the input sound? For example, we are looking for the word "hello", so the first label is "hello" and the second label is anything other than "hello".

  • @shangethrajaa
    @shangethrajaa 5 years ago +4

    How is this speech recognition? It's just spoken-word classification.

  • @hygjob
    @hygjob 5 years ago +1

    Thank you for sharing your good work.

  • @_mehmet
    @_mehmet 4 years ago +1

    Thank you for the source code ❤️

  • @inamullahshah7074
    @inamullahshah7074 4 years ago +1

    Sir, how can we label our audio files dataset?

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 3 years ago

    Great video

  • @zacharyblundell6994
    @zacharyblundell6994 4 years ago +1

    Looking to start a voice recognition company but not tech savvy. If any tech gurus are interested, please let me know. Thanks, Zach

  • @azrflourish9032
    @azrflourish9032 2 years ago +1

    Where can we download the data that's used here?

    • @WeightsBiases
      @WeightsBiases  2 years ago

      You can follow along the code and get the data here!
      github.com/lukas/ml-class/tree/master/videos/cnn-audio

  • @phamthanhnhan9409
    @phamthanhnhan9409 3 years ago

    Is it QCNN??

  • @pricesmith1793
    @pricesmith1793 2 years ago

    New to ML here, very very much not new to audio. I have a specific use case, with lots of data, that I want to experiment with involving six channels of low-sample-rate data rather than one. How would I go about separating each channel in the area where you opted to keep it at one?

  • @zaphbeeblebrox5333
    @zaphbeeblebrox5333 3 years ago

    Great video! Thank you!!

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 3 years ago

    Why not PyTorch?

  • @luisfernandoriveroslozano2859

    Hi, I was trying out the project but I get an error when I run audio.ipynb. I'd appreciate it if somebody could help me with this error. Thank you.
    Using TensorFlow backend.
    ---------------------------------------------------------------------------
    ModuleNotFoundError Traceback (most recent call last)
    ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'
    ---------------------------------------------------------------------------
    ImportError Traceback (most recent call last)
    ImportError: numpy.core.multiarray failed to import
    The above exception was the direct cause of the following exception:
    SystemError Traceback (most recent call last)
    ~\Anaconda3\lib\importlib\_bootstrap.py in _find_and_load(name, import_)
    SystemError: returned a result with an error set
    ---------------------------------------------------------------------------
    ImportError Traceback (most recent call last)
    ImportError: numpy.core._multiarray_umath failed to import
    ---------------------------------------------------------------------------
    ImportError Traceback (most recent call last)
    ImportError: numpy.core.umath failed to import

    • @cabbagenguyen801
      @cabbagenguyen801 4 years ago

      Those are library import errors, so I think you need to check your numpy installation. Or you can try the project in Google Colab first.

    • @chrisvanpelt1677
      @chrisvanpelt1677 4 years ago

      Hey Luis, this is fixed now if you pull the changes from git.

  • @rudrakshshukla765
    @rudrakshshukla765 4 years ago

    Hello, I have an issue with prediction. Can you please guide me on how to predict with this?

  • @kopalsoni4780
    @kopalsoni4780 4 years ago

    Why do we have to use and specify buckets?

    • @ayushthakur3880
      @ayushthakur3880 4 years ago +1

      For the MFCC transformation, the signal is first converted to the frequency domain using an FFT. This needs to be applied to small windows of the whole signal; the bucket specifies the length of those windows.
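
A minimal NumPy sketch of that windowing step (the window and hop sizes below are illustrative assumptions, not the video's exact settings; a full MFCC pipeline would additionally apply a mel filterbank and a DCT):

```python
import numpy as np

sr = 16000                    # assumed sample rate (Hz), typical for speech data
signal = np.random.randn(sr)  # one second of fake audio standing in for a WAV file

win_len = 400                 # 25 ms window (the "bucket"), a common MFCC choice
hop = 160                     # 10 ms hop between successive windows

# Slice the signal into overlapping windows and FFT each one.
n_frames = 1 + (len(signal) - win_len) // hop
frames = np.stack([signal[i * hop : i * hop + win_len] for i in range(n_frames)])
spectra = np.abs(np.fft.rfft(frames * np.hanning(win_len), axis=1))

print(spectra.shape)  # (98, 201): one magnitude spectrum per 25 ms window
```

A longer bucket gives finer frequency resolution but coarser time resolution, which is the trade-off the bucket parameter controls.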

  • @mysteriousartiest542
    @mysteriousartiest542 3 years ago

    Can we use the same code to make a model to identify if an audio is fake or real?

  • @yasminebelhadj9359
    @yasminebelhadj9359 5 years ago +2

    Hi, can you please explain how you converted the audio files into useful data?

    • @cabbagenguyen801
      @cabbagenguyen801 5 years ago +1

      yasmine belhadj you can use techniques like MFCC. I'm using it for my project.

    • @yasminebelhadj9359
      @yasminebelhadj9359 4 years ago +1

      @@cabbagenguyen801 Thank you, I got it :D

    • @cabbagenguyen801
      @cabbagenguyen801 4 years ago

      @@yasminebelhadj9359 You're welcome ^^

    • @zohaibramzan6381
      @zohaibramzan6381 4 years ago

      @@cabbagenguyen801 What does MFCC do? Explain briefly. Also, explain how he converts the audio into useful data.

    • @cabbagenguyen801
      @cabbagenguyen801 4 years ago

      @@zohaibramzan6381 you can Google it with keyword "speech feature extraction with mfcc"

  • @souha5188
    @souha5188 3 years ago

    How do you create a confusion matrix for this tutorial?

    • @WeightsBiases
      @WeightsBiases  3 years ago

      Hey Souha!
      We can make and log a confusion matrix for you, given the ground truth and the model predictions, with wandb.sklearn.plot_confusion_matrix. As the name implies, we use sklearn to generate the matrix, so head there if you want to calculate and plot the CM without logging it.
      See some examples of confusion matrix calculation, and our other scikit integrations, here: docs.wandb.com/library/integrations/scikit
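
For readers who want to see what that matrix contains before logging it, here is a minimal plain-NumPy stand-in (the W&B integration named above wraps the same computation via scikit-learn; the three word classes and predictions below are made up for illustration):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count how often true class i (rows) was predicted as class j (columns)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy example: three spoken-word classes (e.g. "bed", "cat", "dog").
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(y_true, y_pred, 3)
print(cm)
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]
```

The diagonal counts correct predictions; off-diagonal cells show which words the model confuses with which.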

    • @souha5188
      @souha5188 3 years ago

      ​@@WeightsBiases thank you

  • @karenhdez7735
    @karenhdez7735 3 years ago

    The video is amazing and it has helped me solve one of my projects. However, when I run the last part, validating the model, I get this error:
    AttributeError: 'NoneType' object has no attribute 'item'
    Could you help me, please?

  • @science.20246
    @science.20246 4 years ago

    Is there an example with recurrent techniques like LSTM?

    • @kishpawar
      @kishpawar 4 years ago

      czcams.com/video/u9FPqkuoEJ8/video.html Hope this helps.

  • @pablinsky2006
    @pablinsky2006 3 years ago

    Do you know where to find WAV files like the ones that you used?

    • @Dr.Funknstein
      @Dr.Funknstein a year ago

      Idk if you're still looking, but try Google's Speech Commands dataset.

  • @michaelfekadu6116
    @michaelfekadu6116 5 years ago

    Where is the data?

    • @WeightsBiases
      @WeightsBiases  5 years ago +2

      +Michael Fekadu can you elaborate?

    • @michaelfekadu6116
      @michaelfekadu6116 5 years ago +3

      ​@@WeightsBiases Sorry, I was not following along with the linked GitHub repository because I wanted to apply the knowledge from this video onto a different dataset. So, I did not realize that the save_data_to_array() and get_data_train_test() functions are inside of the preprocess.py file. Furthermore, the data is loaded from librosa via the librosa.load() call. In other words, I was watching the video out of context of the first video that suggests following along after setting up a local copy of the provided Git repository, which I had done previously and should have checked there before commenting.
      Thank you for checking in!
      Love the videos!

    • @WeightsBiases
      @WeightsBiases  5 years ago

      @@michaelfekadu6116 No problem, what are you applying this to?

    • @michaelfekadu6116
      @michaelfekadu6116 5 years ago +1

      Weights & Biases I plan to apply it to the DARPA TIMIT dataset that I found here:
      www.kaggle.com/mfekadu/darpa-timit-acousticphonetic-continuous-speech
      First I’ll need to write some python code that splits the data into just the words from the sentences using the time-aligned orthographic annotation files.
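
The splitting step described in that last comment can be sketched as follows. The waveform and word boundaries here are fabricated stand-ins, but TIMIT's .WRD annotation files do provide per-word start/end offsets in samples in this shape:

```python
import numpy as np

sr = 16000  # TIMIT audio is sampled at 16 kHz

# Fake one-second waveform standing in for a loaded TIMIT sentence.
sentence = np.arange(sr, dtype=np.float32)

# Each .WRD line gives (start_sample, end_sample, word); these particular
# values are hypothetical, chosen only to illustrate the slicing.
word_boundaries = [(0, 4000, "she"), (4000, 9000, "had"), (9000, 16000, "your")]

# Slice the sentence waveform into one array per word.
words = {w: sentence[start:end] for start, end, w in word_boundaries}
print({w: len(x) for w, x in words.items()})
# {'she': 4000, 'had': 5000, 'your': 7000}
```

Each slice can then be padded or truncated to a fixed length and fed through the same MFCC-plus-CNN pipeline the video uses for single-word clips.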