SUPER Fast AI Real Time Speech to Text Transcription - Faster Whisper / Python

  • Added on 25. 06. 2024
  • SUPER Fast AI Real Time Voice to Text Transcription - Faster Whisper / Python
    👊 Become a member and get access to GitHub:
    / allaboutai
    Get a FREE 45+ ChatGPT Prompts PDF here:
    📧 Join the newsletter:
    www.allabtai.com/newsletter/
    🌐 My website:
    www.allabtai.com
    Faster-Whisper:
    github.com/SYSTRAN/faster-whi...
    I created an almost zero-latency, real-time AI voice-to-text transcription using faster-whisper and Python. We're going to look at some use cases for the script and a preview of my upcoming video. Enjoy!
    00:00 Intro
    00:21 Real Time AI Transcription "Mr.Beast"
    01:25 Setup / Python Code
    03:33 Real Time AI Transcription "Sentiment Analysis"
    05:51 Real Time AI Transcription "Secret Project"
    08:14 Conclusion
  • Science & Technology

Comments • 98

  • @OliNorwell · 4 months ago · +5

    Epic! These videos are some of the best stuff on YouTube. Love the idea with the image generation at the end.

  • @theraybae · 5 months ago · +5

    This is amazing and inspiring. I love the ending of the video and can’t wait for Wednesday. As a dyslexic person I think you unlocked a new use case for learning.

  • @ryanjames3907 · 5 months ago

    Wow!! Great video!!! Thank you for being so generous and teaching this to us, this is epic stuff! I can already start to see all kinds of use cases. I can't wait to get it running, and I'm really looking forward to Wednesday's video. Thanks again from Canada

  • @radudamianov · 5 months ago

    Excellent! Thank you so much for sharing!

  • @unrealminigolf4015 · 5 months ago

    Awesome bro! ❤

  • @benscottbongiben · 5 months ago

    It would be good to see transcription plus generated audio responses in real time for phone calls.

  • @ArmandoMenicacci · 5 months ago · +1

    Fantastic !!! A bit fast in explaining and showing, but I can always pause!

  • @ReadyMedia-no · 5 months ago · +3

    There is a product opportunity for live video transcription here. Live text services are expensive and do not work for many languages. Set up a server/service that ingests an RTMP video source, delays the video and overlays the text on it in perfect sync, then offers an RTMP output with burned-in live text. :) There is a need for this service.

  • @enesgul2970 · 5 months ago · +1

    You are really very good.

  • @HammerOnTheNet · 5 months ago

    Amazing and inspiring work! Kris, what about something less powerful but more accessible in terms of hardware?

  • @t-dsai · 5 months ago

    Thanks for sharing your knowledge/experience.
    I'm a bit perplexed. The description here mentions 45+ prompts in the PDF book, the newsletter website says 40+, and the PDF doc says 35+. Which number is correct?

  • @MultiBigkush · 1 month ago · +6

    Code:
    import os
    import wave
    import pyaudio
    from faster_whisper import WhisperModel
    # Define constants (ANSI colour codes for the terminal output)
    NEON_GREEN = '\033[32m'
    RESET_COLOR = '\033[0m'
    # Allow duplicate OpenMP runtimes (works around "OMP: Error #15" on some setups)
    os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
    # Function for recording an audio chunk
    def record_chunk(p, stream, file_path, chunk_length=1):
        """
        Records an audio chunk to a file.
        Args:
            p (pyaudio.PyAudio): The PyAudio object.
            stream (pyaudio.Stream): The PyAudio stream.
            file_path (str): Path of the file the audio chunk is written to.
            chunk_length (int): Length of the audio chunk in seconds.
        Returns:
            None
        """
        frames = []
        # 16 kHz audio, read in buffers of 1024 frames
        for _ in range(0, int(16000 / 1024 * chunk_length)):
            # Don't crash if the input buffer overflowed while we were transcribing
            data = stream.read(1024, exception_on_overflow=False)
            frames.append(data)
        # Write the captured frames as a 16-bit mono WAV file
        wf = wave.open(file_path, 'wb')
        wf.setnchannels(1)
        wf.setsampwidth(p.get_sample_size(pyaudio.paInt16))
        wf.setframerate(16000)
        wf.writeframes(b''.join(frames))
        wf.close()
    # Function for transcribing one recorded chunk
    def transcribe_chunk(model, file_path):
        # transcribe() returns a lazy generator of segments plus metadata
        segments, info = model.transcribe(file_path, beam_size=7)
        transcription = ''.join(segment.text for segment in segments)
        return transcription
    def main2():
        """
        Main function of the program.
        """
        # Choose the Whisper model
        model = WhisperModel("medium", device="cuda", compute_type="float16")
        # Initialize PyAudio
        p = pyaudio.PyAudio()
        # Open the recording stream
        stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
        # Initialize an empty string for accumulating the transcriptions
        accumulated_transcription = ""
        try:
            while True:
                # Record an audio chunk
                chunk_file = "temp_chunk.wav"
                record_chunk(p, stream, chunk_file)
                # Transcribe the audio chunk
                transcription = transcribe_chunk(model, chunk_file)
                print(NEON_GREEN + transcription + RESET_COLOR)
                # Delete the temporary file
                os.remove(chunk_file)
                # Append the new transcription to the accumulated transcription
                accumulated_transcription += transcription + " "
        except KeyboardInterrupt:
            print("Stopping...")
            # Write the accumulated transcription to a log file
            with open("log.txt", "w") as log_file:
                log_file.write(accumulated_transcription)
        finally:
            print("LOG: " + accumulated_transcription)
            # Close the recording stream
            stream.stop_stream()
            stream.close()
            # Shut down PyAudio
            p.terminate()
    if __name__ == "__main__":
        main2()

  • @bim-techs · 5 months ago · +9

    Tips: You can transform your device's audio output into a "microphone" on Windows, so you don't need to place your headphones over your microphone.
    1. Press Windows key + R -> type "mmsys.cpl"
    2. In the Recording tab, enable the Stereo Mix option. Now, "Stereo Mix" is an available microphone option! You can select it as the audio input.

    • @weekendmakeit7760 · 5 months ago · +2

      this really helped me! Thank you!

    • @aoeu256 · 4 months ago · +1

      This is a great idea. I was using Voicemeeter as a virtual audio thingy and it's complicated to use.
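With Stereo Mix enabled as in the tip above, the recording code still has to open that device instead of the default microphone. A minimal sketch, assuming PyAudio: `find_input_device_index` and `open_input_stream_by_name` are hypothetical helper names, and matching on the substring "stereo mix" assumes the device keeps its default English name. `get_device_count`, `get_device_info_by_index` and the `input_device_index` argument of `PyAudio.open` are real PyAudio calls; the rest is illustration.

```python
def find_input_device_index(devices, name_fragment):
    """Return the index of the first input-capable device whose name contains
    name_fragment (case-insensitive), or None if nothing matches.

    `devices` is a list of dicts shaped like those returned by PyAudio's
    get_device_info_by_index(), which include "index", "name" and
    "maxInputChannels".
    """
    needle = name_fragment.lower()
    for info in devices:
        is_input = info.get("maxInputChannels", 0) > 0
        if is_input and needle in info.get("name", "").lower():
            return int(info["index"])
    return None


def open_input_stream_by_name(name_fragment="stereo mix", rate=16000):
    """Open a mono 16-bit input stream on the first device matching the name.

    Returns (p, stream); the caller must close the stream and call
    p.terminate() when done.
    """
    import pyaudio  # imported here so the lookup helper has no dependency

    p = pyaudio.PyAudio()
    devices = [p.get_device_info_by_index(i) for i in range(p.get_device_count())]
    index = find_input_device_index(devices, name_fragment)
    if index is None:
        p.terminate()
        raise RuntimeError(f"no input device matching {name_fragment!r}")
    stream = p.open(format=pyaudio.paInt16, channels=1, rate=rate, input=True,
                    frames_per_buffer=1024, input_device_index=index)
    return p, stream
```

`p, stream = open_input_stream_by_name()` would give a stream that can be passed to the `record_chunk` function from the code shared in the comments. Some drivers may only accept Stereo Mix at its `defaultSampleRate` (often 44100 or 48000 Hz); in that case opening at 16000 Hz can fail, and you would have to record at the device rate and change the rate that `record_chunk` writes to the WAV header to match.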

  • @calvinapollos · 2 months ago

    Great video! Thanks for going through this in such an easy-to-understand way! Can you share the python scripts?

  • @maizizhamdo · 2 months ago

    I love your videos, man. Please make a video about faster-whisper behind a Docker API.

  • @fredericpaillot2570 · 5 months ago

    Hi Kris! I love what you do and would like to become a member of your channel, but I can't access the page to subscribe. Do you have a direct link? The one in the description doesn't work for me. Have a good day!

  • @ferluisch · 1 month ago

    Hey man, this is really cool! I'd like to know:
    1) Did you use the Whisper v3 model, or v2?
    2) Have you seen the GPT-4 demos? They also showed GPT's ASR being better than Whisper v3. I wonder if it will be open like Whisper.

  • @martinvizar6430 · 4 months ago

    Impresario thank you

  • @maverick1901 · 4 months ago

    Running fully locally is one thing; doing this via the Web Audio API streaming to a backend is a different topic. Is an implementation for that planned as well?

  • @kimsteinhaug · 5 months ago

    Interesting stuff on the image creation at the end while talking. I'm not sure whether you are taking the punctuation in your sentences into consideration? I'm pretty sure this could lead to something cool, maybe keeping an overview of all the text that has moved out of the "buffer" for style? Looks like something I could have a lot of fun with. I don't have the GPU though :/ Colab, however.

  • @royzac7829 · 4 months ago

    How does the transcription performance compare to AssemblyAI?

  • @claudiobalderrama1599 · 3 months ago

    Do you think this could be used to transcribe, for example, phone calls made through the browser? I would greatly appreciate your response :)

  • @henrijohnson7779 · 4 months ago

    @Kris: I joined as an Adept member on Jan 18th 2024 and requested access to the GitHub repo via email and via Discord, but have not had any response from you yet.

  • @AlexPopov-hv3kp · 1 month ago

    What is the transcribe_chunk function in the code? It seems it's not from faster_whisper?

  • @svenborgers6908 · 3 months ago · +2

    I have tried to get this to run on an M1 MacBook. No joy. The CPU maxes out even with the tiny model. Then I tried the Whisper.cpp implementation, which is compiled for Apple silicon, through a whisper-cpp-python wrapper I found for that library. That actually runs and is far less CPU-bound. It has a bit of a stutter and is not as clean (it misses words between chunk processing), but you can see that with just a little more power it could work.

    • @MrThaitrinh · 3 months ago

      Hi Sven, could you please share your code with me? Thank you very much!
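Missing words "between the chunk processing" is a general side effect of cutting audio into back-to-back fixed chunks: a word that straddles a cut is split across two transcriptions. One common mitigation, which neither the video nor this thread shows, is to transcribe overlapping windows so that each boundary region is heard twice. A minimal, dependency-free sketch of the windowing only, assuming the audio is available as a sequence of 16 kHz samples (for example a list or a NumPy array); `overlapping_windows` and the 2 s / 1.5 s values are hypothetical choices:

```python
def overlapping_windows(samples, window, hop):
    """Yield consecutive slices of `samples` of length `window`, starting
    every `hop` samples. With hop < window, neighbouring slices share
    (window - hop) samples, so audio at each cut appears in two windows.
    A trailing remainder shorter than `window` is not yielded."""
    if not 0 < hop <= window:
        raise ValueError("need 0 < hop <= window")
    for start in range(0, len(samples) - window + 1, hop):
        yield samples[start:start + window]


RATE = 16000              # sample rate used in the shared code
WINDOW = 2 * RATE         # 2-second windows
HOP = int(1.5 * RATE)     # advance 1.5 s, i.e. 0.5 s overlap per boundary
```

The overlapped audio will usually be transcribed twice, so the text still needs de-duplicating (for example by comparing segment timestamps), which this sketch does not attempt.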

  • @George-kx8fl · 4 months ago

    Would it be possible to do speaker recognition and then pipe it into translation?

  • @aoeu256 · 4 months ago

    This will be a good tool for language immersion (Chinese, Japanese, Indonesian) along with the DeepL clipboard tool and the Edge browser's TTS engine.

  • @mattaylor-qg4yw · 2 months ago

    just joined. would be good to get my grubby paws on the files for this.

  • @jotixh · 2 months ago

    Is there a way to connect a live streaming url?

  • @TonyHoangPodcast · 2 months ago

    Does it support speaker diarization?

  • @AdrianC2006Uk · 1 month ago

    That image gen project was pukka!

  • @Edward_ZS · 5 months ago

    Has anyone updated the code from the previous video to use this recording method instead?

  • @thnmanucian7993 · 2 months ago

    Hello. I'm a beginner in this field. How can I get your code for reference? Thank you

  • @haloBean · 2 months ago

    Hi,
    Can I get the GitHub repo of the above code?
    Thanks

  • @reddyparthu5978 · 3 months ago · +3

    how to get the code for this?

  • @Siri-tz7dz · 2 months ago

    Where do I get the setup/Python code?

  • @maxstauss9579 · 29 days ago

    I can't find the script for the real-time translation, please help me find it :((

  • @kate-pt2ny · 5 months ago

    Kris, you are a genius. Real-time speech transcription can do a lot of things. The last example is great. I can't wait to watch the video released on Wednesday. My computer is a Mac with an M-series chip. I found the code in your GitHub and changed it to run on the CPU. Later, some problems occurred, such as incomplete transcriptions and an OSError. Could you release a version suitable for Mac computers? Grateful.
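A Mac has no CUDA device, so `device="cuda", compute_type="float16"` from the shared code cannot work there. The faster-whisper README shows `device="cpu", compute_type="int8"` for CPU inference. A small sketch that picks between the two, assuming `ctranslate2.get_cuda_device_count()` (CTranslate2 is the inference engine faster-whisper is built on) to detect a GPU; `choose_whisper_settings` and `load_model` are hypothetical names, and the default `"small"` model size is an assumption made to keep CPU load down. This does not address the incomplete transcripts or the OSError, and a CPU will still be much slower than a GPU.

```python
def choose_whisper_settings(cuda_device_count):
    """Return (device, compute_type) for faster_whisper.WhisperModel."""
    if cuda_device_count > 0:
        # Same settings as the video's code: GPU with half precision
        return "cuda", "float16"
    # No CUDA GPU (e.g. Apple Silicon Macs): CPU with 8-bit quantization
    return "cpu", "int8"


def load_model(model_size="small"):
    """Load a WhisperModel with settings matching the available hardware."""
    # Imported here so choose_whisper_settings works without these packages
    import ctranslate2
    from faster_whisper import WhisperModel

    device, compute_type = choose_whisper_settings(ctranslate2.get_cuda_device_count())
    return WhisperModel(model_size, device=device, compute_type=compute_type)
```

`model = load_model()` could then replace the `WhisperModel("medium", device="cuda", compute_type="float16")` line in `main2`.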

  • @aseel6910 · 2 months ago

    If there were any way to translate this text into other languages, it would be awesome.

  • @joaopaulonadal8484 · 4 months ago

    How can I get access to this code?

  • @MarxOrx · 5 months ago

    BROOOO 🎉 FIRST

  • @avgplayer · 5 months ago

    Waiting for the in-depth video :) Btw, your Discord invite link has expired.

  • @crazyforhyunwoo119 · 3 months ago

    Can I do this with JavaScript?

  • @himanshujaviya6021 · 1 month ago

    Can we get the code used in this video? That would be really helpful.

  • @danielgh4814 · 4 months ago · +1

    Hi, I'm a subscriber but I do not have access to your GitHub. Can you help me, please?

  • @ytemre · 3 months ago · +1

    I became a member. How do I get access to the code and the GitHub for this?

    • @AllAboutAI · 3 months ago

      Hello :D Send me an e-mail at kris@allabtai.com

  • @RicardoMaciasYepez6913

    Can this run on raspberry pi?

  • @vallu-Tech · 3 months ago

    Bro, can you make a video about live-streaming voice to text?

  • @ahmedelkamash9323 · 1 month ago

    how can we download this script?

  • @saqqara6361 · 1 day ago

    How do I access your source code as a paid channel member?

  • @user-sd3qe7qu9c · 3 months ago

    🧡

  • @thedoctor5478 · 5 months ago

    I think there's an even faster whisper module but I forget what it's called

  • @leucome · 5 months ago

    Faster Whisper and Insanely Fast Whisper don't seem to have AMD GPU support yet, so I had to go with an alternative for the 7900 XT. I used whisper.cpp with CUDA/HIP plus a distilled Whisper model. Seriously, this combination is kinda real-time too, even when using distil-large-v2. There is a downside, though: the TTS and Whisper on the GPU gobble up something like 8 GB of VRAM, which limits the LLM I can use at the same time.

  • @lutusp · 5 months ago

    Hey, it's in your video description, therefore easily fixed: the word is "transcription". Why not avoid the irony of a video that extols modern AI voice to text ... transcription ... in which the AI engine will surely avoid this mistake, and at the speed of light.

  • @nusretalikok823 · 5 months ago

    where can we find the code that you used?

  • @ItsNsour · 2 months ago · +1

    can it translate?

  • @thebigbigdaddy · 5 months ago

    how can we identify different speakers?

    • @ickorling7328 · 3 months ago

      Microsoft Copilot in a Teams call recording transcription. It can't simply be a call, it needs to be a meeting call... subtle difference. Try 'Meet now' in the Teams calendar view, or make a calendar event.

  • @kebman · 5 months ago

    I might be jaded but... I mean really, how about an AI that calculates the probability of drone attacks or artillery attacks? How about an AI that calculates the probability of soldiers hiding in terrain? I mean, there are already good search algorithms out there, that one may-or-may-not use to carry out artillery strikes. I'm just thinking aloud here. Probably nothing.

  • @110gotrek · 5 months ago · +8

    Now make it translate and do phone calls

  • @harshitsingh3061 · 5 months ago · +1

    where can we get the code

  • @user-mz5jy4nt5p · 5 months ago · +2

    could you do another demo to see how it can translate in real time?

    • @gregh7457 · 5 months ago

      Yes! There are no really good or fast translation apps available. YouTube auto-translate is horrible!
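Whisper can cover part of the translation requests in this thread: faster-whisper's `WhisperModel.transcribe` accepts a `language` argument (a code such as `"de"`, or `None` for auto-detection) and a `task` argument that can be `"translate"`, which makes the output English. Whisper's built-in translation only targets English, so other target languages would need a separate translation step. A sketch of a variant of the `transcribe_chunk` helper from the shared code; `transcribe_options` and `transcribe_file` are hypothetical names:

```python
def transcribe_options(language=None, translate_to_english=False, beam_size=5):
    """Build keyword arguments for faster_whisper's WhisperModel.transcribe.

    language: spoken-language code such as "de" or "ja"; None lets Whisper
              auto-detect the language.
    translate_to_english: use Whisper's "translate" task, whose output is
              always English.
    """
    options = {
        "beam_size": beam_size,
        "task": "translate" if translate_to_english else "transcribe",
    }
    if language is not None:
        options["language"] = language
    return options


def transcribe_file(model, file_path, **kwargs):
    """Variant of transcribe_chunk with a configurable language and task."""
    segments, _info = model.transcribe(file_path, **transcribe_options(**kwargs))
    # segments is a lazy generator; joining it is what runs the decoding
    return "".join(segment.text for segment in segments)
```

For example, `transcribe_file(model, "temp_chunk.wav", language="ja", translate_to_english=True)` would transcribe a Japanese chunk and return English text.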

  • @kylebolt5861 · 5 months ago · +1

    How do we join your community?

    • @AllAboutAI · 5 months ago

      Link in desc :) youtube member

    • @najafzawar8168 · 5 months ago

      @AllAboutAI I just subscribed to your channel but I'm not getting the GitHub code..

  • @erenkaraboga8570 · 3 months ago

    Can we get the source code?

  • @kebman · 5 months ago

    The sentiment analysis really scares me. I mean, there's absolutely no chance that'll be abused by big tech in terms of political marketing. I mean, like, there's no way in hell right?

  • @maxstauss4821 · 28 days ago

    I'm a member but I can't access the GitHub, pls HELP

  • @filipphenderson6342 · 4 months ago · +41

    Pulling in people with a flashy thumbnail of Python code that works and then trying to monetize your code, based on a library that is supposed to be open source, is in my opinion bs. It is not fair to beginners who might not know Python or Whisper very well. For that I give you a thumbs down!

  • @digitalsoultech · 5 months ago · +1

    The accuracy sucks. Many words are incorrect which you can see in the image itself.
    This isn't usable in the real world.

  • @user-gx9yu2kk8z · 5 months ago

    🎈

  • @ramadanhasan1574 · 5 months ago

    Where is the link to this source code? Thanks, amazing

  • @curtisnewton895 · 4 months ago

    transcriPtion

  • @fufu9352 · 3 months ago

    Zero latency? I checked your video timeline: the terminal output and the audio do not line up. You must be living in a world 1-2 seconds ahead of our timeline. 😅
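The shared code explains part of that lag. `record_chunk` blocks until it has made `int(16000 / 1024 * chunk_length)` reads of 1024 frames, so nothing is printed for a chunk until its full audio has been captured and `model.transcribe` has finished on it. A small sketch that reproduces the arithmetic; `chunk_timing` is a hypothetical name, and inference time is hardware-dependent and not modelled here:

```python
def chunk_timing(rate=16000, frames_per_buffer=1024, chunk_length=1):
    """Reproduce the arithmetic of record_chunk's loop.

    record_chunk calls stream.read(frames_per_buffer) this many times:
        int(rate / frames_per_buffer * chunk_length)
    Returns (number_of_reads, samples_captured, seconds_of_audio).
    """
    reads = int(rate / frames_per_buffer * chunk_length)
    samples = reads * frames_per_buffer
    return reads, samples, samples / rate


reads, samples, seconds = chunk_timing()
print(f"{reads} reads -> {samples} samples -> {seconds:.2f} s per chunk")
```

With the defaults it reports 15 reads, 15360 samples and 0.96 s, so the first word of a chunk can appear roughly a second late before any transcription time is added, which is consistent with a 1-2 second gap.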

  • @rahar6009 · 3 months ago · +1

    It is bs to monetize open-source code! So sorry for you and your kind... unsubscribed.

  • @virkutisss3563 · 3 months ago

    Can you use different languages?

  • @vaibhavmishra1100 · 4 months ago

    Can you tell me the solution to this error: Could not load library cudnn_ops_infer64_8.dll. Error code 126
    Please make sure cudnn_ops_infer64_8.dll is in your library path!
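The error says that `cudnn_ops_infer64_8.dll`, a cuDNN 8 library, could not be found. On Windows, the directories listed in `PATH` are among the places searched for DLLs, and the usual remedy is to install cuDNN 8 and make sure the folder containing its DLLs is on `PATH` (or to run on the CPU instead). A small diagnostic sketch using only the standard library; `find_on_path` is a hypothetical helper name:

```python
import os


def find_on_path(filename, path_value=None):
    """Return every directory listed in PATH that contains `filename`.

    path_value defaults to the current PATH environment variable.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        # Skip empty entries produced by stray separators
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    found = find_on_path("cudnn_ops_infer64_8.dll")
    print(found or "cudnn_ops_infer64_8.dll is not in any PATH directory")
```

An empty result matches the error message. A non-empty result does not guarantee that the DLL is the version your CTranslate2 build expects.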

  • @user-sw3im6bo2l · 2 months ago

    I have registered as a member, please check your email

  • @abdurrahmankeskin3716 · 13 days ago

    how to get the code for this?