Multimodal AI from First Principles - Neural Nets that can see, hear, AND write.

  • Published 29. 08. 2024

Comments • 22

  • @xxlvulkann6743
    @xxlvulkann6743 23 days ago +1

    This was a useful summary for finding papers to research developments in multimodal machine learning models!

    • @avb_fj
      @avb_fj  22 days ago

      Thanks! Super glad you found the video resourceful!

  • @joshuatettey7771
    @joshuatettey7771 22 days ago +1

    Awesome video. Thanks mate🤩

  • @boogati9221
    @boogati9221 3 months ago +3

    Dude this video was so fucking good. Keep it up.

  • @madsfrederiksen6213
    @madsfrederiksen6213 A year ago +4

    Great and clear video! Heard about multimodal models for the first time today, and I already feel like I have a better grasp of it, thanks to you :)

  • @meet_minimalist
    @meet_minimalist 7 months ago +2

    Excellent video with all the paper references. Lots to read and learn from the papers. Thanks. :)

    • @avb_fj
      @avb_fj  7 months ago

      Thanks!🙏🏽

  • @tomm9716
    @tomm9716 A month ago

    Really good stuff mate, subbed

  • @AI_ML_DL_LLM
    @AI_ML_DL_LLM A year ago +1

    Wow, there's a lot of work behind this, thank you

    • @avb_fj
      @avb_fj  A year ago

      Haha thanks for the comment! It’s an emerging area, and a lot of groundbreaking research really has happened in the past few years.

  • @syoyazhou8657
    @syoyazhou8657 A year ago +1

    Like your videos. Explain things in a very clear way. Thx for sharing.

    • @avb_fj
      @avb_fj  A year ago

      Thank you!

    • @xspydazx
      @xspydazx 4 months ago

      CODE IS BETTER ??
      from transformers import (
          VisionEncoderDecoderModel,
          SpeechEncoderDecoderModel,
          AutoFeatureExtractor,
          AutoImageProcessor,
          AutoTokenizer,
      )
      print('Add Vision...')
      # Add head: combine a pre-trained vision encoder and a pre-trained
      # text decoder to form a Seq2Seq model
      Vmodel = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
          "google/vit-base-patch16-224-in21k", "LeroyDyer/Mixtral_AI_Tiny"
      )
      _Encoder_ImageProcessor = Vmodel.encoder
      _Decoder_ImageTokenizer = Vmodel.decoder
      _VisionEncoderDecoderModel = Vmodel
      # Attach the vision head to the base LM
      LM_MODEL.VisionEncoderDecoder = _VisionEncoderDecoderModel
      # Add sub-components
      LM_MODEL.Encoder_ImageProcessor = _Encoder_ImageProcessor
      LM_MODEL.Decoder_ImageTokenizer = _Decoder_ImageTokenizer
      LM_MODEL
      This is how you add vision to an LLM (you can embed the head inside).
      print('Add Audio...')
      # Add head: combine a pre-trained speech encoder and a pre-trained
      # decoder to form a Seq2Seq model
      _AudioFeatureExtractor = AutoFeatureExtractor.from_pretrained("openai/whisper-small")
      _AudioTokenizer = AutoTokenizer.from_pretrained("openai/whisper-small")
      _SpeechEncoderDecoder = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained("openai/whisper-small", "openai/whisper-small")
      # Add pad tokens
      _SpeechEncoderDecoder.config.decoder_start_token_id = _AudioTokenizer.cls_token_id
      _SpeechEncoderDecoder.config.pad_token_id = _AudioTokenizer.pad_token_id
      LM_MODEL.SpeechEncoderDecoder = _SpeechEncoderDecoder
      # Add sub-components
      LM_MODEL.Decoder_AudioTokenizer = _AudioTokenizer
      LM_MODEL.Encoder_AudioFeatureExtractor = _AudioFeatureExtractor
      LM_MODEL
      This is how you add audio.
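      For readers who want to see the pattern in the snippet above in isolation, here is a toy, self-contained sketch (plain Python, no model downloads) of the encoder-decoder pairing it relies on: the encoder turns the raw modality into features, and the decoder conditions its text output on those features. All class and method names here are illustrative stand-ins, not the real transformers API.

      ```python
      class ToyVisionEncoder:
          """Stand-in for a pre-trained image encoder (e.g. a ViT)."""
          def encode(self, image_pixels):
              # Collapse the "image" (a list of numbers) into one feature.
              return sum(image_pixels) / len(image_pixels)

      class ToyTextDecoder:
          """Stand-in for a pre-trained language-model decoder."""
          def generate(self, feature):
              # Condition the generated "caption" on the encoder feature.
              return f"caption(feature={feature:.1f})"

      class ToyVisionEncoderDecoder:
          """Mirrors the encoder/decoder pairing used in the comment above."""
          def __init__(self, encoder, decoder):
              self.encoder = encoder
              self.decoder = decoder

          def caption(self, image_pixels):
              return self.decoder.generate(self.encoder.encode(image_pixels))

      model = ToyVisionEncoderDecoder(ToyVisionEncoder(), ToyTextDecoder())
      print(model.caption([0.0, 1.0, 2.0, 3.0]))  # caption(feature=1.5)
      ```

      The real `VisionEncoderDecoderModel` follows the same shape, except the encoder output is a sequence of hidden states that the decoder cross-attends to rather than a single scalar.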

  • @ahmed_hefnawy1811
    @ahmed_hefnawy1811 5 months ago +1

    Excellent

  • @vobbilisettyveera2973
    @vobbilisettyveera2973 11 months ago +1

    awesome!!!!!!!!!!

  • @420_gunna
    @420_gunna 6 months ago

    7:55 lol

    • @avb_fj
      @avb_fj  6 months ago

      Honest reactions lol😅

  • @deliciouspops
    @deliciouspops A year ago

    Do you think you should tune your audio levels or what? According to YouTube, I am your 666th view

    • @avb_fj
      @avb_fj  A year ago

      Always open for feedback. What kind of tuning are we talking about?

    • @avb_fj
      @avb_fj  A year ago

      @@LonewolfeSlayer Sounds good... something to keep in mind for my next one. :)