This was a useful summary for finding papers to research developments in multimodal machine learning models!
Thanks! Super glad you found the video resourceful!
Awesome video. Thanks mate🤩
Dude this video was so fucking good. Keep it up.
Great and clear video! Heard about multimodal models for the first time today, and i already feel like i have a better grasp of it, thanks to you :)
Excellent video with all the paper references. Lot to read and learn from papers. Thanks. :)
Thanks!🙏🏽
Really good stuff mate, subbed
Wow, there is lots of works behind it, thank you
Haha thanks for the comment! It’s an emerging area, and a lot of groundbreaking research really has happened in the past few years.
Like your videos. Explain things in a very clear way. Thx for sharing.
Thank you!
CODE IS BETTER ??
from transformers import (
    VisionEncoderDecoderModel,
    SpeechEncoderDecoderModel,
    AutoFeatureExtractor,
    AutoTokenizer,
)
print('Add Vision...')
# ADD HEAD
# Combine pre-trained encoder and pre-trained decoder to form a Seq2Seq model
Vmodel = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
"google/vit-base-patch16-224-in21k", "LeroyDyer/Mixtral_AI_Tiny"
)
_Encoder_ImageProcessor = Vmodel.encoder
_Decoder_ImageTokenizer = Vmodel.decoder
_VisionEncoderDecoderModel = Vmodel
# Attach the combined model to the base LM
LM_MODEL.VisionEncoderDecoder = _VisionEncoderDecoderModel
# Add Sub Components
LM_MODEL.Encoder_ImageProcessor = _Encoder_ImageProcessor
LM_MODEL.Decoder_ImageTokenizer = _Decoder_ImageTokenizer
LM_MODEL  # display the updated model
This is how you add vision to an LLM (you can embed the head inside the model).
print('Add Audio...')
# Add head
# Combine pre-trained encoder and pre-trained decoder to form a Seq2Seq model
_AudioFeatureExtractor = AutoFeatureExtractor.from_pretrained("openai/whisper-small")
_AudioTokenizer = AutoTokenizer.from_pretrained("openai/whisper-small")
_SpeechEncoderDecoder = SpeechEncoderDecoderModel.from_encoder_decoder_pretrained("openai/whisper-small","openai/whisper-small")
# Set decoder start / pad tokens (Whisper's tokenizer has no CLS token, so use BOS as the decoder start)
_SpeechEncoderDecoder.config.decoder_start_token_id = _AudioTokenizer.bos_token_id
_SpeechEncoderDecoder.config.pad_token_id = _AudioTokenizer.pad_token_id
LM_MODEL.SpeechEncoderDecoder = _SpeechEncoderDecoder
# Add Sub Components
LM_MODEL.Decoder_AudioTokenizer = _AudioTokenizer
LM_MODEL.Encoder_AudioFeatureExtractor = _AudioFeatureExtractor
LM_MODEL  # display the updated model
This is how you can add audio.
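The snippet above boils down to one pattern: attach each pretrained encoder-decoder head to the base language model as a plain attribute. Here is a minimal sketch of that wiring using placeholder classes (the `DummyEncoderDecoder` and `MultiModalLM` names are stand-ins for the real transformers models and for the assumed `LM_MODEL` object, which is loaded elsewhere):

```python
class DummyEncoderDecoder:
    """Placeholder for a pretrained seq2seq head, e.g. a VisionEncoderDecoderModel."""

    def __init__(self, name):
        # Real models expose .encoder and .decoder sub-modules; strings stand in here.
        self.encoder = f"{name}-encoder"
        self.decoder = f"{name}-decoder"


class MultiModalLM:
    """Placeholder for the base LM; heads are added by plain attribute assignment."""
    pass


lm = MultiModalLM()

# Vision head: attach the combined model plus its encoder/decoder sub-components
vision = DummyEncoderDecoder("vit")
lm.VisionEncoderDecoder = vision
lm.Encoder_ImageProcessor = vision.encoder
lm.Decoder_ImageTokenizer = vision.decoder

# Audio head: same pattern with a speech encoder-decoder
audio = DummyEncoderDecoder("whisper")
lm.SpeechEncoderDecoder = audio
lm.Encoder_AudioFeatureExtractor = audio.encoder
lm.Decoder_AudioTokenizer = audio.decoder
```

Note this only wires the objects together; to actually route image or audio features into the LM's embedding space you still need a projection layer and fine-tuning, which the original snippet does not cover.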
Excellent
awesome!!!!!!!!!!
7:55 lol
Honest reactions lol😅
do you think you should tune your audio levels or what? according to youtube, i am your 666th view
Always open for feedback. What kind of tuning are we talking about?
@LonewolfeSlayer Sounds good... something to keep in mind for my next one. :)