Vision Transformers (ViT) Explained + Fine-tuning in Python

  • Added 6 Sep 2024

Comments • 48

  • @zoelav1398
    @zoelav1398 10 months ago +6

    This was extremely clear and I was able to understand ViTs better. Thank you so much!

  • @fidelodok4393
    @fidelodok4393 a year ago +5

    Really enjoyed every bit. I'm trying to set up the transformer for an audio regression task; the ViT has shown amazing performance in classification.

    • @msg2clash
      @msg2clash a year ago

      Hi, can you suggest any articles on using transformers for audio classification? I would appreciate any help.

  • @philipplagrange314
    @philipplagrange314 a year ago +4

    Great video! I've watched quite a few videos and read papers about Transformers, but your video really made me understand the concept.

  • @blueaquilae
    @blueaquilae a year ago +2

    The clarity of your discourse is unmatched and it's always a pleasure to follow your videos. Kind of a side effect of your passion for the domain!?

    • @jamesbriggs
      @jamesbriggs a year ago +1

      thanks a ton, I'm glad it helps - and yep it's definitely a bonus doing the videos in such a cool domain

    • @blueaquilae
      @blueaquilae a year ago

      @jamesbriggs I think the content is so clear that you could imagine a full course bundle ^^

    • @jamesbriggs
      @jamesbriggs a year ago +1

      It is part of a (free) course/ebook :) www.pinecone.io/learn/image-search/

  • @zappist751
    @zappist751 a year ago +2

    James is the top G in deep learning

  • @aradhyadhruv9084
    @aradhyadhruv9084 11 months ago +1

    This is by far the best explanation of the paper that I could find. Thanks a lot!

  • @antient_atlas
    @antient_atlas a year ago +1

    Great explanation, unique on YT. Thanks!

  • @NikolaosTsarmpopoulos

    Very good introductory video. Thanks for sharing.

  • @lechavs
    @lechavs 10 months ago

    Oh man, really great explanation, easy to digest. Keep it up!

  • @matheusrdgsf
    @matheusrdgsf a year ago +2

    Incredible content! Thx James!

  • @salehahmad5625
    @salehahmad5625 10 months ago

    Great explanation. Very fine details. Great work.

  • @leonardvanduuren8708
    @leonardvanduuren8708 a year ago

    Another great video of yours. So clear and clarifying. Thx!

  • @fabianaltendorfer11
    @fabianaltendorfer11 11 months ago

    You are an inspiration, James.

  • @knorkeize
    @knorkeize 6 months ago

    At 5:10 it seems that the max pooling and conv layers are accidentally swapped. A max pooling layer has a smaller dimension than the preceding layer and usually comes after a convolution.
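    [Editor's note: a minimal PyTorch sketch, not taken from the video, illustrating the ordering the comment describes: a convolution followed by max pooling, with the pooled output having a smaller spatial size than the preceding conv output.]

    ```python
    import torch
    import torch.nn as nn

    x = torch.randn(1, 3, 224, 224)                    # one RGB image
    conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)  # padding=1 keeps the spatial size
    pool = nn.MaxPool2d(kernel_size=2)                 # halves height and width

    features = conv(x)
    pooled = pool(features)
    print(features.shape)  # torch.Size([1, 64, 224, 224])
    print(pooled.shape)    # torch.Size([1, 64, 112, 112]) - smaller than the conv output
    ```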

  • @pranaymathur997
    @pranaymathur997 a year ago +1

    Thank you so much for this video :)

  • @EkShunya
    @EkShunya a year ago

    Thank you for the effort you're putting into your explanations. :)

  • @user-hx3hn1ni1o
    @user-hx3hn1ni1o 10 months ago

    Great video man. Keep it up👍👍

  • @PauClimentPerez
    @PauClimentPerez a year ago

    Well, Bag of Words and Bag of Visual Words WERE a merger of NLP and Computer Vision, back in the day (the 2010s).

  • @amanalok4647
    @amanalok4647 a year ago

    Thanks a lot for this! Amazing, amazing explanation!

  • @conairebyrne7298
    @conairebyrne7298 a year ago +2

    Great video man, cheers! Do you have a video about using a dataset made up of your own images with the vision transformer?

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w a year ago +1

    Is there anything similar to word embeddings? Or do you simply take your pixel data as patches and run it through a dense layer to get projections?
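    [Editor's note: a minimal sketch, assuming PyTorch and not taken from the video, of the patch-embedding step the question refers to: the image is cut into patches, each patch is flattened, and a learned linear projection maps it to the model dimension, playing roughly the role that a word-embedding layer plays in NLP (ViT additionally adds learned position embeddings).]

    ```python
    import torch
    import torch.nn as nn

    image = torch.randn(1, 3, 224, 224)       # (batch, channels, height, width)
    patch_size, dim = 16, 768

    # split into non-overlapping 16x16 patches and flatten each one
    patches = image.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, -1, 3 * patch_size * patch_size)
    print(patches.shape)                       # torch.Size([1, 196, 768]) -> 14*14 patches

    # learned linear projection: the ViT analogue of a word-embedding lookup
    projection = nn.Linear(3 * patch_size * patch_size, dim)
    patch_embeddings = projection(patches)
    print(patch_embeddings.shape)              # torch.Size([1, 196, 768])
    ```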

  • @Sara-he1fz
    @Sara-he1fz a year ago +1

    In this video, there is no explanation of the output of a vision transformer. In NLP transformers, the output is a probability distribution over the vocab, but in vision transformers I guess it is over a codebook. What this codebook is and how it is aligned to the input image is not clear. Thanks a lot for this video, but it is incomplete.

    • @jamesbriggs
      @jamesbriggs a year ago +1

      The output from an NLP transformer is a set of token-level embeddings, not a probability distribution over the vocab.
      The probability distribution over the vocab that you're referring to is actually an extra component (a head) that is used for Masked Language Modeling (MLM). ViT doesn't use MLM for pretraining (unlike NLP transformers), so an equivalent head isn't used.
      So the output of the ViT is actually the same as that of an NLP transformer: a set of token-level (for ViT, patch-level) embeddings. [A short code sketch follows this thread.]
      I hope that makes sense? Thanks!

    • @Sara-he1fz
      @Sara-he1fz a year ago +1

      @jamesbriggs Yes, it is very useful. You are right, the output is a token embedding, but the output for MLM is a probability distribution over the vocab in NLP. I guess MIM is used for pretraining in vision transformers; in that case there should be a codebook, if I am not mistaken.
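      [Editor's note: a short sketch of the point above, assuming the Hugging Face transformers library and the commonly used google/vit-base-patch16-224-in21k checkpoint (neither is named in the thread). The plain ViT encoder returns patch-level embeddings plus a [CLS] token, with no vocabulary or codebook head attached.]

      ```python
      import torch
      import requests
      from PIL import Image
      from transformers import ViTImageProcessor, ViTModel

      # pretrained ViT encoder only - no classification or MLM-style head
      processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
      model = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")

      url = "http://images.cocodataset.org/val2017/000000039769.jpg"
      image = Image.open(requests.get(url, stream=True).raw)

      inputs = processor(images=image, return_tensors="pt")
      with torch.no_grad():
          outputs = model(**inputs)

      # 197 = 1 [CLS] token + 14*14 = 196 patch embeddings, each of size 768
      print(outputs.last_hidden_state.shape)  # torch.Size([1, 197, 768])
      ```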

  •  a year ago +1

    Excellent video, James. Thank you!
    I have a question: how do you compute the 9.8 MM comparisons at 10:09?

  • @achukstok
    @achukstok a year ago

    Hey, thanks a lot. I've come from TensorFlow, so could you please answer: does this train the whole ViT model on our dataset, or does it freeze the pretrained ViT part and train only the classification head (like trainable=False in TF)?
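    [Editor's note: a hedged sketch of the second option asked about above, assuming the Hugging Face ViTForImageClassification class (not confirmed as the video's exact setup): freeze the pretrained ViT body so that only the new classification head is trained.]

    ```python
    from transformers import ViTForImageClassification

    model = ViTForImageClassification.from_pretrained(
        "google/vit-base-patch16-224-in21k", num_labels=10
    )

    # freeze the pretrained ViT encoder; only the classification head stays trainable
    for param in model.vit.parameters():
        param.requires_grad = False

    trainable = [name for name, p in model.named_parameters() if p.requires_grad]
    print(trainable)  # ['classifier.weight', 'classifier.bias']
    ```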

  • @scottkorman4953
    @scottkorman4953 a year ago

    Thanks a lot for the video. I can't find any precise explanation of what the self-attention layer and the MLP layer do in the encoder modules. Could you maybe add some information about that? [A brief sketch follows this thread.]

    • @dhaneshr
      @dhaneshr a year ago

      Go watch the nanoGPT video by Andrej Karpathy.
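      [Editor's note: a minimal PyTorch sketch addressing the question above (an illustration, not the video's code) of a single pre-norm transformer encoder block as used in ViT: self-attention lets every patch embedding gather information from every other one, and the MLP then transforms each embedding independently.]

      ```python
      import torch
      import torch.nn as nn

      class EncoderBlock(nn.Module):
          def __init__(self, dim=768, heads=12, mlp_dim=3072):
              super().__init__()
              self.norm1 = nn.LayerNorm(dim)
              self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
              self.norm2 = nn.LayerNorm(dim)
              self.mlp = nn.Sequential(
                  nn.Linear(dim, mlp_dim), nn.GELU(), nn.Linear(mlp_dim, dim)
              )

          def forward(self, x):
              h = self.norm1(x)
              attn_out, _ = self.attn(h, h, h)   # each position attends to all positions
              x = x + attn_out                   # residual connection
              x = x + self.mlp(self.norm2(x))    # position-wise MLP + residual
              return x

      block = EncoderBlock()
      tokens = torch.randn(1, 197, 768)          # [CLS] + 196 patch embeddings
      print(block(tokens).shape)                 # torch.Size([1, 197, 768])
      ```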

  • @diasposangare1154
    @diasposangare1154 2 months ago

    Please, can I have access to your PowerPoint?

  • @Diego0wnz
    @Diego0wnz a year ago

    It currently gives the error "No module named 'datasets'"; does anybody have a fix?
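    [Editor's note: this error usually just means the Hugging Face datasets package is missing from the environment; installing it is the likely fix (an assumption, since the full traceback isn't shown).]

    ```python
    # install the package first, e.g. from a terminal or notebook cell:
    #   pip install datasets
    from datasets import load_dataset  # imports cleanly once installed
    ```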

  • @suchiralaknath7576
    @suchiralaknath7576 a year ago

    This video is really helpful. Thank you!

  • @shaheerzaman620
    @shaheerzaman620 a year ago

    great stuff!

  • @RAZZKIRAN
    @RAZZKIRAN a year ago

    thank you sir

  • @dhaneshr
    @dhaneshr a year ago +2

    No fun using the Hugging Face transformers library; you should have explained vision transformers using a more basic implementation rather than a high-level library.

  • @rockwellthivierge9193

    Nice one..! This content desperately needs "Promo SM"!