Vision Transformer and its Applications

  • Added 29. 08. 2024

Comments • 25

  • @jhjbm1959 • 9 months ago +3

    This video provides a clear, step-by-step explanation of how to get from images to input features for Transformer encoders, which has proven hard to find anywhere else.
    Thank you.

  • @PrestonRahim • 1 year ago +5

    Super helpful. I was very lost on the process from image patch to embedded vector until I watched this.

  • @crapadopalese • 1 year ago +8

    10:46 - this is a mistake; the convolution is not equivariant to scaling - if the bird is scaled, the output of the convolution will not simply be a scaling of the original output. That would only be true if you also rescaled the filters.

  • @DrAIScience • 4 months ago

    Very, very nice explanation!!! I like learning the foundation/origin of the concepts from which models are derived.

  • @SarangBanakhede • 21 days ago

    10:58
    Scale Equivariance:
    Definition: A function is scale equivariant if a scaling (resizing) of the input results in a corresponding scaling of the output.
    Convolution in CNNs: Standard convolutions are not scale equivariant. This means that if you resize an object in an image (e.g., making it larger or smaller), the CNN may not recognize it as the same object. Convolutional filters have fixed sizes, so they may fail to detect features that are significantly larger or smaller than the size of the filter.
    Example: If a CNN is trained to detect a small object using a specific filter size, it might struggle to detect the same object when it appears much larger in the image because the filter is not capable of adjusting to different scales.
    Why is Convolution Not Scale Equivariant?
    The filters in a CNN have a fixed receptive field, meaning they look for patterns of a specific size. If the size of the pattern changes (e.g., due to scaling), the fixed-size filters may no longer detect the pattern effectively (see the sketch below).
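
    A minimal sketch of this point, assuming PyTorch (the toy image and 5×5 filter below are made up for illustration): convolving a 2x-upsampled image with the same fixed filter and then downsampling does not reproduce the original convolution output, which is what scale equivariance would require.

    ```python
    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    img = torch.rand(1, 1, 64, 64)           # toy grayscale image
    kernel = torch.rand(1, 1, 5, 5)          # fixed-size filter

    out = F.conv2d(img, kernel, padding=2)   # conv at the original scale

    # Upsample 2x, convolve with the SAME filter, downsample the result back.
    img_2x = F.interpolate(img, scale_factor=2, mode="bilinear", align_corners=False)
    out_2x = F.interpolate(F.conv2d(img_2x, kernel, padding=2),
                           scale_factor=0.5, mode="bilinear", align_corners=False)

    # If convolution were scale equivariant, this difference would be ~0.
    print((out - out_2x).abs().max())        # clearly nonzero
    ```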

  • @ailinhasanpour • 1 year ago +4

    Thanks for sharing, it was extremely helpful 💯

  • @xXMaDGaMeR • 1 year ago +3

    amazing lecture, thank you sir!

  • @sahil-vz8or • 1 year ago +1

    You said 196 patches for the ImageNet data. The number of patches depends on the input image size and the patch size. For example, if the input image is 400×400 and the patch size is 8×8, the number of patches will be (400×400)/(8×8) = 50×50 = 2500 (see the quick check below).
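
    The arithmetic generalizes to a one-liner; the helper below is hypothetical and just restates the formula. The 196 quoted in the video corresponds to the standard ViT setting of a 224×224 input with 16×16 patches:

    ```python
    def num_patches(image_size: int, patch_size: int) -> int:
        # The image must divide evenly into non-overlapping patches.
        assert image_size % patch_size == 0
        return (image_size // patch_size) ** 2

    print(num_patches(224, 16))  # 196, the figure quoted in the video
    print(num_patches(400, 8))   # 2500, the example in this comment
    ```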

  • @rikki146 • 1 year ago +1

    20:17 I think the encoder blocks are stacked in a parallel fashion rather than sequentially?
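
    For reference, in the original ViT paper the encoder blocks form a sequential chain: each block consumes the previous block's output. A minimal sketch using PyTorch's stock nn.TransformerEncoderLayer (the depth and dimensions below are illustrative, not the video's exact configuration):

    ```python
    import torch
    import torch.nn as nn

    dim, depth = 768, 12
    blocks = nn.ModuleList(
        nn.TransformerEncoderLayer(d_model=dim, nhead=12, batch_first=True)
        for _ in range(depth)
    )

    x = torch.rand(1, 197, dim)  # [CLS] token + 196 patch embeddings
    for block in blocks:         # sequential: each block feeds the next
        x = block(x)
    print(x.shape)               # torch.Size([1, 197, 768])
    ```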

  • @mohammedrakib3736 • 5 months ago

    Fantastic video! Really loved the detailed step-by-step explanation.

  • @PRASHANTKUMAR-ze6mj • 1 year ago +1

    Thanks for sharing.

  • @scottkorman4953 • 1 year ago +4

    What exactly is happening in the self-attention and MLP blocks of the encoder module? Could you describe it in a simple way?
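
    One common way to summarize it (not the video's exact presentation): self-attention lets every patch token exchange information with every other token, and the MLP then transforms each token independently; both are wrapped in layer norm and residual connections. A minimal pre-norm sketch in PyTorch, with illustrative dimensions:

    ```python
    import torch
    import torch.nn as nn

    class EncoderBlock(nn.Module):
        def __init__(self, dim=768, heads=12, mlp_ratio=4):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim)
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm2 = nn.LayerNorm(dim)
            self.mlp = nn.Sequential(
                nn.Linear(dim, dim * mlp_ratio),
                nn.GELU(),
                nn.Linear(dim * mlp_ratio, dim),
            )

        def forward(self, x):
            # Self-attention: every token attends to all tokens (global mixing).
            h = self.norm1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            # MLP: a position-wise feed-forward net refines each token.
            return x + self.mlp(self.norm2(x))

    x = torch.rand(1, 197, 768)     # [CLS] + 196 patch tokens
    print(EncoderBlock()(x).shape)  # torch.Size([1, 197, 768])
    ```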

  • @DrAIScience • 4 months ago

    Do you have a video about BEiT or DINO?

  • @anirudhgangadhar6158 • 1 year ago

    Great resource!

  • @user-co6pu8zv3v • 1 year ago

    Thank you, sir

  • @muhammadshahzaibiqbal7658

    Thanks for sharing.

  • @DrAIScience • 4 months ago

    Are you the channel owner?

  • @capocianni1043 • 1 year ago

    Thank you for this genuine knowledge.

  • @liangcheng9856 • 1 year ago

    awesome

  • @hoangtrung.aiengineer

    Thank you for making such a great video

  • @saimasideeq7254 • 9 months ago

    Thank you, much clearer.
