Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (paper illustrated)

  • Added 29. 08. 2024

Comments • 63

  • @phattailam9814 • 1 year ago +1

    Thank you so much for the explanation!

  • @mahmoudimus • 6 months ago +1

    Great explanation. Love the music + the voice :)

    • @AIBites • 6 months ago

      Thanks. Glad you liked it!

  • @kalluriramakrishna5732 • 1 year ago +1

    Thank you for your fabulous explanation.

  • @muhammadsalmanali1066 • 2 years ago

    Thank you so much for the explanation. Please keep the videos coming.

  • @robosergTV • 2 months ago +1

    Huh? ViT was the first backbone Transformer arch for vision, not Swin.

    • @AIBites • 1 day ago

      Awesome spot, and thanks for this info.

  • @user-ev8be1lk3x • 5 months ago +1

    This is brilliant!

  • @suke933 • 2 years ago +3

    Thanks for the video, dear AI Bites. I was struggling to understand the Swin architecture, and this elaborated it very clearly and to the point. But I would like to ask about "the motivation for different C value selection". Why is it important? If you could explain, it would give me a more meaningful understanding.
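A quick note on the C question above: C is just the channel width of the first stage, and later stages use 2C, 4C and 8C, so the paper's variants pick different C (and depths) mainly to trade accuracy against compute. A small illustrative Python snippet (variant values as reported in the paper; the dict and loop are only a sketch, not library code):

```python
# Swin variants from the paper; C is the stage-1 channel width and the
# per-stage widths are always C, 2C, 4C, 8C, so a larger C simply means a
# wider (and costlier) model at every stage.
swin_variants = {
    "Swin-T": {"C": 96,  "depths": (2, 2, 6, 2)},
    "Swin-S": {"C": 96,  "depths": (2, 2, 18, 2)},
    "Swin-B": {"C": 128, "depths": (2, 2, 18, 2)},
    "Swin-L": {"C": 192, "depths": (2, 2, 18, 2)},
}
for name, cfg in swin_variants.items():
    print(name, [cfg["C"] * 2 ** i for i in range(4)])
    # Swin-T -> [96, 192, 384, 768], Swin-L -> [192, 384, 768, 1536]
```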

  • @JC-ru4bp • 3 years ago +1

    Very clear explanation of the paper idea, thanks.

    • @AIBites • 3 years ago

      very encouraging to keep making videos :)

    • @JC-ru4bp • 3 years ago

      @AIBites Keep it up, man!

  • @tonywang7933 • 4 months ago +1

    Thank you!! So nicely explained

    • @AIBites • 4 months ago

      You're welcome. Would you like to see more papers explained, or more coding videos?

  • @deadbeat_genius_daydreamer

    This is seriously underrated. I enjoyed this visual approach. Thanks and regards for your efforts in making this explanation. Cheers🎊👍

    • @AIBites • 1 year ago

      Thank you so much Harshad! 😊

  • @garyhuntress6871 • 2 years ago +1

    Excellent review, thanks. I've subscribed for future papers! Do you use manim for your animations?

    • @AIBites • 2 years ago

      Hi Gary, Thanks for your comments! In some places I use manim but not always. :)

  • @manub.n2451 • 2 years ago +1

    Thank you so much

  • @tensing2009 • 2 years ago

    Great Video!
    Thanks for making it! :)

  • @arpita0608 • 2 years ago +1

    Thank you for illustrating this architecture. Can you please make more videos on the segmentation algorithms that are being used nowadays? Thanks.

    • @AIBites • 2 years ago +2

      Sure. Will plan to make one on SegFormers.

    • @arpita0608 • 2 years ago

      @AIBites Cool ❤️
      And thanks for this presentation.

  • @keroldjoumessi • 2 years ago +1

    Thanks for the video. It was awesome and easy to follow. That said, even if the window architecture reduces the complexity of computing the self-attention, I think we still have this computational issue for the overall image, and the attention becomes local as in CNNs instead of global as in RNNs. Anyway, thanks for your explanation.

    • @readera84 • 2 years ago +1

      How are you saying such complex things so easily 😫 I couldn't even understand what he said 🤕

    • @keroldjoumessi9597 • 2 years ago

      @readera84 What don't you understand? Maybe I can give you a hand.

    • @readera84 • 2 years ago

      @keroldjoumessi9597 The windows shifting diagonally... can you make it clearer to me?
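To illustrate the "shifting diagonally" question above: the paper implements the shift as a cyclic roll of the whole feature map by half a window before the usual window partition, so patches that sat on window borders in one block end up grouped together in the next. A minimal PyTorch sketch with toy sizes (my own illustration, not the official code):

```python
import torch

B, H, W, C = 1, 8, 8, 96        # toy feature map: batch, height, width, channels
window_size = 4
shift_size = window_size // 2   # the paper shifts by half the window size

x = torch.randn(B, H, W, C)

# Cyclic shift: roll the map up and to the left ("diagonally") by shift_size.
shifted = torch.roll(x, shifts=(-shift_size, -shift_size), dims=(1, 2))

# Then partition into the same non-overlapping windows as before and run
# attention inside each window; the roll is undone afterwards. (The official
# implementation also masks attention so patches wrapped in from the opposite
# edge cannot attend to each other.)
windows = shifted.view(B, H // window_size, window_size,
                       W // window_size, window_size, C)
windows = windows.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size ** 2, C)
print(windows.shape)            # (num_windows * B, tokens per window, C) = (4, 16, 96)
```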

  • @triminh3849 • 2 years ago

    great video with excellent visualization, thanks a lot

  • @muhammadwaseem_ • 1 year ago +1

    Good explanation

  • @harutmargaryan9980 • 2 years ago

    Thank you, well done!

  • @user-gy9ef7mr7g • 1 year ago +1

    Great explanation

  • @rybdenis • 3 years ago +1

    cool, thank you

  • @kashishbansal2651 • 3 years ago

    AMAZING EXPLANATION!

  • @TheMomentumhd • 2 years ago

    Do you think these Swin Transformers would be useful in real-time object detection (are they fast enough)?

  • @anonymous-random • 3 years ago

    The video is awesome! Thanks a lot!

  • @sanjeetpatil1249 • 1 year ago

    Can you kindly explain this line in the paper, related to the patch merging layer, "The first patch merging layer concatenates the features of each group of 2 × 2 neighboring patches, and applies a linear layer on the 4C-dimensional concatenated features".
    Thank you for the video
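That quoted sentence maps almost one-to-one onto a few lines of PyTorch. A rough sketch of the patch merging step (shapes and the class name PatchMerging are illustrative; this follows the description in the paper rather than the official code):

```python
import torch
import torch.nn as nn

class PatchMerging(nn.Module):
    """Concatenate each 2x2 group of neighbouring patches (C -> 4C), then
    apply a linear layer that projects 4C -> 2C, halving the resolution."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim)
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x):           # x: (B, H, W, C), H and W assumed even
        x0 = x[:, 0::2, 0::2, :]    # top-left patch of every 2x2 group
        x1 = x[:, 1::2, 0::2, :]    # bottom-left
        x2 = x[:, 0::2, 1::2, :]    # top-right
        x3 = x[:, 1::2, 1::2, :]    # bottom-right
        x = torch.cat([x0, x1, x2, x3], dim=-1)   # (B, H/2, W/2, 4C)
        return self.reduction(self.norm(x))       # (B, H/2, W/2, 2C)

out = PatchMerging(dim=96)(torch.randn(1, 56, 56, 96))
print(out.shape)                    # torch.Size([1, 28, 28, 192])
```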

  • @djeros666 • 3 years ago

    Thank you for the great effort.

  • @saeedataei269 • 2 years ago +1

    Thanks for the explanation. Please review more SOTA papers.

    • @AIBites • 2 years ago +1

      Sure will do Saeed! Thx. 🙂

  • @jialima8298 • 2 years ago

    Love the voice!

  • @parveenkaur2747 • 3 years ago +1

    Very informative video!

    • @AIBites • 3 years ago

      Thanks! Glad you liked it.

  • @taoufiqelfilali2224 • 3 years ago

    Great explanation, thank you!

    • @AIBites • 3 years ago

      Thanks for your positive comment! :)

  • @EngRiadAlmadani • 2 years ago +2

    Thanks for this great video. Just one question: why do we use a linear layer in patch merging when we could reshape the input patches directly using the reshape method?

    • @AIBites • 2 years ago +2

      Great question. One thing I can think of is efficiency. I also believe it is challenging to propagate gradients backwards through a plain reshape.

    • @Deshwal.mahesh • 2 years ago +1

      Maybe they're trying to make the model learn how to merge with knowledge? Just like solving a graphical puzzle?

    • @suke933 • 2 years ago

      @AIBites Can we use a convolution in this scenario?
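On the reshape-vs-linear question in this thread: a reshape alone only regroups the existing values into 4C channels, with no learned parameters and no reduction back to 2C, whereas the linear layer both mixes the four patches and halves the channel count. And on the convolution question, a 2x2 stride-2 convolution applies the same kind of per-block linear map, so the merge could in principle be written that way. A short sketch under those assumptions (toy shapes, my own illustration):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 56, 56, 96)                   # (B, H, W, C)

# Reshape only: regroup each 2x2 block into 4C channels. No parameters,
# nothing learned, and the channel count stays at 4C.
merged = (x.view(1, 28, 2, 28, 2, 96)
           .permute(0, 1, 3, 2, 4, 5)
           .reshape(1, 28, 28, 4 * 96))          # (1, 28, 28, 384)

# Reshape followed by a learned projection 4C -> 2C, as in patch merging.
reduction = nn.Linear(4 * 96, 2 * 96, bias=False)
out = reduction(merged)                          # (1, 28, 28, 192)

# A Conv2d with kernel 2 and stride 2 applies an equivalent learned linear
# map over each non-overlapping 2x2 block (channels-first layout here).
conv = nn.Conv2d(96, 192, kernel_size=2, stride=2, bias=False)
out_conv = conv(x.permute(0, 3, 1, 2))           # (1, 192, 28, 28)
print(out.shape, out_conv.shape)
```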

  • @harshkumaragarwal8326 • 3 years ago

    great work, thanks :)

  • @rajatayyab7737 • 3 years ago +1

    Next should be Dynamic Head: Unifying Object Detection Heads with Attentions.

    • @rybdenis • 3 years ago

      agreed

    • @AIBites • 3 years ago

      Thanks Raja for pointing out. We will try to prioritise the paper at some point.

  • @anhminhtran7609 • 3 years ago

    Can you cover a bit more on using Swin for object detection, please?

  • @peddisaivivek6676 • 2 years ago

    Great video. But could you refrain from putting music in the background while explaining? It's a little distracting when viewing at higher speed.

    • @AIBites • 2 years ago

      Sure will take it on board when making the future ones 👍

  • @nguyenanhnguyen7658 • 3 years ago

    In NLP, you have at most 100,000 words to permute and train with. With images? Well, ViT with 400m images can hardly manage to match ImageNet :)