Masked Autoencoders (MAE) Paper Explained
- Published 17 Jul 2024
- Paper link: arxiv.org/abs/2111.06377
In this video, I explain how masked autoencoders work by borrowing ideas from the BERT paper and pretraining a vision transformer without requiring any additional labels.
Table of Contents:
00:00 Intro
00:19 BERT idea
02:09 Language and vision difference
05:29 Proposed Architecture
11:30 After pretraining
14:03 Masking ratio
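The core trick the video walks through is random patch masking: a high fraction (around 75%) of image patches is dropped, the encoder sees only the visible ones, and a lightweight decoder reconstructs the missing pixels. Below is a minimal NumPy sketch of that masking step; the function name and shapes are my own illustration, not the paper's official code.

```python
import numpy as np

def random_masking(patches, mask_ratio=0.75, seed=0):
    """Keep a random subset of patches; the MAE encoder sees only these.

    patches: (N, D) array of flattened image patches.
    Returns the visible patches, their indices, and a boolean mask
    where True marks a patch the decoder must reconstruct.
    """
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])   # indices of visible patches
    mask = np.ones(n, dtype=bool)       # True = masked
    mask[keep_idx] = False
    return patches[keep_idx], keep_idx, mask

# 196 patches (a 14x14 grid from a 224x224 image with 16x16 patches), 768-dim each
patches = np.random.randn(196, 768)
visible, keep_idx, mask = random_masking(patches)
print(visible.shape)  # (49, 768) -> the encoder processes only 25% of patches
```

Because the encoder runs on just a quarter of the tokens, pretraining is much cheaper than encoding the full sequence, which is one of the paper's key efficiency arguments.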
Hello man... my thanks to you. Your explanation of MAE is clear, which is absent from almost all other explanations on CZcams.
Glad you enjoyed it!
presentation skills = lit !
Thanks 😃
awesome
explained in detail
Glad you liked it😃
Thanks
Great
Kaiming He is just the best
Indeed
🫡 well done
Does it work on a small dataset? Let's say 1000 images?
I don't think so. Transformers are data-hungry and need a lot of data to generalize. The smallest pretraining dataset I've seen was in ViTPose, where they pretrained with this technique on 150k images, and doubling the data improved results by only 1.3%.