I like how in your videos, you not only explain the details within the paper but also the more "meta" stuff that is harder for people to grasp without reading through a lot of papers. Reading and understanding one paper is easy. Developing an intuitive understanding of a whole research subfield and its general directions is the hard part.
Thanks! Yes this one was rich in contextual information: DanNet, diagram correction from Twitter, and Swin transformer mainly I guess?
Well, it's oftentimes hard to understand a specific paper without having all the necessary context - and it takes time to accumulate it.
We need to start working on reasoning - perception is converging we're out of ideas lol
Bad jokes aside - at this point, it seems that CNN priors are quite adequate (in the case of natural images) - a hybrid approach (initial stages CNN-like and later stages transformer-like) seems to be the way to go, but the game is still on.
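To make the hybrid idea concrete, here is a toy shape-level walkthrough (all resolutions and stride choices are illustrative, not from any specific paper): early strided CNN-like stages shrink the spatial grid, and the remaining grid is then flattened into tokens for transformer-like stages.

```python
# Toy shape walkthrough of a hybrid design: CNN-style downsampling stages
# first, transformer-style (token) stages later. All sizes are illustrative.

def after_stride(hw, stride):
    """Spatial size after a strided conv stage ('same' padding assumed)."""
    return hw // stride

h = w = 224                       # illustrative input resolution
for stride in (4, 2, 2):          # a stem plus two downsampling CNN stages
    h, w = after_stride(h, stride), after_stride(w, stride)
    print(f"CNN stage -> {h}x{w} feature map")

tokens = h * w                    # flatten the remaining grid into tokens
print(f"transformer stages then attend over {tokens} tokens")
```

With these (made-up) strides, 224 shrinks to 56, then 28, then 14, so the attention layers only ever see 14×14 = 196 tokens, which is the usual argument for putting convolutions first.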
Thanks for the amazing explanation. Yes, mixing up the code and paper boosts the implementation speed manyfold. I love your work, you are awesome!
Thank you man
mix of paper and code is great!
agreed
Thank you! The video is excellent. I like that you mix code + paper in explanation and the fact that you provide a context and highlight the most essential parts.
Thank you!
Thank you. Very informative
Thank you for such an in-depth explanation. Your plan of explaining the history and convergence and then going through the paper and code is a great way for learners to understand the concepts deeply. It's very important to select the important portions of the paper for further exposition and to leave out the unnecessary boilerplate. I liked that you didn't say "go and read the paper yourself"!
Excited for this one!
Thank you so much for the brilliant explanation
Thanks! 🚀
very nice content!
I didn't even notice they use the old ResNet top-1 accuracy instead of Wightman's.
And that makes this model less comparable to the SOTA models.
Nice explanation. By the way, could I ask which software you are using to show multiple things in one view?
Many thanks for the awesome content!
This was a great video. The best I've seen about explaining a research paper. 👏
Hah I don't know about that but thanks! 😂
Thanks a lot for your amazing effort.
Thank you!
always semirants!
My made up word just got its 1st validation - it's an official word from now on!
ayyyyyyyyyyyyyyyyyyyy :D
Great video as always. What software are using to present and annotate the paper?
Thanks! OneNote.
this channel's videos are amazing
Hi, first of all, I would like to thank you for your excellent and wonderful videos on artificial intelligence.
I am a PhD student working on fast video captioning, and I hope to reach real-time captioning.
But I am confused by the sheer number of articles, techniques, and algorithms in this field.
I need your help choosing the right path among the existing methods
(traditional CNN, Transformer, YOLO, self-attention only, some combination, or others)
while maintaining a trade-off between speed and accuracy.
Was the pre-training they did on ImageNet-22k supervised, or unsupervised like in the transformer papers?
Supervised - same as ImageNet 1k. :)
Can this be used for video classification?
what tool do you use to read research papers on Ubuntu? Thank You!
I use OneNote on Windows!
Thank You 😊
3, 3, 9, s3. What does the s3 mean?
It means the authors made a typo
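For anyone else puzzled by that line: the intended per-stage block counts are (3, 3, 9, 3), and "s3" is just the typo. A minimal sketch (all names hypothetical) of how such a depths tuple typically drives stage construction:

```python
# The stage depths the paper intends are (3, 3, 9, 3); "s3" is a typo.
# Hypothetical sketch of expanding per-stage block counts into stages.

depths = (3, 3, 9, 3)

def build_stages(depths):
    """Return one list of block labels per stage, sized by its depth."""
    return [[f"stage{i}_block{j}" for j in range(d)]
            for i, d in enumerate(depths)]

stages = build_stages(depths)
print([len(s) for s in stages])   # -> [3, 3, 9, 3]
```

The third stage being three times deeper than the others is the point of listing the depths as a ratio in the first place.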
Boss
Hugo