This video provides a clear step-by-step explanation of how to get from images to input features for Transformer encoders, which has proven hard to find anywhere else.
Thank you.
Super helpful. Was very lost on the process from image patch to embedded vector until I watched this.
10:46 - this is a mistake; the convolution is not equivariant to scaling - if the bird is scaled, the output of the convolution will not be simply a scaling of the original output. That would only be true if you also rescale the filters.
Very, very nice explanation!!! I like learning the foundation/origin of the concepts from which models are derived.
10:58
Scale Equivariance:
Definition: A function is scale equivariant if a scaling (resizing) of the input results in a corresponding scaling of the output.
Convolution in CNNs: Standard convolutions are not scale equivariant. This means that if you resize an object in an image (e.g., making it larger or smaller), the CNN may not recognize it as the same object. Convolutional filters have fixed sizes, so they may fail to detect features that are significantly larger or smaller than the size of the filter.
Example: If a CNN is trained to detect a small object using a specific filter size, it might struggle to detect the same object when it appears much larger in the image because the filter is not capable of adjusting to different scales.
Why is Convolution Not Scale Equivariant?
The filters in a CNN have a fixed receptive field, meaning they look for patterns of a specific size. If the size of the pattern changes (e.g., due to scaling), the fixed-size filters may no longer detect the pattern effectively.
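The point in the two comments above can be demonstrated with a small NumPy sketch (the pattern and filter values below are illustrative, not from the video): a fixed Sobel-style filter produces responses of a different magnitude on a 2× enlarged copy of a pattern, so the output is not simply a scaled copy of the original output.

```python
import numpy as np

def conv2d(img, kernel):
    """Minimal 'valid' 2D convolution (cross-correlation, as in CNNs)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

pattern = np.zeros((8, 8))
pattern[3:5, 3:5] = 1.0                     # a small 2x2 "object"
kernel = np.array([[1., 0., -1.],
                   [2., 0., -2.],
                   [1., 0., -1.]])          # fixed Sobel-style edge filter

out_small = conv2d(pattern, kernel)

# Scale the input 2x (nearest-neighbour upsampling); the filter stays fixed.
pattern_big = np.kron(pattern, np.ones((2, 2)))
out_big = conv2d(pattern_big, kernel)

# If convolution were scale equivariant, out_big would just be a spatially
# scaled copy of out_small. Instead, even the peak response changes:
print(np.abs(out_small).max())  # 3.0
print(np.abs(out_big).max())    # 4.0
```

Translation equivariance would make the peak response identical under a shift; scaling the object instead changes how the fixed 3×3 receptive field overlaps its edges, which is exactly why the response magnitudes differ.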
Thanks for sharing, it was extremely helpful 💯
Thank you!
amazing lecture, thank you sir!
You said 196 patches for the ImageNet data, but the number of patches depends on the input image size and the patch size. For example, if the input image is 400×400 and the patch size is 8×8, then the number of patches is (400×400)/(8×8) = 50×50 = 2500.
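The arithmetic in the comment above can be sketched in a few lines (the helper name `num_patches` is illustrative, not from the video):

```python
def num_patches(image_size, patch_size):
    """Number of non-overlapping patches a square image splits into."""
    assert image_size % patch_size == 0, "image must divide evenly into patches"
    per_side = image_size // patch_size
    return per_side * per_side

print(num_patches(224, 16))  # 14 * 14 = 196 (the usual ViT/ImageNet setup)
print(num_patches(400, 8))   # 50 * 50 = 2500 (the example in the comment)
```

So 196 is not a property of ImageNet itself; it falls out of the common 224×224 input resolution combined with 16×16 patches.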
20:17 I think the encoder blocks are stacked in a parallel fashion rather than sequentially?
Fantastic Video! Really loved the detailed explanation step-by-step.
thanks for sharing
What exactly is happening in the self-attention and MLP blocks of the encoder module? Could you describe it in a simplistic way?
Do you have a video about beit or dino?
Great resource!
Thank you, sir
Thanks for sharing.
Are you the channel owner??
Thank you for this genuine knowledge.
awesome
Thank you for making such a great video
Thank you, much clearer