Feature Pyramid Network for object detection
- Published 20 Jan 2021
- Explains what a Feature Pyramid Network (FPN) is and how to use it with a Region Proposal Network.
Feature Pyramid Network
A Feature Pyramid Network, or FPN, is a feature extractor that takes a single-scale image of an arbitrary size as input, and outputs proportionally sized feature maps at multiple levels, in a fully convolutional fashion. This process is independent of the backbone convolutional architectures.
A Feature Pyramid Network (FPN) is used to produce multi-scale features.
FPN improves the quality of features by merging high-resolution feature maps with low-resolution ones.
The high-resolution maps carry low-level features, while the low-resolution maps carry high-level features.
The features at early layers are of high resolution, but as the network goes deeper, the last layers produce more informative features at a much lower resolution.
Some of the detectors that use FPNs are RetinaNet, PANet, NAS-FPN, etc.
You can find that small objects are detected at one scale and larger objects at another. This is exactly what a feature pyramid network aims for: it is a multi-scale network.
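The top-down merging described above can be sketched in a few lines of NumPy. This is a toy illustration, not the implementation from the video: the weights are random placeholders, upsampling is assumed to be nearest-neighbour, and the C2–C5 / P2–P5 names and shapes follow the common ResNet-backbone convention.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, out_ch):
    """A 1x1 convolution is just per-pixel channel mixing."""
    h, w, c = x.shape
    w_mat = rng.standard_normal((c, out_ch)) * 0.01
    return (x.reshape(h * w, c) @ w_mat).reshape(h, w, out_ch)

def upsample2x(x):
    """Nearest-neighbour 2x upsampling: each pixel becomes a 2x2 block."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

# Bottom-up feature maps from the backbone (toy H x W x C shapes)
c2 = rng.standard_normal((56, 56, 256))
c3 = rng.standard_normal((28, 28, 512))
c4 = rng.standard_normal((14, 14, 1024))
c5 = rng.standard_normal((7, 7, 2048))

# Top-down pathway: the 1x1 conv equalizes channels, the 2x upsample
# equalizes spatial size, and element-wise addition merges the two maps.
p5 = conv1x1(c5, 256)
p4 = conv1x1(c4, 256) + upsample2x(p5)
p3 = conv1x1(c3, 256) + upsample2x(p4)
p2 = conv1x1(c2, 256) + upsample2x(p3)

print(p2.shape, p3.shape, p4.shape, p5.shape)
# (56, 56, 256) (28, 28, 256) (14, 14, 256) (7, 7, 256)
```

Every pyramid level ends up with the same channel depth (256 here), while each level keeps its own spatial resolution, which is what lets a detector pick objects of different sizes from different levels.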
Wow, this was a PERFECT explanation, so lucid. Thank you so much for making this!
Glad to know that my video is helpful.
@@CodeWithAarohi Please turn on auto subtitle
A really clear explanation. Thank you a lot !
Glad it helped you!
Thank you so much! One shot explanation!
Glad my video is helpful
Thank you so much. I have a presentation, and your videos really helped me. I hadn't studied neural networks at all, but you explain them so simply and well that I understood completely.
Glad I could help!
Great explanation, thanks for your efforts. Keep posting, and wish you all the best.
Glad my explanation is helpful
Thanks for sharing this. A question. Around 11 min, to convert 56*56*64 to 28*28*128, don't you need to use a filter of (2*2)*128 instead of (1*1)*128?
Thank you so much ❤.Well explained
You're welcome 😊
Thank you. May I know what the numbers over the paths between convolution layers are, please? It looks something like 0.5x? Second, the output 56x56x128 is not equal in size to 28x28x128 even though we have the same number of channels, so how can we add them? Likewise, the M5 layer from the top-down path has size 16x16x256 while the feature map from the bottom-up path we want to merge has size 32x32x256, so I am not sure how this merging is possible?
Aarohi, you have explained this perfectly. I was looking for exactly this content and luckily landed here. Thanks so much.
Can you also tell little bit more about merging of two feature maps? For example 32*32*1024 should become 32*32*256 after conv 1*1, right? Now 16*16*256 (coming from top) would become 32*32*256 due to 2x, right? Then merging will take place between these two feature maps. Is this correct? And merging means adding up of two corresponding pixel values. Am I right? Your response would be extremely welcome. Thanks again
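The steps described in the comment above match the standard FPN merge, and the shapes can be checked with a small NumPy experiment. This is only a shape sketch: the weights are random placeholders and the upsampling is assumed to be nearest-neighbour.

```python
import numpy as np

rng = np.random.default_rng(0)

bottom_up = rng.standard_normal((32, 32, 1024))  # lateral input from backbone
top_down = rng.standard_normal((16, 16, 256))    # map coming from above

# 1x1 conv: 32x32x1024 -> 32x32x256 (per-pixel channel mixing)
w = rng.standard_normal((1024, 256)) * 0.01
lateral = (bottom_up.reshape(-1, 1024) @ w).reshape(32, 32, 256)

# 2x nearest-neighbour upsampling: 16x16x256 -> 32x32x256
upsampled = top_down.repeat(2, axis=0).repeat(2, axis=1)

# Merging = element-wise addition of corresponding values
merged = lateral + upsampled
print(merged.shape)  # (32, 32, 256)
```

So yes: in the original FPN paper, merging is element-wise addition of the 1x1-convolved lateral map and the 2x-upsampled top-down map, which by then have identical dimensions.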
was looking for this thanks
Welcome
Perfect explanation. WOW
Glad it was helpful!
Nice visual explanation!
Thank you 😊
Thanks for the great explanation, This video helped me to get a job.
Glad to hear that! Good Luck with your Job
Hello Again,
In FPN, when we merge the top-down path layers all the way down, you said we need to convolve each layer with 1x1x256 to produce a layer of the same dimension as the corresponding layer in the bottom-up path for merging, at 8:50. In your example at 9:42, how can we add 56x56x256 with 28x28x256, please?
Another example: at 18:41, how do we add 16x16x256 with 32x32x256, given that the two have different dimensions?
Thank you Aarohi, this was just what I was looking for!
Welcome :)
Amazing explanation
Glad it was helpful!
Excellent thanks!
Welcome
Best explanation I have ever seen. Thank you so much. If you don't mind, can you also explain the EfficientDet architecture in your upcoming videos?
Sure
Really worthy explanation! Could you please prepare a video on small object detection?
Sure will do soon
good explanation.
Glad it was helpful!
Thank you very much. Do you have a dedicated video on the regressor and classifier, as I see them in most object detection videos on your channel?
No, sorry
Why is the feature maps' channel size made the same for addition?
Thank you very much. You mentioned at 1:44 that the last layer will have the most useful feature map (though blurred as it looks), may I know why the last layer should have the most useful features please? Is it because it will get the first gradient update that vanishes as the gradients flows backward?
In a CNN backbone, the last (deepest) layer is typically considered to have the most semantically useful features for object detection tasks. This is because each unit in a deep layer has a large receptive field and has passed through many layers of abstraction, so it encodes high-level information about what objects are present rather than raw pixel detail.
The reason the last layer looks "blurred" is that its spatial resolution is low: as you move deeper in the network, the feature maps become spatially coarser and lose fine detail about the objects' exact shapes and boundaries. That fine detail lives in the early, high-resolution layers, which is exactly why FPN merges the two, combining semantically strong but coarse deep features with spatially precise but semantically weak shallow features.
well explained object detection algorithm
Thank you!
amazing . God bless
Thank you!
Nice work, ma'am... waiting for the YOLO videos.
Videos on YOLOv3: czcams.com/video/k7B2ZqffDRE/video.html
czcams.com/video/xtn5D7yXF-4/video.html
And I will make a video on YOLOv4 soon.
Excellent explanation, ma'am. Please upload videos on Big Data.
Will surely do, but first I want to finish the videos already in my pipeline.
The demo is very good and the explanation is super and understandable. Ma'am, you have not discussed the padding and stride size at 11:45.
Glad my video is helpful. Padding and stride are not discussed here because I covered that part in my video on ResNet.
How to train an FPN model?
Please turn on auto subtitle
Ma'am can you provide us with the slides
Ma'am, I have 3 doubts:
1. For addition of feature maps, should only the depth of the feature maps be the same, or the length and breadth also?
2. Why is 2x applied in the top-down path?
3. Why is 3*3 applied to the feature maps in the top-down path?
Answers: 1- For element-wise addition, all dimensions must match: the 1x1 convolution makes the depth (number of channels) the same, and the upsampling makes the spatial size the same.
2- 2x means we are upsampling the feature map by 2x (i.e., doubling its spatial resolution).
3- Applying a 3x3 convolution reduces the aliasing effect introduced when merging with the upsampled layer.
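Answer 3 above (the 3x3 smoothing convolution applied to the merged map) can be sketched in NumPy as well. This is a deliberately naive, slow toy implementation with random placeholder weights and small shapes; a real model would use an optimized convolution layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv3x3_same(x, out_ch):
    """Naive 3x3 'same' convolution used to smooth the merged map."""
    h, w, c = x.shape
    k = rng.standard_normal((3, 3, c, out_ch)) * 0.01
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))  # zero-pad so output size matches
    out = np.zeros((h, w, out_ch))
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 3, j:j + 3, :]      # 3x3xC window
            out[i, j] = np.tensordot(patch, k, axes=3)
    return out

merged = rng.standard_normal((16, 16, 64))  # lateral map + upsampled map
p = conv3x3_same(merged, 64)                # smoothed pyramid level
print(p.shape)  # (16, 16, 64)
```

The point is that this convolution changes neither the spatial size nor the channel depth; it only smooths out the blocky artifacts that nearest-neighbour upsampling leaves behind.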
It's very interesting... Where can we get the PPTs?
can i get this ppt and reference notes for this topic ma'am ?
Thanks for the clear explanation. I'm trying to understand Mask R-CNN; by now I know how to implement ResNet for the backbone. It would be really great if you explained how to implement FPN with code examples.
Sure will do soon
Could you pls show the step by step implementation of FPN (incorporated with Faster RCNN)?
Will try to do it
It would be really great
It's a really excellent explanation and nice content. Can you please provide the code? Thank you so much.
Thank you for liking my content! The code is not uploaded yet.
@@CodeWithAarohi it's an interesting subject, I'm waiting for the code
Hi, please make a video on Siamese networks.
Will try!
Could you please provide the code?
Thank you
You're welcome
Please turn on auto subtitle