Feature Pyramid Network for object detection

  • Published 20 Jan 2021
  • Explains what a Feature Pyramid Network (FPN) is and how to use it with a Region Proposal Network.
    Feature Pyramid Network
    A Feature Pyramid Network, or FPN, is a feature extractor that takes a single-scale image of arbitrary size as input and outputs proportionally sized feature maps at multiple levels, in a fully convolutional fashion. This process is independent of the backbone convolutional architecture.
    Feature Pyramid Networks (FPNs) are used to produce multi-scale features.
    An FPN improves the quality of features by merging high-resolution feature maps with low-resolution ones.
    The high-resolution maps carry low-level features, while the low-resolution maps carry high-level features: features at early layers are high resolution, but as the network goes deeper, the later layers produce more semantically informative features at much lower resolution.
    Some of the detectors that use FPNs are RetinaNet, PANet, and NAS-FPN.
    Small objects are detected at one scale and larger objects at another; this multi-scale behaviour is exactly what a feature pyramid network aims for.
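As a rough illustration of the merging described above, here is a shape-level sketch of the top-down pathway in pure Python. The function name `fpn_output_shapes` and the ResNet-50-style stage shapes are illustrative assumptions, not code from the video:

```python
def fpn_output_shapes(backbone_shapes, out_channels=256):
    """Simulate the FPN top-down pathway at the shape level.

    backbone_shapes: bottom-up stage outputs as (height, width, channels),
    ordered from high resolution (early layer) to low resolution (deep layer).
    Returns the shapes of the merged pyramid levels.
    """
    # Lateral 1x1 convolutions: change only the channel count.
    laterals = [(h, w, out_channels) for (h, w, _) in backbone_shapes]

    # Start from the deepest (lowest-resolution, most semantic) map.
    merged = [laterals[-1]]
    for lateral in reversed(laterals[:-1]):
        top_h, top_w, c = merged[0]
        upsampled = (top_h * 2, top_w * 2, c)  # 2x nearest-neighbour upsampling
        assert upsampled == lateral            # element-wise add needs equal shapes
        merged.insert(0, lateral)              # addition does not change the shape
    return merged

# ResNet-50-style stage outputs for a 224x224 input:
stages = [(56, 56, 256), (28, 28, 512), (14, 14, 1024), (7, 7, 2048)]
print(fpn_output_shapes(stages))
# → [(56, 56, 256), (28, 28, 256), (14, 14, 256), (7, 7, 256)]
```

Note how every pyramid level ends up with the same channel count (256 here) but keeps its own spatial resolution, which is what lets the detector head run on each level.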

Comments • 74

  • @adamdhalla1270 · 3 years ago +3

    Wow, this was a PERFECT explanation, so lucid. Thank you so much for making this!

  • @pierreminier2695 · a year ago

    A really clear explanation. Thank you a lot!

  • @user-tj4ut8ox9r · 2 years ago

    Thank you so much! One shot explanation!

  • @masi-ww9kr · a year ago

    Thank you so much. I have a presentation and your videos really helped me. I hadn't studied neural networks at all, but you explain things so simply and well that I understood it completely.

  • @khaleddawoud363 · 2 years ago

    Great explanation, thanks for your efforts. Keep posting, and I wish you all the best.

  • @echoway2002 · 3 years ago +1

    Thanks for sharing this. A question: around 11 min, to convert 56*56*64 to 28*28*128, don't you need to use a filter of (2*2)*128 instead of (1*1)*128?

  • @rafeeda3580 · a year ago

    Thank you so much ❤. Well explained.

  • @mohammadyahya78 · a year ago

    Thank you. May I know what the numbers over the paths between the convolution layers are, please? They look something like 0.5x. Second, the output 56x56x128 does not match 28x28x128 even though they have the same number of channels, so how can we add them? Likewise, the M5 layer from the top-down path has size 16x16x256 while the feature map from the bottom-up path we want to merge is of size 32x32x256, so I am not sure how this merging is possible.

  • @puiitianag · 2 years ago

    Aarohi, you have explained it perfectly. I was looking for exactly this content and luckily landed here. Thanks so much.

    • @puiitianag · 2 years ago

      Can you also tell a little bit more about the merging of two feature maps? For example, 32*32*1024 should become 32*32*256 after a 1*1 conv, right? Now 16*16*256 (coming from the top) would become 32*32*256 due to 2x upsampling, right? Then merging takes place between these two feature maps. Is this correct? And merging means adding the two corresponding pixel values. Am I right? Your response would be extremely welcome. Thanks again.

  • @wobblychicken7965 · 3 years ago

    Was looking for this, thanks!

  • @davidjosh4811 · 3 years ago

    Perfect explanation. WOW

  • @pranayreddy2190 · 2 years ago

    Nice visual explanation!

  • @tensorthug6802 · 2 years ago

    Thanks for the great explanation. This video helped me get a job.

  • @mohammadyahya78 · a year ago

    Hello again,
    In FPN, when we merge the top-down path layers all the way down, you said we need to apply a 1x1x256 convolution to each layer to produce a layer of the same dimension as the corresponding layer in the bottom-up path for merging at 8:50. In your example at 9:42, how can we add 56x56x256 to 28x28x256?
    Another example: at 18:41, how do we add 16x16x256 to 32x32x256, given the two have different dimensions?

  • @adityanjsg99 · 2 years ago

    Thank you Aarohi, this was just what I was looking for!

  • @sarvatmir5888 · 2 years ago

    Amazing explanation

  • @fpgamachine · 3 years ago

    Excellent thanks!

  • @krishnamohan9040 · 3 years ago +1

    The most excellent explanation I have ever seen. Thank you so much. If you don't mind, could you also explain the EfficientDet architecture in your upcoming videos?

  • @mehnaztabassum1878 · 3 years ago

    Really worthy explanation! Could you please prepare a video on small object detection?

  • @AdnanMunirkhokhar · a year ago

    Good explanation.

  • @mohammadyahya78 · a year ago

    Thank you very much. Do you have a dedicated video on the regressor and classifier, as I see them in most object detection videos on your channel?

  • @shreshthasingh8918 · a year ago

    Why is the channel size of the feature maps made the same for addition?

  • @mohammadyahya78 · a year ago

    Thank you very much. You mentioned at 1:44 that the last layer will have the most useful feature map (though it looks blurred); may I know why the last layer should have the most useful features, please? Is it because it gets the first gradient update, which vanishes as the gradient flows backward?

    • @CodeWithAarohi · a year ago

      In a Feature Pyramid Network (FPN), the last layer of the backbone (i.e. the deepest, lowest-resolution layer) is typically considered to have the most semantically useful features. This is because each successive layer has a larger receptive field and builds more abstract representations of the objects in the image, which is essential for recognising what an object is.
      The reason the last layer looks "blurred" is that repeated striding and pooling reduce its spatial resolution, so fine-grained details about object locations and boundaries are lost. The top-down pathway of the FPN exists precisely to combine this strong semantic information with the high-resolution, detail-rich maps from the earlier layers, which is what makes accurate detection and segmentation possible.

  • @prashanthsheri4926 · 2 years ago

    Well-explained object detection algorithm.

  • @heloone4453 · 2 years ago

    Amazing. God bless.

  • @pankajray5939 · 3 years ago

    Nice work, ma'am... waiting for the YOLO videos.

    • @CodeWithAarohi · 3 years ago

      Videos on YOLOv3: czcams.com/video/k7B2ZqffDRE/video.html
      czcams.com/video/xtn5D7yXF-4/video.html
      And I will make a video on YOLOv4 soon.

  • @rupakdey6753 · 3 years ago

    Excellent explanation, ma'am. Please upload videos on Big Data.

    • @CodeWithAarohi · 3 years ago

      Will surely do, but first of all I want to finish my pipelined videos.

  • @srighakollapuajith4015

    The demo is very good and the explanation is super and understandable. Ma'am, you have not discussed the padding and stride size at 11:45.

    • @CodeWithAarohi · 3 years ago

      Glad my video is helpful. Padding and stride are not discussed here because I covered that part in my video on ResNet.

  • @hafsayousif2474 · 3 years ago

    How do we train an FPN model?

  • @ayarzuki · 3 years ago +2

    Please turn on auto subtitles.

  • @vernobsarma7840 · 2 years ago

    Ma'am, can you provide us with the slides?

  • @madhusudanverma6564 · 3 years ago

    Ma'am, I have 3 doubts:
    1. For addition of feature maps, should only the depth of the feature maps be the same, or the length and breadth also?
    2. Why is 2x applied in the top-down path?
    3. Why is a 3*3 conv applied to the feature maps in the top-down path?

    • @CodeWithAarohi · 3 years ago

      Answers: 1 - For element-wise addition, all dimensions must match: the 1x1 lateral convolution makes the number of channels the same, and the 2x upsampling makes the spatial size the same.
      2 - 2x means we are upsampling the feature map by a factor of 2 (doubling its spatial resolution) so it matches the next bottom-up map.
      3 - Applying a 3x3 convolution reduces the aliasing effect of the upsampling after the merge.
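A minimal pure-Python sketch of the 2x upsampling and element-wise merge described in this answer, using toy single-channel maps (the helper names `upsample2x` and `merge` are illustrative; a real FPN would follow the addition with the 3x3 convolution mentioned above):

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of lists)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out

def merge(lateral, top_down):
    """Element-wise addition of a lateral map with the upsampled top-down map."""
    up = upsample2x(top_down)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(lateral, up)]

print(upsample2x([[1, 2], [3, 4]]))
# → [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

After upsampling, the top-down map has exactly the spatial size of the lateral map, so the addition in `merge` is defined pixel by pixel.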

  • @zainhassan9508 · a year ago

    It's very interesting... where can we get the PPTs?

  • @CrackGate2025 · 4 months ago

    Can I get this PPT and the reference notes for this topic, ma'am?

  • @Mr.Esmaily · 3 years ago

    Thanks for the clear explanation. I'm trying to understand Mask R-CNN; by now I know how to implement ResNet for the backbone. It would be really great if you could explain how to implement an FPN with code examples.

  • @mehnaztabassum1878 · 3 years ago +1

    Could you please show the step-by-step implementation of FPN (incorporated with Faster R-CNN)?

  • @safaalbdeary2966 · 2 years ago

    It's a really excellent explanation and nice content. Can you please provide the code? Thank you so much.

    • @CodeWithAarohi · 2 years ago

      Thank you for liking my content! The code is not uploaded yet.

    • @safaalbdeary2966 · 2 years ago

      @@CodeWithAarohi It's an interesting subject; I'm waiting for the code.

  • @user-ew8dl5wr9t · 4 months ago

    Hi, please make a video on Siamese networks.

  • @mehnaztabassum1878 · 3 years ago +1

    Could you please provide the code?

  • @pranavpatel6786 · a year ago

    Thank you

  • @martymcfly695 · 2 years ago

    Please turn on auto subtitles.