Feature Pyramid Network for object detection

  • Published 20 Jan 2021
  • Explains what a Feature Pyramid Network (FPN) is and how to use it with a Region Proposal Network.
    Feature Pyramid Network
    A Feature Pyramid Network, or FPN, is a feature extractor that takes a single-scale image of arbitrary size as input and outputs proportionally sized feature maps at multiple levels, in a fully convolutional fashion. This process is independent of the backbone convolutional architecture.
    Feature Pyramid Networks (FPNs) are used to produce multi-scale features.
    An FPN improves the quality of features by merging high-resolution feature maps with low-resolution ones.
    The high-resolution maps carry low-level features, while the low-resolution maps carry high-level features: features at early layers are high resolution, but as the network goes deeper, the later layers produce more semantically informative features at much lower resolution.
    Some of the detectors that use FPNs are RetinaNet, PANet, and NAS-FPN.
    Small objects are detected at one scale and larger objects at another; this multi-scale behaviour is exactly what a feature pyramid network aims for.
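As a rough illustration of the merging described above, here is a shape-level sketch of the top-down pathway in pure Python. The function name `fpn_output_shapes` and the ResNet-50-style stage shapes are illustrative assumptions, not code from the video:

```python
def fpn_output_shapes(backbone_shapes, out_channels=256):
    """Simulate the FPN top-down pathway at the shape level.

    backbone_shapes: bottom-up stage outputs as (height, width, channels),
    ordered from high resolution (early layer) to low resolution (deep layer).
    Returns the shapes of the merged pyramid levels.
    """
    # Lateral 1x1 convolutions: change only the channel count.
    laterals = [(h, w, out_channels) for (h, w, _) in backbone_shapes]

    # Start from the deepest (lowest-resolution, most semantic) map.
    merged = [laterals[-1]]
    for lateral in reversed(laterals[:-1]):
        top_h, top_w, c = merged[0]
        upsampled = (top_h * 2, top_w * 2, c)  # 2x nearest-neighbour upsampling
        assert upsampled == lateral            # element-wise add needs equal shapes
        merged.insert(0, lateral)              # addition does not change the shape
    return merged

# ResNet-50-style stage outputs for a 224x224 input:
stages = [(56, 56, 256), (28, 28, 512), (14, 14, 1024), (7, 7, 2048)]
print(fpn_output_shapes(stages))
# → [(56, 56, 256), (28, 28, 256), (14, 14, 256), (7, 7, 256)]
```

Note how every pyramid level ends up with the same channel count (256 here) but keeps its own spatial resolution, which is what lets the detector head run on each level.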

Comments • 74

  • @adamdhalla1270 · 3 years ago +3

    Wow, this was a PERFECT explanation, so lucid. Thank you so much for making this!

  • @pierreminier2695 · a year ago

    A really clear explanation. Thank you a lot!

  • @user-tj4ut8ox9r · 2 years ago

    Thank you so much! One shot explanation!

  • @masi-ww9kr · a year ago

    Thank you so much. I have a presentation and your videos really helped me. I hadn't studied neural networks at all, but you explain things so simply and well that I understood it completely.

  • @khaleddawoud363 · 2 years ago

    Great explanation, thanks for your efforts. Keep posting, and I wish you all the best.

  • @echoway2002 · 3 years ago +1

    Thanks for sharing this. A question: around 11 min, to convert 56*56*64 to 28*28*128, don't you need to use a filter of (2*2)*128 instead of (1*1)*128?

  • @rafeeda3580 · a year ago

    Thank you so much ❤. Well explained.

  • @mohammadyahya78 · a year ago

    Thank you. May I know what the numbers over the paths between the convolution layers are, please? They look something like 0.5x. Second, the output 56x56x128 does not match 28x28x128 even though they have the same number of channels, so how can we add them? Likewise, the M5 layer from the top-down path has size 16x16x256 while the feature map from the bottom-up path we want to merge is of size 32x32x256, so I am not sure how this merging is possible.

  • @puiitianag · 2 years ago

    Aarohi, you have explained it perfectly. I was looking for exactly this content and luckily landed here. Thanks so much.

    • @puiitianag · 2 years ago

      Can you also tell a little bit more about the merging of two feature maps? For example, 32*32*1024 should become 32*32*256 after a 1*1 conv, right? Now 16*16*256 (coming from the top) would become 32*32*256 due to 2x upsampling, right? Then merging takes place between these two feature maps. Is this correct? And merging means adding the two corresponding pixel values. Am I right? Your response would be extremely welcome. Thanks again.

  • @wobblychicken7965 · 3 years ago

    Was looking for this, thanks!

  • @davidjosh4811 · 3 years ago

    Perfect explanation. WOW

  • @pranayreddy2190 · 2 years ago

    Nice visual explanation!

  • @tensorthug6802 · 2 years ago

    Thanks for the great explanation. This video helped me get a job.

  • @mohammadyahya78 · a year ago

    Hello again,
    In FPN, when we merge the top-down path layers all the way down, you said we need to apply a 1x1x256 convolution to each layer to produce a layer of the same dimension as the corresponding layer in the bottom-up path for merging at 8:50. In your example at 9:42, how can we add 56x56x256 to 28x28x256?
    Another example: at 18:41, how do we add 16x16x256 to 32x32x256, given the two have different dimensions?

  • @adityanjsg99 · 2 years ago

    Thank you Aarohi, this was just what I was looking for!

  • @sarvatmir5888 · 2 years ago

    Amazing explanation

  • @fpgamachine · 3 years ago

    Excellent thanks!

  • @krishnamohan9040 · 3 years ago +1

    The most excellent explanation I have ever seen. Thank you so much. If you don't mind, could you also explain the EfficientDet architecture in your upcoming videos?

  • @mehnaztabassum1878 · 3 years ago

    Really worthy explanation! Could you please prepare a video on small object detection?

  • @AdnanMunirkhokhar · a year ago

    Good explanation.

  • @mohammadyahya78 · a year ago

    Thank you very much. Do you have a dedicated video on the regressor and classifier, as I see them in most object detection videos on your channel?

  • @shreshthasingh8918 · a year ago

    Why is the channel size of the feature maps made the same for addition?

  • @mohammadyahya78 · a year ago

    Thank you very much. You mentioned at 1:44 that the last layer will have the most useful feature map (though it looks blurred); may I know why the last layer should have the most useful features, please? Is it because it gets the first gradient update, which vanishes as the gradient flows backward?

    • @CodeWithAarohi · a year ago

      In a Feature Pyramid Network (FPN), the last layer of the backbone (i.e. the deepest, lowest-resolution layer) is typically considered to have the most semantically useful features. This is because each successive layer has a larger receptive field and builds more abstract representations of the objects in the image, which is essential for recognising what an object is.
      The reason the last layer looks "blurred" is that repeated striding and pooling reduce its spatial resolution, so fine-grained details about object locations and boundaries are lost. The top-down pathway of the FPN exists precisely to combine this strong semantic information with the high-resolution, detail-rich maps from the earlier layers, which is what makes accurate detection and segmentation possible.

  • @prashanthsheri4926 · 2 years ago

    Well-explained object detection algorithm.

  • @heloone4453 · 2 years ago

    Amazing. God bless.

  • @pankajray5939 · 3 years ago

    Nice work, ma'am... waiting for the YOLO videos.

    • @CodeWithAarohi · 3 years ago

      Videos on YOLOv3: czcams.com/video/k7B2ZqffDRE/video.html
      czcams.com/video/xtn5D7yXF-4/video.html
      And I will make a video on YOLOv4 soon.

  • @rupakdey6753 · 3 years ago

    Excellent explanation, ma'am. Please upload videos on Big Data.

    • @CodeWithAarohi · 3 years ago

      Will surely do, but first of all I want to finish my pipelined videos.

  • @srighakollapuajith4015

    The demo is very good and the explanation is super and understandable. Ma'am, you have not discussed the padding and stride size at 11:45.

    • @CodeWithAarohi · 3 years ago

      Glad my video is helpful. Padding and stride are not discussed here because I covered that part in my video on ResNet.

  • @hafsayousif2474 · 3 years ago

    How do we train an FPN model?

  • @ayarzuki · 3 years ago +2

    Please turn on auto subtitles.

  • @vernobsarma7840 · 2 years ago

    Ma'am, can you provide us with the slides?

  • @madhusudanverma6564 · 3 years ago

    Ma'am, I have 3 doubts:
    1. For addition of feature maps, should only the depth of the feature maps be the same, or the length and breadth also?
    2. Why is 2x applied in the top-down path?
    3. Why is a 3*3 conv applied to the feature maps in the top-down path?

    • @CodeWithAarohi · 3 years ago

      Answers: 1 - For element-wise addition, all dimensions must match: the 1x1 lateral convolution makes the number of channels the same, and the 2x upsampling makes the spatial size the same.
      2 - 2x means we are upsampling the feature map by a factor of 2 (doubling its spatial resolution) so it matches the next bottom-up map.
      3 - Applying a 3x3 convolution reduces the aliasing effect of the upsampling after the merge.
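A minimal pure-Python sketch of the 2x upsampling and element-wise merge described in this answer, using toy single-channel maps (the helper names `upsample2x` and `merge` are illustrative; a real FPN would follow the addition with the 3x3 convolution mentioned above):

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of lists)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out

def merge(lateral, top_down):
    """Element-wise addition of a lateral map with the upsampled top-down map."""
    up = upsample2x(top_down)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(lateral, up)]

print(upsample2x([[1, 2], [3, 4]]))
# → [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

After upsampling, the top-down map has exactly the spatial size of the lateral map, so the addition in `merge` is defined pixel by pixel.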

  • @zainhassan9508 · a year ago

    It's very interesting... where can we get the PPTs?

  • @CrackGate2025 · 4 months ago

    Can I get this PPT and the reference notes for this topic, ma'am?

  • @Mr.Esmaily · 3 years ago

    Thanks for the clear explanation. I'm trying to understand Mask R-CNN; by now I know how to implement ResNet for the backbone. It would be really great if you could explain how to implement an FPN with code examples.

  • @mehnaztabassum1878 · 3 years ago +1

    Could you please show the step-by-step implementation of FPN (incorporated with Faster R-CNN)?

  • @safaalbdeary2966 · 2 years ago

    It's a really excellent explanation and nice content. Can you please provide the code? Thank you so much.

    • @CodeWithAarohi · 2 years ago

      Thank you for liking my content! The code is not uploaded yet.

    • @safaalbdeary2966 · 2 years ago

      @@CodeWithAarohi It's an interesting subject; I'm waiting for the code.

  • @user-ew8dl5wr9t · 4 months ago

    Hi, please make a video on Siamese networks.

  • @mehnaztabassum1878 · 3 years ago +1

    Could you please provide the code?

  • @pranavpatel6786 · a year ago

    Thank you

  • @martymcfly695 · 2 years ago

    Please turn on auto subtitles.