Fast R-CNN Explained | ROI Pooling

  • Added 13 Sep 2024

Comments • 12

  • @kelvin-uh7tf (3 months ago)

    nice.

  • @thangphu6044 (2 months ago)

    Do RCNN and Fast RCNN sample proposals into batches before passing them to the CNN?

    • @Explaining-AI (2 months ago)

      For RCNN, yes. For Fast RCNN it doesn't really matter whether sampling is done before or after, because we only need one forward pass through the CNN per image. ROI pooling on that feature map is then done only for the proposals sampled for the batch.
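To make the ROI pooling step concrete, here is a minimal single-channel sketch in plain Python (no framework assumed). It takes one proposal already expressed in feature-map cells (inclusive corners) and max-pools it into a fixed grid. `roi_max_pool` and `ceil_div` are hypothetical helper names, and the floor/ceil bin boundaries are one common choice, not necessarily the exact layer used in the video. A real layer repeats this for every channel, so 512 channels pooled to 7 x 7 give a 512 x 7 x 7 output per proposal.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]  # one channel, indexed as fmap[row][col]


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(fmap: FeatureMap, roi: Tuple[int, int, int, int],
                 out_h: int = 7, out_w: int = 7) -> FeatureMap:
    """Max-pool the ROI (x1, y1, x2, y2), given in feature-map cells with
    inclusive corners, into a fixed out_h x out_w grid."""
    x1, y1, x2, y2 = roi
    roi_h, roi_w = y2 - y1 + 1, x2 - x1 + 1
    pooled = []
    for i in range(out_h):
        # Bin i covers rows [floor(i*roi_h/out_h), ceil((i+1)*roi_h/out_h)).
        # Integer arithmetic avoids float round-off at the bin edges.
        r0 = y1 + (i * roi_h) // out_h
        r1 = y1 + ceil_div((i + 1) * roi_h, out_h)
        row = []
        for j in range(out_w):
            c0 = x1 + (j * roi_w) // out_w
            c1 = x1 + ceil_div((j + 1) * roi_w, out_w)
            # Each bin is never empty, so max() always has values to compare
            row.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


if __name__ == "__main__":
    # Toy 4x4 feature map holding 0..15 in row-major order
    fmap = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(roi_max_pool(fmap, (0, 0, 3, 3), 2, 2))  # [[5, 7], [13, 15]]
```

Because the bin edges use floor for the start and ceil for the end, a proposal smaller than the output grid still works: neighbouring bins simply reuse the same cells, so every proposal comes out at the same fixed size that the fc layers expect.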

    • @thangphu6044 (2 months ago)

      @Explaining-AI Ah, I mean: are all proposals sampled into batches before being fed to the classifier and bbox regressor? (That is, does RCNN take the proposals one by one while Fast RCNN takes them batch by batch?)

    • @Explaining-AI (2 months ago)

      @thangphu6044 I'm not sure I understand what you mean by "sampled into batches". I've added a bit of detail on training for both; can you tell me which sampling step you're referring to?
      For RCNN:
      1. We fetch all proposals for the dataset (2K proposals x number of images).
      2. In each training iteration, we sample 128 proposals; this is our batch.
      3. This batch is fed to the CNN (including fc6 and fc7) to get 128 x 4096 features.
      4. These are passed to a 4096 x N_classes classification layer to get logits for all 128 proposals.
      5. We also do the SVM and bbox regression training.
      For Fast RCNN:
      1. Each training iteration takes 2 images.
      2. We pass this batch of two images through the CNN to get a feature map for each image.
      3. We sample N proposals from each of the two images.
      4. ROI pooling gives a 2N x 512 x 7 x 7 output, which flattens to 2N x 25088.
      5. This 2N x 25088 tensor is passed through fc6, fc7 and the classification layers to get 2N x N_classes outputs.
      There is no sampling at inference. For example, in Fast RCNN inference, step 4 is done for all 2000 proposals, giving a 2000 x 25088 output.
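The two training recipes above differ mainly in where the batch is drawn from. The sketch below contrasts them under stated assumptions; all function names and the dummy boxes are hypothetical. The R-CNN sampler draws uniformly from one dataset-wide pool, which is a simplification: the R-CNN paper actually draws 32 positive and 96 background windows per batch of 128. The Fast R-CNN sampler takes the images first and then N proposals per image (the Fast R-CNN paper uses 2 images and 64 ROIs each, 128 in total). Tagging each ROI as `(batch_index, x1, y1, x2, y2)` tells the pooling step which image's feature map to read; this mirrors the K x 5 box format accepted by `torchvision.ops.roi_pool`, though nothing here depends on torchvision.

```python
import random
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]        # (x1, y1, x2, y2)
Roi = Tuple[int, int, int, int, int]   # (batch_index, x1, y1, x2, y2)


def sample_rcnn_batch(proposals: Dict[str, List[Box]], batch_size: int = 128,
                      seed: int = 0) -> List[Tuple[str, Box]]:
    """R-CNN style (simplified, label-agnostic): draw the batch from one pool
    of every proposal in the dataset. Each sampled crop then needs its own
    CNN forward pass."""
    pool = [(img, box) for img, boxes in proposals.items() for box in boxes]
    return random.Random(seed).sample(pool, batch_size)


def sample_fast_rcnn_batch(images: List[List[Box]], n_per_image: int,
                           seed: int = 0) -> List[Roi]:
    """Fast R-CNN style: the images are chosen first (here, all given images),
    then N proposals are sampled from each. Each ROI keeps its image index."""
    rng = random.Random(seed)
    rois: List[Roi] = []
    for b, boxes in enumerate(images):
        for box in rng.sample(boxes, n_per_image):
            rois.append((b, *box))
    return rois


def fc_input_shape(num_rois: int, channels: int = 512, pool: int = 7) -> Tuple[int, int]:
    """ROI pooling yields num_rois x channels x pool x pool; flattening it for
    fc6 gives num_rois x (channels * pool * pool)."""
    return (num_rois, channels * pool * pool)


if __name__ == "__main__":
    # Hypothetical dummy proposals: 10 images x 2000 boxes each
    dataset = {f"img{i}": [(k, k, k + 16, k + 16) for k in range(2000)] for i in range(10)}
    rcnn = sample_rcnn_batch(dataset)
    print(len(rcnn), "crops -> 128 CNN forward passes")

    rois = sample_fast_rcnn_batch([dataset["img0"], dataset["img1"]], n_per_image=64)
    print(len(rois), "ROIs from 2 CNN forward passes")
    print("fc6 input:", fc_input_shape(len(rois)))  # (128, 25088)
    print("inference:", fc_input_shape(2000))       # (2000, 25088)
```

The shape helper just restates the arithmetic from the steps above: 512 x 7 x 7 = 25088 features per proposal, whether there are 2N sampled proposals during training or all 2000 at inference.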

    • @thangphu6044 (2 months ago)

      @Explaining-AI It's step 2 in RCNN and step 3 in Fast RCNN. (In the video, you talked about sampling some non-background and background proposals and then feeding them to the FC layers.)

    • @Explaining-AI (2 months ago)

      @thangphu6044 Yes. During training, for a given image we pass only a sample of the proposals (their features, actually) to the fc layers. As you can see, for RCNN we use only 128 (out of 2000), and the same goes for Fast RCNN.
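The non-background vs background sampling discussed in this exchange can be sketched as follows. This is an illustration under assumptions: the thresholds follow the Fast R-CNN paper (a proposal is foreground if its best IoU with any ground-truth box is at least 0.5, background if that IoU falls in [0.1, 0.5), and 25% of each batch is drawn from foreground), while `iou` and `sample_fg_bg` are hypothetical names and boxes are treated as continuous coordinates.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), continuous coordinates


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sample_fg_bg(proposals: Sequence[Box], gt_boxes: Sequence[Box],
                 batch_size: int = 64, fg_fraction: float = 0.25,
                 fg_thresh: float = 0.5, bg_range: Tuple[float, float] = (0.1, 0.5),
                 seed: int = 0) -> Tuple[List[Box], List[Box]]:
    """Label proposals by their best IoU with ground truth (gt_boxes must be
    non-empty), then sample a batch with a capped foreground share."""
    rng = random.Random(seed)
    fg: List[Box] = []
    bg: List[Box] = []
    for p in proposals:
        best = max(iou(p, g) for g in gt_boxes)
        if best >= fg_thresh:
            fg.append(p)
        elif bg_range[0] <= best < bg_range[1]:
            bg.append(p)
        # Proposals whose best IoU is below bg_range[0] are skipped in this sketch
    n_fg = min(len(fg), int(round(batch_size * fg_fraction)))
    n_bg = min(len(bg), batch_size - n_fg)
    return rng.sample(fg, n_fg), rng.sample(bg, n_bg)


if __name__ == "__main__":
    gt = [(0.0, 0.0, 10.0, 10.0)]
    props = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 5.0),
             (5.0, 0.0, 15.0, 10.0), (0.0, 0.0, 10.0, 2.0), (20.0, 20.0, 30.0, 30.0)]
    fg, bg = sample_fg_bg(props, gt, batch_size=4, fg_fraction=0.5)
    print("foreground:", fg)
    print("background:", bg)
```

Capping the foreground share matters because most of the ~2000 proposals per image overlap no object, so uniform sampling would give the fc layers almost nothing but background.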

  • @anshumansinha5874 (2 months ago)

    Hi, I couldn't understand the training of the RPN. I get that we have ground-truth proposals from mapping the original bounding boxes onto the final convolutional feature map. But that's only the ground truth: what does the network itself predict, so that we can compare it with the ground truth?
    At 14:55 you mentioned that the inputs to the network are images and a list of proposals. Are these random initial proposals that go through the network and get penalised at the output so that the model learns the correct proposals? If they are a random initial list, how do we know the number of proposals beforehand, at initialisation? Or can we take that as prior information from the actual number of ground-truth boxes?

    • @Explaining-AI (2 months ago)

      For Fast RCNN there is no RPN. That's actually introduced in Faster RCNN (the next video in this series, where I get into the RPN and its training).
      Fast RCNN still uses selective search proposals. It's just that, unlike RCNN, where we feed every selective search proposal to the CNN separately (2000 forward passes), we now feed the image through the CNN only once and use ROI pooling to get features for those proposals.
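Selective search proposals live in image pixels, so before ROI pooling each one has to be mapped onto the conv feature map. A minimal sketch, assuming a VGG16-style backbone whose conv5 output has stride 16 (a scale factor of 1/16) and rounding each coordinate to the nearest cell; `project_to_feature_map` is a hypothetical helper, not a library function. This projection is cheap arithmetic per proposal, which is what lets Fast RCNN run the CNN once per image instead of once per proposal.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def project_to_feature_map(box: Box, fmap_w: int, fmap_h: int,
                           stride: int = 16) -> Tuple[int, int, int, int]:
    """Map an image-space proposal onto inclusive feature-map cell indices."""
    scale = 1.0 / stride  # often called spatial_scale
    # Round half up to the nearest cell (coordinates are assumed non-negative)
    x1, y1, x2, y2 = (int(math.floor(v * scale + 0.5)) for v in box)
    # Clamp to the feature map and keep at least one cell on each side
    x1 = min(max(x1, 0), fmap_w - 1)
    y1 = min(max(y1, 0), fmap_h - 1)
    x2 = min(max(x2, x1), fmap_w - 1)
    y2 = min(max(y2, y1), fmap_h - 1)
    return x1, y1, x2, y2


if __name__ == "__main__":
    # Hypothetical 800x600 image; four stride-2 max-pools give a 50x37 conv5 map
    fmap_w, fmap_h = 800 // 16, 600 // 16
    for proposal in [(160, 96, 479, 415), (0, 0, 799, 599)]:
        print(proposal, "->", project_to_feature_map(proposal, fmap_w, fmap_h))
```

With those assumed sizes, the proposal (160, 96, 479, 415) lands on cells (10, 6, 30, 26), and a full-image box is clamped to (0, 0, 49, 36). The resulting cell box is what the ROI pooling step then divides into a fixed 7 x 7 grid.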
