Bounding Box Prediction | Yolo | Essentials of Object Detection

Kapil Sachdeva

8,193 views

Added 22 Dec 2022
This tutorial explains finer details about the bounding box coordinate predictions using visual cues.
Science & Technology

Comments • 36

@sashalyuklyan5195 1 month ago ⁺¹
Thanks a lot for your videos! The selection of subjects in your series is excellent, and every tutorial offers very interesting information.
@KapilSachdeva 1 month ago
🙏
@mohammadyahya78 1 year ago ⁺¹
Indeed, this is the best explanation of this topic so far. Hopefully you can explain other YOLO concepts after anchor boxes, such as what t_o is?
@KapilSachdeva 1 year ago
🙏 Yes, working on those.
@sheerazahmad3131 1 year ago ⁺³
Just completed this series. This is gold❤. Thank you so much❤️
Are you working on new videos as well?
@KapilSachdeva 1 year ago
I have plans for it; I have been busy lately, hence the delay. 🙏
@sheerazahmad3131 1 year ago
@@KapilSachdeva Got it. I will be waiting for more content.
@dinoyjohny5211 3 months ago
Thank you so much
@wiputtuvaynond761 9 months ago
Thank you very much for the excellent explanation of how it works. I have a question. As you described in the anchor box video, only one predefined anchor box is selected at first, based on an IoU threshold, and the prediction is learnt by that anchor. Is it necessary that the anchor box sizes are determined beforehand so that they correspond to the known ground truth objects?
@KapilSachdeva 9 months ago
More or less yes.
@johngrabner 11 months ago ⁺¹
You are an excellent instructor, thank you. The rationale for multiple priors per cell is not explained. It is unlikely to be inconsequential, since they are common in object detection. I suspect that it helps in multiple ways. For example: the neural network will smear its objectness assessment over a number of cells. If multiple objects are present in a picture, then the multiple objects' smears will overlap (superposition). By having multiple priors per cell, this smearing can be diluted, essentially a hash to separate predictions so the superposition does not go above the detection threshold. I also suspect that the encoded ground truth of height and width is easier to learn with multiple priors per cell. Does this make sense, and are there more ways in which multiple priors per cell help the trainability of these models?
@KapilSachdeva 11 months ago
The reason for multiple priors per cell is to accommodate the different sizes of objects belonging to different classes. And even for a single class it is not a bad idea to use different scales and aspect ratios.
@johngrabner 11 months ago
@@KapilSachdeva I suppose I am looking for why it is a good idea to have multiple aspect ratios per cell. If I have a sufficient number of cells, then it is unlikely for two objects to be perfectly aligned with the same cell. So I suspect it is not to disambiguate multiple perfectly overlapped objects. Suppose we set the prior to say each cell has one object that is full page, and let the ground truth train the actual value. I suspect this will not work as well as multiple aspect ratios in each cell. If so, why?
@KapilSachdeva 11 months ago
> Setting the prior to say each cell has one object that is full page, and let the ground truth train the actual value.
Not sure I understand the line above.
Regarding disambiguating multiple perfectly overlapped objects:
The objects may very well overlap in the "feature map". A cell in the feature map (grid) covers many pixels in the original image so it is possible that a given cell contains the center of multiple objects.
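To make the point about shared cells concrete, here is a minimal sketch. The 416x416 image size, the 13x13 grid, the object centers, and the helper name `cell_of` are all assumptions chosen for illustration; they are not taken from the video.

```python
def cell_of(center_x, center_y, image_size, grid_size):
    """Return (col, row) of the grid cell that contains a pixel-space center."""
    stride = image_size / grid_size  # pixels covered by one cell along each axis
    return int(center_x // stride), int(center_y // stride)


# Hypothetical 416x416 image with a 13x13 grid: each cell spans 32x32 pixels.
person_center = (200.0, 210.0)
bicycle_center = (205.0, 220.0)

person_cell = cell_of(*person_center, image_size=416, grid_size=13)
bicycle_cell = cell_of(*bicycle_center, image_size=416, grid_size=13)

# Both centers land in the same cell, so one prediction slot per cell
# could not represent both objects.
print(person_cell, bicycle_cell)
```

With several priors per cell, each of the two objects can be assigned to a different prior in that same cell.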
@johngrabner 11 months ago
@@KapilSachdeva good point, thank you
@mohammadyahya78 1 year ago
If we have 5 prior bounding boxes, will we do the calculations of b_x, b_y, b_w, b_h 5 times and then find the IoU between them and the ground truth bounding boxes?
@KapilSachdeva 1 year ago ⁺¹
Yes.
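The exchange above can be sketched as follows, assuming the YOLOv2-style decoding (sigmoid offsets added to the cell position, exponential scaling of the prior size) and boxes expressed in grid units. The prior sizes, raw outputs, and ground truth box are made-up values, and `decode` and `iou` are hypothetical helper names.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode(t, cell, prior):
    """Turn raw outputs (t_x, t_y, t_w, t_h) into (b_x, b_y, b_w, b_h) in grid units."""
    t_x, t_y, t_w, t_h = t
    c_x, c_y = cell      # top-left corner of the grid cell
    p_w, p_h = prior     # width and height of the prior box
    return (sigmoid(t_x) + c_x, sigmoid(t_y) + c_y,
            p_w * math.exp(t_w), p_h * math.exp(t_h))


def iou(a, b):
    """IoU of two boxes given as (center_x, center_y, width, height)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


# Five hypothetical priors (width, height) for one cell, all in grid units.
priors = [(1.0, 1.0), (2.0, 4.0), (4.0, 2.0), (6.0, 6.0), (10.0, 8.0)]
cell = (6, 6)
# All-zero raw outputs: each box sits at the cell center with its prior's size.
raw = [(0.0, 0.0, 0.0, 0.0)] * len(priors)
ground_truth = (6.5, 6.5, 2.0, 4.0)

boxes = [decode(t, cell, p) for t, p in zip(raw, priors)]  # one decode per prior
ious = [iou(box, ground_truth) for box in boxes]
best = max(range(len(ious)), key=ious.__getitem__)
print(best, ious)
```

The decoding runs once per prior, and the IoU values then show which of the five boxes matches the ground truth best.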
@issanasralli4529 1 year ago ⁺¹
Thanks a lot!!! Good video!!
Please, when will you explain how the height and width are updated, i.e. b_w = p_w * e^(t_w)?
@KapilSachdeva 1 year ago
🙏
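As a rough illustration of the width and height relation asked about above, the sketch below applies b_w = p_w * e^(t_w) and its inverse, t_w = ln(b_w / p_w), which is how a ground truth size could be encoded as a regression target. The function names and numbers are assumptions for illustration only.

```python
import math


def encode_size(b_w, b_h, p_w, p_h):
    """Inverse of the decode step: targets (t_w, t_h) for a box of size (b_w, b_h)."""
    return math.log(b_w / p_w), math.log(b_h / p_h)


def decode_size(t_w, t_h, p_w, p_h):
    """Decode step: scale the prior's width and height by exp of the raw outputs."""
    return p_w * math.exp(t_w), p_h * math.exp(t_h)


p_w, p_h = 2.0, 4.0                          # hypothetical prior size
t_w, t_h = encode_size(3.0, 3.0, p_w, p_h)   # ground truth box is 3 x 3
b_w, b_h = decode_size(t_w, t_h, p_w, p_h)   # round trip back to the box size

# t_w is positive (box wider than the prior), t_h is negative (box shorter).
print(t_w, t_h, b_w, b_h)
```

Because the exponential is always positive, the decoded width and height can never become negative, whatever value the network outputs.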
@lordfarquad-by1dq 1 year ago ⁺¹
Hello, when are you going to publish new videos? This series is really helpful. Subbed and looking forward to the new videos.
@KapilSachdeva 1 year ago ⁺¹
Very soon. Apologies for the delay.
@lordfarquad-by1dq 1 year ago
@@KapilSachdeva Thank you, looking forward to it. Would you recommend any books that go through these topics extensively?
@KapilSachdeva 1 year ago ⁺¹
I am not aware of any book. Even if one exists, it will be outdated just after it is published.
The best way to learn is to first read the papers, then read the official code and try to implement it yourself.
The most challenging aspect of this domain (object detection) is that even though the people (researchers) are brilliant, they are not good at software development. This makes their code very hard to read. This is why I am showing some code as part of these tutorials.
@lordfarquad-by1dq 1 year ago
@@KapilSachdeva Thank you, looking forward to more of your tutorials.
@indianmanhere 1 year ago ⁺¹
Sir, my question is how the bounding boxes are made: how are the locations and boxes for, say, a cat, a man, and a cycle all produced at the same time? I know how to select the best one and how to write the feature vectors...
Please kindly respond.
@KapilSachdeva 1 year ago
Just to confirm:
Are you asking how the ground truth tensors are prepared so that we can pass them to the loss function along with the predictions? And whether the ground truth tensors are prepared in such a way that label assignment is done for all of them?
@indianmanhere 1 year ago ⁺¹
@@KapilSachdeva I mean how the bounding boxes are made for every object in an image.
For example, when we write the vector for every grid cell, which consists of class probabilities and coordinates, which algorithm is used to calculate the coordinates (bounding boxes)?
@mohammadyahya78 1 year ago
Why don't we use the bounding box prior in the calculation of b_x and b_y at 7:29, please?
@KapilSachdeva 1 year ago ⁺¹
We are using the bounding box prior. cx and cy are those of the prior bounding box that we have associated with the ground truth box.
@mohammadyahya78 1 year ago
@@KapilSachdeva Thanks! What about p_x and p_y then, please? I thought these were the prior bounding box coordinates.
@KapilSachdeva 1 year ago ⁺¹
First, it is not p_x and p_y. It is p_w and p_h, i.e. the width and height of the prior box.
A bounding box is defined using (cx, cy, pw, ph).
@mohammadyahya78 1 year ago
@@KapilSachdeva Thank you. Lastly, I was trying to ask why we don't use p_x and p_y in the calculation of b_x and b_y instead of c_x and c_y, please?
@KapilSachdeva 1 year ago ⁺¹
In YOLO, a bounding box is defined using (cx, cy, w, h).
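For reference, the symbols in this thread match the box parameterization given in the YOLOv2 paper, assuming the video follows that convention. There, (c_x, c_y) is the offset of the grid cell from the top-left corner of the image, (p_w, p_h) is the size of the prior, and the t values are the raw network outputs:

\[
\begin{aligned}
b_x &= \sigma(t_x) + c_x \\
b_y &= \sigma(t_y) + c_y \\
b_w &= p_w \, e^{t_w} \\
b_h &= p_h \, e^{t_h} \\
\Pr(\text{object}) \cdot \mathrm{IOU}(b, \text{object}) &= \sigma(t_o)
\end{aligned}
\]

The prior supplies only a width and a height; the center is anchored to the cell through c_x and c_y, which is why no p_x or p_y appears.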
@rampavanmedipelli6152 1 year ago ⁺¹
Please explain the NECKS part
@KapilSachdeva 1 year ago
Will do. I am stuck on a few problems and hence not getting any free cycles, but I will definitely get to them.
@KapilSachdeva 1 year ago
done 😀
