Focal Loss for Dense Object Detection

  • Date added: 17 Jul 2024
  • ICCV17 | 1902 | Focal Loss for Dense Object Detection
    Tsung-Yi Lin (Cornell), Priya Goyal (Facebook AI Research), Ross Girshick (Facebook), Kaiming He (Facebook AI Research), Piotr Dollar (Facebook AI Research)
    The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In this paper, we investigate why this is the case. We discover that the extreme foreground-background class imbalance encountered during training of dense detectors is the central cause. We propose to address this class imbalance by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples. Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training. To evaluate the effectiveness of our loss, we design and train a simple dense detector we call RetinaNet. Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors.
  • Science & Technology
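The loss reshaping described in the abstract can be sketched in a few lines. This is a minimal NumPy version of the binary focal loss FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t); the defaults alpha = 0.25 and gamma = 2 follow the paper, while the function name and usage are illustrative:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).

    p     : predicted probability of the positive (foreground) class
    y     : ground-truth label (1 = foreground, 0 = background)
    alpha : class-balance weight for the positive class
    gamma : focusing parameter; gamma = 0 recovers weighted cross entropy
    """
    p_t = np.where(y == 1, p, 1.0 - p)             # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# A well-classified background example (p_t = 0.99) is down-weighted far
# more aggressively than a hard one (p_t = 0.1), so the vast number of
# easy negatives no longer dominates the total loss.
easy = focal_loss(np.array([0.01]), np.array([0]))
hard = focal_loss(np.array([0.9]),  np.array([0]))
```

With gamma = 0 and alpha = 1 the function reduces to plain cross entropy, which is how the paper positions the focal loss as a generalization.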

Comments • 31

  • @autripat
    @autripat 6 years ago +38

    Lovely presentation; states the problem clearly (class imbalance for dense boxes) and the solution just as clearly (modulating the cross-entropy loss towards the hard examples). Brilliant solution too!

    • @shubhamprasad6910
      @shubhamprasad6910 3 years ago

      Just to clarify: hard examples here mean the FPs and FNs encountered during training.

  • @bobsalita3417
    @bobsalita3417 6 years ago

    Paper: arxiv.org/abs/1708.02002

  • @XX-vu5jo
    @XX-vu5jo 3 years ago +3

    I wish I could work with these people someday.

  • @user-fk1wo2ys3b
    @user-fk1wo2ys3b 3 years ago

    Great report!

  • @larryliu2298
    @larryliu2298 6 years ago +1

    great work!

  • @andrijanamarjanovic2212

    Bravo!

  • @manan4436
    @manan4436 5 years ago +1

    Wow, simple analysis leads to the best performance...
    Think different.

  • @talha_anwar
    @talha_anwar 4 years ago

    Is he talking about binary cross-entropy?

  • @punithavalli824
    @punithavalli824 3 years ago +2

    It is really great work. I am very curious about the αt term and the α balance factor; could you please help me get some clarity about α and αt? It would be a great help for my studies.

    • @punithavalli824
      @punithavalli824 3 years ago +1

      Hope for your reply

    • @luansouzasilva31
      @luansouzasilva31 a month ago

      It is an idea borrowed from balanced cross-entropy. Datasets with two or more classes usually have some class imbalance. This is a problem because networks tend to focus on the majority classes, learning the minority ones poorly. So the idea of alpha is to weight the loss so that majority classes have less impact than minority classes. Alpha can be thought of as the "inverse frequency" of the class distribution in the dataset.
      Example: if you have 100 dogs (class 0) and 900 cats (class 1), the distribution is 10% dogs and 90% cats. So the inverse frequency would be 1 - 0.1 = 0.9 for dogs and 1 - 0.9 = 0.1 for cats, i.e. alpha_dogs = 0.9 and alpha_cats = 0.1.
      In binary classification, alpha is interpreted as the weight for the positive class, so the weight for the negative class is 1 - alpha. For the problem above, alpha = alpha_cats, since cats are the positive class. For multiclass classification, alpha is instead a vector with one entry per class.
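The inverse-frequency recipe above can be written out directly (the counts are the illustrative dog/cat numbers from the comment, not from any real dataset):

```python
# Class counts from the example: 100 dogs (class 0), 900 cats (class 1).
counts = {"dog": 100, "cat": 900}
total = sum(counts.values())

# alpha_c = 1 - frequency(c): the rarer the class, the larger its weight.
alpha = {cls: 1.0 - n / total for cls, n in counts.items()}

# alpha["dog"] = 0.9 and alpha["cat"] = 0.1; in the binary view,
# alpha = alpha["cat"] = 0.1 and the negative class gets 1 - alpha = 0.9.
```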

  • @lexiegao4085
    @lexiegao4085 2 years ago +4

    He is cuteeee!!!!!

  • @donghunpark379
    @donghunpark379 5 years ago

    5:35 For what reason did he point to 2.3 (left) and 0.1 (right)? Are those points meaningful, or just an example? If they are just an example, how can he claim a 40x difference in loss between hard and easy examples as a general case? It seems strange.

    • @haileyliu9602
      @haileyliu9602 4 years ago +2

      He is trying to say that a single hard example contributes only about 20 times more loss than a single easy one. In the dense-detector setting, where the ratio of hard to easy examples is about 1 : 1000, the total loss from hard versus easy examples becomes about 2.3 : 100. This means the loss is overwhelmed by the easy examples.
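The numbers in this thread can be checked with plain cross entropy (a quick sanity check, not code from the talk): -log(0.1) ≈ 2.3 for a hard example, -log(0.9) ≈ 0.105 for an easy one, so per example the gap is only ~22x, but at a 1 : 1000 count ratio the easy examples dominate the total:

```python
import math

def cross_entropy(p_t):
    """Cross entropy -log(p_t), with p_t the probability of the true class."""
    return -math.log(p_t)

hard = cross_entropy(0.1)          # one hard example, ~2.30
easy = cross_entropy(0.9)          # one easy example, ~0.105

per_example_ratio = hard / easy    # per example, hard is only ~22x bigger

# At a 1 : 1000 hard-to-easy count ratio, the totals flip:
total_hard = 1 * hard              # ~2.3
total_easy = 1000 * easy           # ~105, i.e. easy examples dominate
```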

    • @amaniarman460
      @amaniarman460 3 years ago

      Thanks for this

  • @lastk7235
    @lastk7235 a month ago

    TY is cool

  • @caiyu538
    @caiyu538 a year ago

    What is the definition of easy versus hard training examples?

    • @luansouzasilva31
      @luansouzasilva31 a month ago +2

      Easy examples are those the model quickly learns to predict correctly. In the context of object detection, think of big objects with a distinctive shape (low chance of confusion with other objects), etc. Hard examples are those with high similarity to other classes, or objects that are too small in the images. Detecting an airplane and distinguishing it from a bottle (easy) is more straightforward than detecting a dog and distinguishing it from a wolf (hard).
      In this context, the model is expected to learn the easy examples quickly, meaning their predicted probabilities get close to 1 for the true class. The factor (1 - pt) modulates this: as pt gets close to 1, (1 - pt) gets close to zero, reducing the loss value. Semantically this can be read as "reducing the impact of easy examples". The factor gamma controls how strong this modulation is.
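The modulating factor (1 - p_t)^gamma described above can be tabulated for the paper's default gamma = 2 (a small illustrative sketch):

```python
gamma = 2.0  # focusing parameter; gamma = 0 leaves cross entropy unchanged

# Fraction of the cross-entropy loss that survives the modulating factor.
factors = {p_t: (1.0 - p_t) ** gamma for p_t in (0.1, 0.5, 0.9, 0.99)}

# A hard example (p_t = 0.1) keeps 81% of its loss, while a very easy
# one (p_t = 0.99) keeps only 0.01% -- easy examples fade from training.
```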

    • @caiyu538
      @caiyu538 a month ago

      @@luansouzasilva31 thanks

  • @MrAe0nblue
    @MrAe0nblue 4 years ago +3

    I feel like he had a small prank hidden in his talk. As a deep learning expert at Google Brain, the one word he should know better than any other is "classify", yet he stumbles on it multiple times, and oddly enough, only that word. Clearly those who work at Google Brain are some of the brightest, most talented people (I'm not trying to pick on him). That's why it must be a prank, right!? Or maybe he was just a bit nervous.

  • @k.z.982
    @k.z.982 3 years ago

    this guy is reading...

    • @slime67
      @slime67 2 years ago +2

      You won't believe it, but every TV presenter does exactly this; nobody wants to slip up while presenting their thoughts to a large audience.
      Modern cameras have a tricky mirror system (a teleprompter) which lets you read the text while looking straight at the camera; apparently that's not the case here :)

  • @k.z.982
    @k.z.982 3 years ago

    apparently he does not know what he's talking about

  • @hmmhuh3263
    @hmmhuh3263 2 years ago +1

    Really bad presenter, but a great idea.