L11.6 Xavier Glorot and Kaiming He Initialization

  • Published 10. 03. 2021
  • IMPORTANT NOTE: In the video, I talk about the number of input units in the denominator ("fan in"), but to be correct, it should have been the number of units in both the current layer and the next layer ("fan in" + "fan out").
    Slides: sebastianraschka.com/pdf/lect...
    Papers:
    Xavier Glorot and Yoshua Bengio. "Understanding the difficulty of training deep feedforward neural networks." Proceedings of the thirteenth international conference on artificial intelligence and statistics. 2010. proceedings.mlr.press/v9/gloro...
    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." In Proceedings of the IEEE international conference on computer vision, pp. 1026-1034. 2015. arxiv.org/abs/1502.01852
    -------
    This video is part of my Introduction to Deep Learning course.
    Next video: • L11.7 Weight Initializ...
    The complete playlist: • Intro to Deep Learning...
    A handy overview page with links to the materials: sebastianraschka.com/blog/202...
    -------
    If you want to be notified about future videos, please consider subscribing to my channel: / sebastianraschka
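    The two schemes discussed in the video can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the video or the slides; the function names `glorot_uniform` and `he_normal` are my own:

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng=None):
    """Xavier/Glorot uniform initialization (Glorot & Bengio 2010, Eq. 16):
    W ~ U[-limit, limit] with limit = sqrt(6 / (fan_in + fan_out))."""
    rng = np.random.default_rng(0) if rng is None else rng
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def he_normal(fan_in, fan_out, rng=None):
    """He initialization (He et al. 2015), intended for ReLU layers:
    W ~ N(0, 2 / fan_in)."""
    rng = np.random.default_rng(0) if rng is None else rng
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
```

    Both schemes scale the weight variance by the layer width so that activation (and gradient) variances stay roughly constant from layer to layer; He initialization adds a factor of 2 to compensate for ReLU.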
  • Science & Technology

Comments • 7

  • @mahdimoshtaghi9903
    @mahdimoshtaghi9903 A year ago +3

    The terms fan-in and fan-out come from digital electronics. Fan-in is the maximum number of logic gates that can be connected to the input of a particular gate; fan-out is the analogous number for the output.

  • @hamzamohiuddin973
    @hamzamohiuddin973 A year ago

    At 6:12, on the second line of equations, in the part marked by the blue circle: can someone please clarify how the variance of the product of two independent variables can be expanded into the product of their variances? I can't seem to find any such property... can someone point me to some helpful material? Thank you.
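    For reference, the identity being used here: it holds for independent variables, and the simple product form additionally requires zero means, which the derivation assumes for the weights and activations.

```latex
% Variance of a product of independent X, Y:
\mathrm{Var}(XY) = \mathrm{Var}(X)\,\mathrm{Var}(Y)
                 + \mathrm{Var}(X)\,(\mathbb{E}[Y])^2
                 + \mathrm{Var}(Y)\,(\mathbb{E}[X])^2
% With E[X] = E[Y] = 0 this reduces to:
\mathrm{Var}(XY) = \mathrm{Var}(X)\,\mathrm{Var}(Y)
```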

  • @hamzamohiuddin973
    @hamzamohiuddin973 A year ago +1

    At 5:46, should the summation iterator variable be 'k' instead of 'j'?

  • @sightreader2507
    @sightreader2507 3 years ago +5

    I think you are not describing Xavier initialization. Xavier initialization is equation (16) in the paper. Equation (1) is what you are showing, with only fan_in, and that is what they argue was a common but bad heuristic.

    • @SebastianRaschka
      @SebastianRaschka  3 years ago +1

      Thanks for the note, you are right. I wasn't careful here. I'll make a note to fix that.

    • @gramlin17
      @gramlin17 3 years ago

      Thanks for the mention. After reading the paper, I was wondering why no one talks about equation (16). I must say I have seen so many different interpretations that I am totally confused. Also, where does He take into account the nonlinearity of ReLU? We see sqrt in both formulas... the multiplication by 2 is due to the fact that ReLU cuts off the half below 0, right?
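      For reference, a short derivation of the factor of 2 in He initialization, assuming the pre-activation z is zero-mean and symmetric about 0 (as in He et al. 2015):

```latex
\mathbb{E}\!\left[\mathrm{ReLU}(z)^2\right]
  = \int_0^{\infty} z^2\, p(z)\, dz
  = \tfrac{1}{2}\int_{-\infty}^{\infty} z^2\, p(z)\, dz
  = \tfrac{1}{2}\,\mathrm{Var}(z)
% ReLU discards the negative half, halving the second moment,
% so the weight variance is doubled to compensate:
\mathrm{Var}(w) = \frac{2}{\text{fan\_in}}
```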

    • @ewankenobi22
      @ewankenobi22 3 months ago

      I noticed that too. A shame, as I would like to understand better where the root 6 comes from in the actual Xavier initialisation equation.
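      For reference, the root 6 in Eq. (16) comes from combining the target variance 2/(fan_in + fan_out) with the variance of a uniform distribution:

```latex
% A uniform variable on [-a, a] has variance a^2 / 3.
% Requiring Var(W) = 2 / (n_{in} + n_{out}) gives:
\frac{a^2}{3} = \frac{2}{n_{in} + n_{out}}
\quad\Rightarrow\quad
a = \sqrt{\frac{6}{n_{in} + n_{out}}}
% Hence Eq. (16):
W \sim U\!\left[-\frac{\sqrt{6}}{\sqrt{n_{in}+n_{out}}},\;
                 \frac{\sqrt{6}}{\sqrt{n_{in}+n_{out}}}\right]
```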