L11.6 Xavier Glorot and Kaiming He Initialization
- Added Mar 10, 2021
- IMPORTANT NOTE: In the video, I talk about the number of input units in the denominator ("fan in"), but to be correct, it should have been the sum of the number of input units and output units of the layer ("fan in" + "fan out").
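For reference, a minimal NumPy sketch of the corrected formula (my own illustration, not taken from the slides; the function name is mine):

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=None):
    """Glorot/Xavier uniform init: weight variance 2 / (fan_in + fan_out)."""
    rng = np.random.default_rng() if rng is None else rng
    # U(-limit, limit) has variance limit**2 / 3 = 2 / (fan_in + fan_out)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = xavier_uniform(256, 128, rng=np.random.default_rng(0))
print(W.shape)            # (256, 128)
print(round(W.var(), 4))  # close to 2 / (256 + 128) ≈ 0.0052
```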
Slides: sebastianraschka.com/pdf/lect...
Papers:
Xavier Glorot and Yoshua Bengio. "Understanding the difficulty of training deep feedforward neural networks." Proceedings of the thirteenth international conference on artificial intelligence and statistics. 2010. proceedings.mlr.press/v9/gloro...
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." In Proceedings of the IEEE international conference on computer vision, pp. 1026-1034. 2015. arxiv.org/abs/1502.01852
-------
This video is part of my Introduction to Deep Learning course.
Next video: • L11.7 Weight Initializ...
The complete playlist: • Intro to Deep Learning...
A handy overview page with links to the materials: sebastianraschka.com/blog/202...
-------
If you want to be notified about future videos, please consider subscribing to my channel: / sebastianraschka - Science & Technology
The terms fan-in and fan-out come from digital electronics. Fan-in is the maximum number of logic gates that can be connected to the input of a particular gate; fan-out is the analogous limit for its output.
At 6:12, on the second line of equations (the part marked with the blue circle), can someone please clarify how the variance of the product of two independent variables can be expanded into the product of their variances? I can't seem to find any such property. Can someone point me to some helpful material? Thank you.
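For what it's worth, a sketch of the missing step: it relies on the zero-mean assumption made in the derivation and is not true for general random variables. For independent X and Y with E[X] = E[Y] = 0:

```latex
\mathrm{Var}(XY) = E\big[(XY)^2\big] - \big(E[XY]\big)^2
                 = E[X^2]\,E[Y^2] - \big(E[X]\,E[Y]\big)^2
                 = E[X^2]\,E[Y^2]
                 = \mathrm{Var}(X)\,\mathrm{Var}(Y).
```

Without the zero-mean assumption, the general identity for independent variables picks up extra terms: Var(XY) = Var(X)Var(Y) + Var(X)E[Y]^2 + Var(Y)E[X]^2.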
At 5:46, should the summation iterator variable be 'k' instead of 'j'?
I think you are not describing Xavier initialization. Xavier initialization is equation (16) in the paper. Equation (1) is what you are showing, with only fan_in, and that is what they argue was a common but bad heuristic.
Thanks for the note, you are right. Wasn't careful here. Will make a note to fix that.
Thanks for the mention. After reading the paper, I was wondering why no one talks about equation (16). I must say I see so many different interpretations that I am totally confused. Also, where does He take the nonlinearity of ReLU into account? We see a square root in both formulas; the multiplication by 2 is due to the fact that ReLU cuts off the half below 0, right?
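Regarding the factor of 2: yes, roughly speaking, ReLU zeroes the negative half of the pre-activation, which halves the second moment passed forward, and He et al. compensate by doubling the weight variance. A sketch, assuming the pre-activation z is symmetric around 0:

```latex
E\big[\mathrm{ReLU}(z)^2\big]
  = \int_{0}^{\infty} z^2\, p(z)\, dz
  = \tfrac{1}{2} \int_{-\infty}^{\infty} z^2\, p(z)\, dz
  = \tfrac{1}{2}\, E\big[z^2\big].
```

Setting Var(w) = 2/fan_in then restores the variance in the forward pass.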
I noticed that too. It's a shame, as I would like to understand better where the root 6 comes from in the actual Xavier initialization equation.
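If it helps: the sqrt(6) is just the conversion from the target variance to the bounds of a uniform distribution, assuming the uniform variant of Glorot initialization. A uniform variable on (-a, a) has variance a^2/3, so:

```latex
W \sim \mathcal{U}(-a, a) \;\Rightarrow\; \mathrm{Var}(W) = \frac{a^2}{3},
\qquad
\frac{a^2}{3} = \frac{2}{n_{\text{in}} + n_{\text{out}}}
\;\Rightarrow\;
a = \sqrt{\frac{6}{n_{\text{in}} + n_{\text{out}}}}.
```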