NLP Demystified 11: Essential Training Techniques for Neural Networks
- Published 2 Aug 2024
- Course playlist: • Natural Language Proce...
In our previous deep dive into neural networks, we looked at the core mechanisms behind how they learn. In this video, we'll explore the additional details involved in training them effectively.
We'll look at how to converge faster to a minimum, when to use certain activation functions, when and how to scale our features, and what deep learning is ultimately about.
We'll also apply our knowledge by building a simple deep learning model for text classification, and this will mark our return to NLP for the rest of the course.
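The video's optimizer discussion (mini-batch SGD, momentum) can be sketched in a few lines of NumPy. This is not the notebook's code, just a minimal illustration on a hypothetical noiseless regression problem, with made-up hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (hypothetical, just for illustration).
X = rng.normal(size=(256, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)          # weights being learned
v = np.zeros(3)          # momentum "velocity"
lr, beta = 0.1, 0.9      # learning rate and momentum coefficient
batch_size = 32

for epoch in range(50):
    perm = rng.permutation(len(X))            # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)  # MSE gradient on the mini-batch
        v = beta * v - lr * grad                    # accumulate momentum
        w = w + v                                   # take the step

print(np.round(w, 3))
```

The momentum term keeps a running average of past gradients, which damps oscillations across steep directions of the loss surface and speeds progress along shallow ones, as covered in the video.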
Colab notebook: colab.research.google.com/git...
Timestamps
00:00:00 Neural Networks II
00:01:09 Mini-batch stochastic gradient descent
00:03:55 Finding an effective learning rate
00:06:15 Using a learning schedule
00:07:35 Complex loss surfaces and local minima
00:09:12 Adding momentum to gradient descent
00:12:50 Adaptive optimizers (RMSProp and Adam)
00:15:08 Local minima are rarely a problem
00:15:21 Activation functions (sigmoid, tanh, and ReLU)
00:19:35 Weight initialization techniques (Xavier/Glorot and He)
00:21:25 Feature scaling (normalization and standardization)
00:23:28 Batch normalization for training stability
00:28:26 Regularization (early stopping, L1, L2, and dropout)
00:33:11 DEMO: building a basic deep learning model for NLP
00:56:19 Deep learning is about learning representations
00:58:18 Sensible defaults when building deep learning models
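Two of the topics above, feature standardization and He weight initialization, are simple enough to sketch directly in NumPy. The data and layer sizes below are hypothetical, chosen only to make the idea concrete:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical raw features with wildly different scales per column.
X = rng.normal(loc=[100.0, 0.01], scale=[25.0, 0.005], size=(1000, 2))

# Standardization: zero mean, unit variance per feature
# (in practice, fit mu/sigma on the training split only).
mu, sigma = X.mean(axis=0), X.std(axis=0)
X_std = (X - mu) / sigma

# He initialization for a layer with fan_in inputs (pairs well with ReLU):
# weights drawn from a normal with std sqrt(2 / fan_in).
fan_in, fan_out = 2, 64
W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

print(X_std.mean(axis=0).round(6), X_std.std(axis=0).round(6))
```

Scaling keeps all features contributing comparably to the gradient, and He initialization keeps activation variances stable through ReLU layers, both of which the video lists among its sensible defaults.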
This video is part of Natural Language Processing Demystified, a free, accessible course on NLP.
Visit www.nlpdemystified.org/ to learn more.
Your course is gold!
Thanks so much for your work on this series. Just a note: you need at least 10GB of free RAM for the notebook to complete without crashing (at least on my machine). It's a good idea to close any unnecessary programs before running it.
Thank you for sharing this tip. Yep, I see a spike to about 4.5GB of RAM on the free Colab tier the first time the bag-of-words is converted to a sparse tensor before being garbage-collected. This cell here: bit.ly/3cMh7of.
Thank you !!!
Many thanks for this excellent video.
Great lectures.
It was ONLY your video that made me realize it will take me years to study AI programming. And by the time those years have passed, I will again have to learn new variants and models and familiarize myself with new Python libraries and modules. Will I ever catch up? Thanks for "revealing the secrets". How about an interactive tutorial where a student inputs different variables and watches how Python responds? 56:01
Why 10 years? Seems to be unnecessary
You kind of sound like Casually Explained