- 54
- 6 553 153
Serrano.Academy
United States
Joined 23 Sep 2013
Welcome to Serrano.Academy! I'm Luis Serrano and I love demystifying concepts, capturing their essence, and sharing these videos with you. I prefer illustrations, analogies, and cartoons, rather than formulas (although we don't shy away from the math when needed).
The topics I have are machine learning, mathematics (probability and statistics), but I'm open to many others. If you have any topics you'd like to suggest, feel free to add them in the comments or drop me a line!
For more information, check out serrano.academy.
And also check out my book! Grokking Machine Learning
manning.com/books/grokking-machine-learning
(40% discount code: serranoyt)
Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning
Direct Preference Optimization (DPO) is a method used for training Large Language Models (LLMs). DPO is a direct way to train the LLM without the need for reinforcement learning, which makes it more effective and more efficient.
Learn about it in this simple video!
This is the third one in a series of 4 videos dedicated to the reinforcement learning methods used for training LLMs.
Full Playlist: czcams.com/play/PLs8w1Cdi-zvYviYYw_V3qe6SINReGF5M-.html
Video 0 (Optional): Introduction to deep reinforcement learning czcams.com/video/SgC6AZss478/video.html
Video 1: Proximal Policy Optimization czcams.com/video/TjHH_--7l8g/video.html
Video 2: Reinforcement Learning with Human Feedback czcams.com/video/Z_JUqJBpVOk/video.html
Video 3 (This one!): Direct Preference Optimization
00:00 Introduction
01:08 RLHF vs DPO
07:19 The Bradley-Terry Model
11:25 KL Divergence
16:32 The Loss Function
14:36 Conclusion
Get the Grokking Machine Learning book!
manning.com/books/grokking-machine-learning
Discount code (40%): serranoyt
(Use the discount code on checkout)
Views: 2,247
Video
KL Divergence - How to tell how different two distributions are
3.3K views · 1 day ago
Correction (10:26): the probabilities are wrong. The correct ones are:
For Die 1: 0.4^4 * 0.2^2 * 0.1^1 * 0.1^1 * 0.2^2
For Die 2: 0.4^4 * 0.1^2 * 0.2^1 * 0.2^1 * 0.1^2
For Die 3: 0.1^4 * 0.2^2 * 0.4^1 * 0.2^1 * 0.1^2
Kullback-Leibler (KL) divergence is a way to measure how far apart two distributions are. In this video, we learn KL-divergence in a simple way, using a probability game with...
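The corrected products in the note above can be checked numerically. This is a minimal sketch that assumes the five factors in each product correspond to five outcomes observed 4, 2, 1, 1, and 2 times; the names `counts`, `dice`, and `sequence_likelihood` are illustrative and not from the video.

```python
import math

# Assumed reading of the correction: the exponents are how many times
# each outcome was observed (10 rolls in total).
counts = [4, 2, 1, 1, 2]

# Per-outcome probabilities of each die, read off the bases in the correction.
dice = {
    "Die 1": [0.4, 0.2, 0.1, 0.1, 0.2],
    "Die 2": [0.4, 0.1, 0.2, 0.2, 0.1],
    "Die 3": [0.1, 0.2, 0.4, 0.2, 0.1],
}


def sequence_likelihood(probs, counts):
    """Probability of the observed rolls: product of p_i ** count_i."""
    return math.prod(p ** c for p, c in zip(probs, counts))


likelihoods = {name: sequence_likelihood(p, counts) for name, p in dice.items()}

for name, value in likelihoods.items():
    print(f"{name}: {value:.3e}")
```

Under this reading, Die 1 gives the largest likelihood, which is expected because its probabilities match the observed frequencies exactly.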
Why do we divide by n-1 to estimate the variance? A visual tour through Bessel correction
11K views · 1 month ago
Correction: At 30:42 I write "X = Y". They're not equal; what I meant to say is "X and Y are identically distributed". The variance is a measure of how spread out a distribution is. In order to estimate the variance, one takes a sample of n points from the distribution and calculates the average squared deviation from the mean. However, this doesn't give a good estimate of the variance of the di...
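The effect described above can be seen in a small simulation. This is a sketch under stated assumptions: the distribution is a standard normal (true variance 1) and the sample size is 5; both choices, and all function names, are illustrative rather than taken from the video.

```python
import random


def biased_variance(sample):
    """Average squared deviation from the sample mean (divides by n)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n


def bessel_variance(sample):
    """Same sum of squared deviations, divided by n - 1 (Bessel's correction)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - 1)


def average_estimates(n=5, trials=20000, seed=0):
    """Average both estimators over many samples from N(0, 1)."""
    rng = random.Random(seed)
    biased_total = 0.0
    bessel_total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        biased_total += biased_variance(sample)
        bessel_total += bessel_variance(sample)
    return biased_total / trials, bessel_total / trials


biased_avg, bessel_avg = average_estimates()
print(f"divide by n:     {biased_avg:.3f}")   # tends toward (n - 1) / n = 0.8
print(f"divide by n - 1: {bessel_avg:.3f}")   # tends toward the true variance 1.0
```

Dividing by n systematically underestimates the variance by a factor of (n - 1)/n, and dividing by n - 1 removes that bias on average.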
Reinforcement Learning with Human Feedback - How to train and fine-tune Transformer Models
8K views · 4 months ago
Reinforcement Learning with Human Feedback (RLHF) is a method used for training Large Language Models (LLMs). At the heart of RLHF lies a very powerful reinforcement learning method called Proximal Policy Optimization. Learn about it in this simple video! This is the first one in a series of 3 videos dedicated to the reinforcement learning methods used for training LLMs. Full Playlist: czcams.c...
Proximal Policy Optimization (PPO) - How to train Large Language Models
18K views · 5 months ago
Reinforcement Learning with Human Feedback (RLHF) is a method used for training Large Language Models (LLMs). At the heart of RLHF lies a very powerful reinforcement learning method called Proximal Policy Optimization. Learn about it in this simple video! This is the first one in a series of 3 videos dedicated to the reinforcement learning methods used for training LLMs. Full Playlist: czcams.c...
Stable Diffusion - How to build amazing images with AI
17K views · 6 months ago
This video is about Stable Diffusion, the AI method to build amazing images from a prompt. If you like this material, check out LLM University from Cohere! llm.university Get the Grokking Machine Learning book! manning.com/books/grokking-ma... Discount code (40%): serranoyt (Use the discount code on checkout) 0:00 Introduction 1:27 How does Stable Diffusion work? 2:55 Embeddings 12:55 Diffusion...
What are Transformer Models and how do they work?
103K views · 7 months ago
This is the last of a series of 3 videos where we demystify Transformer models and explain them with visuals and friendly examples. Video 1: The attention mechanism in high level czcams.com/video/OxCpWwDCDFQ/video.html Video 2: The attention mechanism with math czcams.com/video/UPtG_38Oq8o/video.html Video 3 (This one): Transformer models If you like this material, check out LLM University from...
The math behind Attention: Keys, Queries, and Values matrices
214K views · 10 months ago
This is the second of a series of 3 videos where we demystify Transformer models and explain them with visuals and friendly examples. Video 1: The attention mechanism in high level czcams.com/video/OxCpWwDCDFQ/video.html Video 2: The attention mechanism with math (this one) Video 3: Transformer models czcams.com/video/qaWMOYf4ri8/video.html If you like this material, check out LLM University fr...
The Attention Mechanism in Large Language Models
83K views · 11 months ago
Attention mechanisms are crucial to the huge boom LLMs have recently had. In this video you'll see a friendly pictorial explanation of how attention mechanisms work in Large Language Models. This is the first of a series of three videos on Transformer models. Video 1: The attention mechanism in high level (this one) Video 2: The attention mechanism with math: czcams.com/video/UPtG_38Oq8o/video....
The Binomial and Poisson Distributions
10K views · 1 year ago
If on average, 3 people enter a store every hour, what is the probability that over the next hour, 5 people will enter the store? The answer lies in the Poisson distribution. In this video you'll learn this distribution, starting from a much simpler one, the Binomial distribution. Euler number video: czcams.com/video/oikl9FCISqU/video.html Grokking Machine Learning book: bit.ly/grokkingML 40% d...
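The store question above can be worked out directly. This is a minimal sketch using the standard Poisson and Binomial formulas; the function names, and the choice to split the hour into 10,000 tiny intervals for the Binomial approximation, are illustrative assumptions.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# Probability that exactly 5 people enter when the average is 3 per hour.
p_poisson = poisson_pmf(5, 3)

# Binomial view: split the hour into n tiny intervals, each with
# probability 3 / n of one person entering. As n grows, this
# approaches the Poisson value.
n_intervals = 10_000
p_binomial = binomial_pmf(5, n_intervals, 3 / n_intervals)

print(f"Poisson:  {p_poisson:.4f}")   # about 0.10
print(f"Binomial: {p_binomial:.4f}")
```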
Euler's number, derivatives, and the bank at the end of the universe
3.7K views · 1 year ago
Euler's number, e, is defined as a limit. The function e to the x is (up to multiplying by a constant) the only function that is its own derivative. How are these two related? In this video you'll find an explanation for this phenomenon using banking interest rates, and a very particular bank, located at the end of the universe.
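Both facts mentioned above can be checked numerically. This sketch uses the standard compound-interest reading of the limit (1 unit deposited at 100% yearly interest, compounded n times), which may differ from the exact example in the video; the function names are illustrative.

```python
import math


def compound_growth(n):
    """Value of 1 unit after a year at 100% interest, compounded n times."""
    return (1 + 1 / n) ** n


def numerical_derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# The compounded value approaches e as n grows.
for n in (1, 12, 365, 10**6):
    print(f"n = {n:>7}: {compound_growth(n):.6f}")
print(f"e         : {math.e:.6f}")

# The slope of e**x at x = 1 is (approximately) e**1 itself.
slope_at_one = numerical_derivative(math.exp, 1.0)
print(f"slope of exp at 1: {slope_at_one:.6f}")
```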
Decision trees - A friendly introduction
11K views · 1 year ago
A video about decision trees, and how to train them on a simple example. Accompanying blog post: medium.com/@luis.serrano/splitting-data-by-asking-questions-decision-trees-74afed9cd849 For a code implementation, check out this repo: github.com/luisguiserrano/manning/tree/master/Chapter_9_Decision_Trees Helper videos: - Gini index: czcams.com/video/u4IxOk2ijSs/video.html - Entropy and informatio...
How do you minimize a function when you can't take derivatives? CMA-ES and PSO
8K views · 1 year ago
Denoising and Variational Autoencoders
23K views · 2 years ago
Eigenvectors and Generalized Eigenspaces
26K views · 2 years ago
Thompson sampling, one armed bandits, and the Beta distribution
21K views · 2 years ago
A friendly introduction to deep reinforcement learning, Q-networks and policy gradients
94K views · 3 years ago
The Gini Impurity Index explained in 8 minutes!
38K views · 3 years ago
Singular Value Decomposition (SVD) and Image Compression
90K views · 3 years ago
ROC (Receiver Operating Characteristic) Curve in 10 minutes!
59K views · 3 years ago
Restricted Boltzmann Machines (RBM) - A friendly introduction
63K views · 3 years ago
A Friendly Introduction to Generative Adversarial Networks (GANs)
245K views · 4 years ago
You are much better at math than you think
7K views · 4 years ago
Training Latent Dirichlet Allocation: Gibbs Sampling (Part 2 of 2)
53K views · 4 years ago
Latent Dirichlet Allocation (Part 1 of 2)
128K views · 4 years ago
Book by Luis Serrano - "Grokking Machine Learning" (40% off promo code)
14K views · 4 years ago
Can you please add a video on curse of dimensionality?
Thanks, the explanation is so intuitive. Finally understood the idea of attention.
Appreciate the great explanation. I have a question regarding the clipping formula at 36:42. You used the "min" function. For example, if the rate is 0.4 and epsilon is 0.3, we should get 0.7 in this scenario. However, the formula you introduced here returns 0.4. Shouldn't the formula be clipped_f(x) = max(1 - epsilon, min(f(x), 1 + epsilon))? Am I missing anything?
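The numbers in the question above can be tried out directly. This is a sketch of the clipping function the commenter proposes, next to the standard PPO clipped surrogate term, min(ratio * A, clip(ratio) * A), written from the usual textbook form; it may not match the notation shown at 36:42, and the function names and the advantage values of +1 and -1 are illustrative assumptions.

```python
def clip_ratio(ratio, epsilon):
    """Clamp the probability ratio to the interval [1 - epsilon, 1 + epsilon]."""
    return max(1 - epsilon, min(ratio, 1 + epsilon))


def ppo_clipped_term(ratio, advantage, epsilon):
    """Standard PPO surrogate for one sample: the smaller of the
    unclipped and clipped ratio, each scaled by the advantage."""
    return min(ratio * advantage, clip_ratio(ratio, epsilon) * advantage)


ratio, epsilon = 0.4, 0.3

# Clipping alone moves 0.4 up to the lower bound 1 - epsilon.
print(clip_ratio(ratio, epsilon))

# With a positive advantage, the outer min keeps the unclipped value.
print(ppo_clipped_term(ratio, advantage=1.0, epsilon=epsilon))

# With a negative advantage, the outer min selects the clipped value.
print(ppo_clipped_term(ratio, advantage=-1.0, epsilon=epsilon))
```

Under these assumptions, both readings in the comment can be right at once: the clip by itself gives roughly 0.7, while the outer min in the surrogate returns 0.4 when the advantage is positive.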
Best explanation on the internet!!
Top
Very intuitive, thank you. I like the example-based approach you take. 👏
Someone please ban this Be10x. It has become irritating.
Yess true. I only passed all my maths courses by learning by heart. Never quite satisfied with even good grades because I knew in my heart I understood nothing. Currently refreshing linear algebra in your coursera course and WOW! It’s addicting to actually learn what a rank in a matrix means. 😊☀️
Thank you for this amazing explanation <3
Great video as always. I have a question, in practice which one works best using DPO or RLHF?
Thank you! From what I've heard, DPO works better, as it trains the network directly instead of using RL and two networks.
@@SerranoAcademy Thank you sir for the great work. your Coursera courses have been awesome.
And the entropy is number of bits needed to convey the information.
Very good video, it helped clear some doubts I was having with this along with the Viterbi Algorithm. It's just too bad that the notation used was too different from class, but it did help me understand everything and make a connection between all of it. Thank you!
Hi Mr. Serrano! I am doing your coursera course at the moment on linear algebra for machine learning and I am having so much fun! You are a brilliant teacher, and I just wanted to say thank you! Wish more teachers would bring theoretical mathematics down to a more practical level. Obviously loving the very expensive fruit examples :)
Thank you so much @Cathiina, what an honor to be part of your learning journey, and I’m glad you like the expensive fruit examples! :)
Thank you Luis Serrano for this super explanatory video
Is there an industry standard for the KLD above which two distributions are considered significantly different (like how 0.05 is the standard for the p-value)?
Ohhh that’s a good question. I don’t think so, since normally you use it for minimization or comparison between them, but I’ll keep an eye, maybe it would make sense to have a standard for it.
Did anyone else expect something different than Softmax for the Bradley-Terry model, like I did? 😅
lol, I was expecting something different too initially 🤣
Really love the way you broke down the DPO loss, this direct way is more welcome by my brain :). Just one question on the video: I am wondering how important it is to choose the initial transformer carefully. I suspect that if it is very bad at the task, then we will have to change the initial response a lot, but because the loss function prevents it from changing too much in one iteration, we will need to perform a lot of tiny changes toward the good answer, making the training extremely long. Am I right?
Thank you, great question! This method is used for fine-tuning, not specifically for training. In other words, it's crucial that we start with a fully trained model. For training, you'd use normal backpropagation on the transformer, and lots of data. Once the LLM is trained and very trusted, then you use DPO (or RLHF) to fine-tune it (meaning, post-train it to get from good to great). So we should assume that the model is as trained as it can be, and that's why we trust the LLM and we try to only change it marginally. If we were to use this method to train a model that's not fully trained... I'm not 100% sure it would work. It may or may not, but we'd still have to punish the KL divergence much less. And also, human feedback gives a lot less data than scraping the whole internet, so I would still not use this as a training method, more as refining. Let me know if you have more questions!
@@SerranoAcademy Thanks for the answer, I understand better. I forgot that this design is for fine-tuning.
Very nice lecture on attention.
Now whenever I watch one of Serrano's videos, I first like it and then start watching it, because I know the video is going to be outstanding as always.
Liked this video and subscribed to your channel today.
Amazing video... Thanks, sir, for this pictorial representation and for explaining this complex topic in such an easy way.
Thanks for the simplified explanation. Awesome as always. The book link in the description is not working.
Thank you so much! And thanks for letting me know, I’ll fix it
This was very helpful
The best explanation & depiction of SVD.
Thanks for the excellent explanation! I used to know the KL Divergence, but now I understand it!
Great one, the simpler it looks, the harder it is to build!
This and your whole series of attention NN is a thing of beauty! There are many ways of simplifying this here, but you come the closest to understanding Attention NN and QC are identical and QC is much better. In my opinion QC has never been done correctly, the gates are too confusing and poorly understood. QC is not still in simplified infant stage, it is mature what QC can do and matches all Psychology observations. All problems in Biology and NLP are sequences of strings.
awesome!
Thanks!
Thank you so much for your kind contribution @bifidoc!!! 💜🙏🏼
I would like to say thank you for the wonderful video. I want to learn reinforcement learning for my future study in the field of robotics. I have seen that you only have 4 videos about RL. I am hungry for more of your videos. I found that your videos are easier to understand because you explain well. Please add more RL videos. Thank you 🙏
Thank you for the suggestion! Definitely! Any ideas on what topics in RL to cover?
@@SerranoAcademy more videos in the field of Robotics please. Thank you. You may also guide me how I can approach the study of reinforcement learning.
So well explained
DPO main equation should be PPO main equation.
Excellent explanation!!!
Most intuitive explanation for QKV, as someone with only an elementary understanding of linear algebra.
It's kinda hard to remember all of these formulas and it's demotivating me from further learning.
You do not have to remember those formulas. You only have to understand the logic in them.
I'm a little confused about one thing: the reward function, even in the Bradley-Terry model, is based on the human-given scores for individual context-prediction pairs, right? And πθ is the probability from the current iteration of the network, and πRef is the probability from the original, untuned network? So then after that "mathematical manipulation", how does the human-given set of scores become represented by the network's predictions all of a sudden?
Thank you for the wonderful video. Please add more practical example videos for the application of reinforcement learning.
Thank you! Definitely! Here's a playlist of applications of RL to training large language models. czcams.com/play/PLs8w1Cdi-zvYviYYw_V3qe6SINReGF5M-.html
None of the videos I've seen on this subject actually explain where the hell the QKV values come from! It's amazing that people jump into making videos without understanding the concepts clearly! I guess YouTube must pay a lot of money! This video does a good job of explaining most things, but it never tells us where the actual QKV values come from or how the embeddings turn into them, and it actually gets some things wrong, in my opinion. The Q comes from the embeddings multiplied by Wq, which is a weight (a parameter of the model), but then the question is: where do Wq, Wk, and Wv come from???
how do you choose the number of features in the 2 matrices, i.e. how did you choose to have 2 features only?
Hey, I know this 👦. He is my maths teacher, who doesn't only teach but makes us visualize why we learn the topic and how it will be useful in the real world ❤
It was just so clear. 😃
I love his teaching, he makes complex things seem simple.
Love your videos, please make more such videos on mathematical description of generative models such as GAN, Diffusion, etc.
Thank you! I got some on GANs and Diffusion models, check them out! GANs: czcams.com/video/8L11aMN5KY8/video.html Stable diffusion: czcams.com/video/JmATtG0yA5E/video.html
We hope you will cover the Wasserstein distance 😊
Ah good idea! I'll add it to the list, as well as earth-mover's distance. :)
@SerranoAcademy I also highly recommend covering Explainable AI (XAI), which depends on statistics.
thank u
Thank you Luis. I'm sure I'll use this very soon.
That was intuitive as butter
Great video. One question I have: why would I use KL instead of CE? Are there situations in which one would be more suitable than the other?
That is a great question! KL(P,Q) is really the CE(P,Q), except you subtract the entropy H(P). The reason for this is that if you compare a distribution with itself, you want to get a zero. With CE, you don't get zero, so the CE of a distribution with itself could potentially be very high.
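The relationship in the reply above, KL(P, Q) = CE(P, Q) - H(P), can be verified on a small example. This is a minimal sketch using natural logarithms; the two distributions are illustrative and the function names are not from the video.

```python
import math


def entropy(p):
    """H(P) = -sum p_i * log(p_i)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """CE(P, Q) = -sum p_i * log(q_i)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """KL(P, Q) = sum p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Illustrative distributions over five outcomes.
p = [0.4, 0.2, 0.1, 0.1, 0.2]
q = [0.4, 0.1, 0.2, 0.2, 0.1]

print(f"KL(P, Q)         = {kl_divergence(p, q):.4f}")
print(f"CE(P, Q) - H(P)  = {cross_entropy(p, q) - entropy(p):.4f}")

# Comparing a distribution with itself: KL is zero, cross-entropy is not.
print(f"KL(P, P)         = {kl_divergence(p, p):.4f}")
print(f"CE(P, P) = H(P)  = {cross_entropy(p, p):.4f}")
```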
Why did you take the average at 6:30?
Great question! I took the average because the product is p_i^(n q_i), so the log is n q_i log(p_i), and I want to get rid of that n. It’s not super needed for the math, but I did it so that it gives exactly the KL divergence instead of n times it.
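The step described in the reply above can be written out as follows. This is a sketch that assumes q_i is the observed fraction of outcome i among n rolls and p_i is the probability the die assigns to that outcome, matching the reply's notation.

```latex
\begin{align*}
\frac{1}{n}\log \prod_i p_i^{\,n q_i}
  &= \frac{1}{n}\sum_i n\, q_i \log p_i
   = \sum_i q_i \log p_i, \\
\sum_i q_i \log q_i \;-\; \sum_i q_i \log p_i
  &= \sum_i q_i \log \frac{q_i}{p_i}
   = D_{\mathrm{KL}}(q \,\|\, p).
\end{align*}
```

Without the division by n, the same difference would come out as n times the KL divergence, which is the factor the reply says it wanted to remove.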
@@SerranoAcademy thanks for the clarification