How do we use Lagrange Multipliers in Data Science?
Hey man these videos are great, you deserve more attention
ritvik doesn't just throw a bunch of equations on the board: he puts everything into perspective which is unique in mathy videos.
Great explanation man. Your explanation is way more intuitive than most of the videos out there.
thanks!!
This was fantastic. You have a gift for teaching.
thanks!
Just subscribed I like how you explain things and videos are very “snackable.”
thanks!
Now that you've shown us the theory, I'd suggest building the practical Python code for each video as well. That would help theory meet code.
This is a great explanation. Thanks a ton.
It would be great if you could make a video on maths behind Logistic Regression as well.
Oh, I have a video on that :) Please check my channel
One word.....AMAZING......great job
Quick question: at 1:29 the matrix S you give is not symmetric, but when you do the Lagrange multipliers at 7:14 you use the result that the matrix derivative of u^TSu is 2Su which (in your other video "Derivative of a Matrix") you derived for the 2x2 case assuming the matrix was symmetric. Does this result hold for the non-symmetric case? If so, how would I go about showing that? If not, what can be done for the non-symmetric case?
When the matrix S is not symmetric, the gradient of the function f(u) = u^T S u is f'(u) = (S + S^T)u. From the Lagrangian stationarity condition, this implies that 2\lambda should be an eigenvalue of the matrix S+S^T. Therefore, the solution u* of the optimization is the eigenvector of S+S^T corresponding to the largest eigenvalue of S+S^T.
Yeah I was thinking that too
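The formula above is easy to verify numerically; a minimal NumPy sketch (all names here are my own), comparing the analytic gradient (S + S^T)u against a finite-difference gradient for a deliberately non-symmetric S:

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal((3, 3))  # deliberately non-symmetric
u = rng.standard_normal(3)

f = lambda u: u @ S @ u

# Analytic gradient for general (not necessarily symmetric) S
grad_analytic = (S + S.T) @ u

# Central-difference numerical gradient, one coordinate at a time
eps = 1e-6
grad_numeric = np.array([
    (f(u + eps * e) - f(u - eps * e)) / (2 * eps)
    for e in np.eye(3)
])

print(np.allclose(grad_analytic, grad_numeric, atol=1e-5))  # True
```

For symmetric S this reduces to the 2Su used in the video, since then S + S^T = 2S.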
Thank you very much sir. Really appreciate you making these super helpful videos.
thanks!
Thanks a lot for the video. I hadn't realized the relationship between lagrangians and eigenvectors.
Great videos.. very well explained.. thank you
Glad you like them!
holy shit that was magical
very clear explanation. good job my friend
Glad it was helpful!
How do you choose the "order" of the constraint in your new function?
I mean how do you choose to write:
u^T S u + \lambda(1 - u^T u) and not
u^T S u + \lambda(u^T u - 1)?
It doesn't matter: both lead to the same stationary points; only the sign of \lambda flips.
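A small NumPy check of this claim (the example matrix is mine): with either sign convention the same u is stationary, and only the sign of \lambda changes.

```python
import numpy as np

S = np.array([[2.0, 1.0], [1.0, 3.0]])  # small symmetric example
vals, vecs = np.linalg.eigh(S)
u = vecs[:, -1]          # top eigenvector, unit norm
lam = vals[-1]           # top eigenvalue

# L1 = u^T S u + lam*(1 - u^T u)  =>  grad_u = 2 S u - 2 lam u
g1 = 2 * S @ u - 2 * lam * u
# L2 = u^T S u + lam*(u^T u - 1)  =>  grad_u = 2 S u + 2 lam u,
# which vanishes at the same u once lam -> -lam
g2 = 2 * S @ u + 2 * (-lam) * u

print(np.allclose(g1, 0), np.allclose(g2, 0))  # True True
```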
Awesome video mate! It makes so much sense
I'd like a deeper view of this interpretation.
The interpretation I understand intuitively is the one on Wikipedia, where the gradients of the objective function and the constraint function have to be collinear.
I like the eigenvalue-based interpretation, but something is unclear to me: in the present case, you chose the constraint function in such a way that the simplification lets you write things as an eigenvalue problem.
It's harder for me to interpret it more generally, with arbitrary objective/constraint functions.
Moreover, what about the fact that we also need d(Lagrangian)/d(\lambda) to vanish? I'm confused about how that could be related to an eigenvalue.
From my point of view, the eigenvalue/eigenvector interpretation he uses here is just a SPECIAL CASE for the problem u^T S u s.t. u^T u = 1 he is discussing.
While the problem he discusses is certainly interesting, the interpretation is misleading for the general case!
There is no general connection between eigenvalues and Lagrange multipliers (as far as I know).
For multiple possible interpretations of Lagrange multipliers, see Convex Optimization by Stephen Boyd & Lieven Vandenberghe.
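The collinear-gradients picture mentioned above can still be checked numerically in this special case (example matrix is mine): at the constrained maximizer, the gradient of u^T S u is a scalar multiple of the gradient of u^T u, and that scalar is exactly the top eigenvalue.

```python
import numpy as np

S = np.array([[4.0, 1.0], [1.0, 2.0]])   # symmetric example
vals, vecs = np.linalg.eigh(S)
u_star = vecs[:, -1]                     # maximizer of u^T S u on u^T u = 1

grad_f = 2 * S @ u_star      # gradient of the objective u^T S u
grad_g = 2 * u_star          # gradient of the constraint u^T u

# Collinearity: grad_f is a scalar multiple of grad_g,
# and that scalar (the multiplier) equals the top eigenvalue.
ratio = grad_f / grad_g
print(np.allclose(ratio, vals[-1]))  # True
```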
your intuition level is just A+. I think you should do ds and ml full time
Thanks for this nice, crisp video. If it's a minimisation problem, does that mean looking for the minimum eigenvalue and its eigenvector? I'm looking forward to the video detailing the basics of Lagrange multipliers. Thanks once again.
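A quick numerical sanity check of that idea, assuming a symmetric S (all names here are mine): on the unit sphere, u^T S u is squeezed between the smallest and largest eigenvalues, so minimisation does point at the smallest eigenvalue and its eigenvector.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
S = A + A.T                    # random symmetric matrix
vals, vecs = np.linalg.eigh(S) # eigenvalues in ascending order

# Sample many random unit vectors and evaluate u^T S u on each
U = rng.standard_normal((4, 100_000))
U /= np.linalg.norm(U, axis=0)
q = np.einsum('id,id->d', U, S @ U)   # per-column quadratic forms

# The quadratic form is bounded by the extreme eigenvalues
print(vals[0] - 1e-9 <= q.min() and q.max() <= vals[-1] + 1e-9)  # True
```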
6:45 BIG mistake right here. This derivative ONLY holds when S is symmetric, which is NOT true in this case. The general result is (S+S^T)u, which is easy to show works. Other than that, good video ;)
Well explained!
Thank you. To me, there is still something fishy going on: you say you'd take the biggest eigenvalue of the matrix S and find its corresponding eigenvector, in order to "maximize the success". But for a non-symmetric S those eigenvalues and eigenvectors can be complex-valued (a complex number paired with its conjugate), and we can't just compare complex numbers. So would the best direction be spiraling outwards?
u^T S u is a formula I've encountered several times; can anyone help me understand what it actually is? It seems we're projecting the linear transformation of u onto itself, but what does that tell us?
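One common reading, sketched numerically (the data and names are mine): when S is a covariance matrix, u^T S u is the variance of the data after projecting it onto the direction u. That's why maximizing it over unit u finds the direction of greatest spread, as in PCA.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 3)) @ np.diag([3.0, 1.0, 0.3])  # data, shape (n, d)
S = np.cov(X, rowvar=False)     # sample covariance matrix

u = np.array([1.0, 0.0, 0.0])   # a unit direction
proj = X @ u                    # data projected onto u

# u^T S u equals the sample variance of the projected data
print(np.isclose(u @ S @ u, proj.var(ddof=1)))  # True
```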
Thank you man
Amazing
6:00 I didn't understand why \lambda(1 - u^T u) isn't 0, since we defined u^T u = 1, didn't we?
Great :)
thanks!
What if the function we're maximizing can't be represented as a simple u^T S u? Only particular functions can. Or are we only interested in u^T S u in data science anyway?
live long and prosper