Why the gradient is the direction of steepest ascent

  • Date added: 10 May 2016
  • The way we compute the gradient seems unrelated to its interpretation as the direction of steepest ascent. Here you can see how the two relate.
    About Khan Academy: Khan Academy offers practice exercises, instructional videos, and a personalized learning dashboard that empower learners to study at their own pace in and outside of the classroom. We tackle math, science, computer programming, history, art history, economics, and more. Our math missions guide learners from kindergarten to calculus using state-of-the-art, adaptive technology that identifies strengths and learning gaps. We've also partnered with institutions like NASA, The Museum of Modern Art, The California Academy of Sciences, and MIT to offer specialized content.
    For free. For everyone. Forever. #YouCanLearnAnything
    Subscribe to KhanAcademy: czcams.com/users/subscription_...

Comments • 185

  • @stevemanus6740
    @stevemanus6740 Před 4 lety +212

    I was still struggling with the intuition of this and I think I have come up with another simple way to conceptualize the gradient and the characteristic of steepest ascent.
    Start by remembering that the gradient is composed of the partial derivatives of the function in question. If you think about each partial derivative as just a simple rise-over-run problem, then you can see clearly that each partial derivative is going to give you the amount of change to the output (rise) as the input (run) is increased. Let's consider Grant's 3-dimensional example, so we can say that inputs for the multivariable problem are x and y and the output is z. Because the slopes vary based on location on the x-y grid, we need to pick a starting point on the grid. Let's just say (1,1). It doesn't matter. Now let's look at the x-z 2-dimensional problem first.
    Let's say the partial derivative tells us that at point (1,1) for each 1 unit increase of x, z is increased by 4 units (i.e. the derivative = 4x). Since x can move in only 1 dimension, the only choice of direction we have is whether the change in x is positive or negative. Obviously, if we move x by -1, then z will decrease by 4 units. So, if we need to choose in which direction we move x to increase z, we know that it is in the positive direction. If we decrease x, z will decrease as well.
    Now, do the same thing for y and z and let's say that the partial derivative for y at (1,1) is 3y. This means that a 1 unit increase in y will result in a 3 unit increase in z. Again, if you want to increase z by moving y, increase y, don't decrease it.
    Now, let's put the two variables together. We now have a choice of directions. It's no longer sufficient to say that we need to increase x and we need to increase y. (Though that is half the battle). We need to also decide the relative value of increasing x versus increasing y. When we choose a direction we are making a tradeoff between the relative movements in each of the basis (x, y) directions.
    Let's say that we are allowed to move in any direction by 5 units (if you haven't noticed yet, I like to keep my Pythagorean theorem problems simple!). The question is - "in what direction can I move 5 units to maximize the increase in z?" This would correspond to the direction of steepest ascent. So, let's say we use all of our 5 units in the x direction. This corresponds to the vector [5,0]. Since the increase in z is 4x + 3y, the total increase in z will be 20 (5∙4) . If, on the other hand, we use all of our 5 units in the y direction [0, 5], the total increase in z will be 15 (5∙3).
    But the beauty of using vector geometry is that we can use our 5 units in a direction that will give us effectively a movement in the x direction of 4 and in the y direction of 3. We get 7 units of movement for the price of 5! Of course, that's the hypotenuse of our 4 by 3 right triangle. So, by following the vector [4, 3], z is increased by 4∙4 from the x direction movement and by 3∙3 from the y direction movement for a total increase of 25! I think you can see that any other direction will produce a smaller increase.
    And, of course, the vector [4, 3] is exactly our gradient!
    It's interesting to also think about when one of the components of the gradient is negative. Let's imagine that instead of 3y, our partial y derivative is -3y. This means that a positive movement in y produces a decrease in z. Remembering that we can only move y in 1 dimension, if we move y downward then of course z will increase. So you can see that the gradient vector [4, -3] will produce the same 25 unit increase in z for a 5 unit move IF we move y in the negative (downward) direction. Just follow the vector!
    Of course this is calculus, so these 3, 4, and 5 unit moves are very, very small. 😊
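    (A quick numeric sketch of the 3-4-5 example above, in Python/NumPy. The gradient [4, 3] and the 5-unit budget are taken from the comment; the "increase ≈ gradient · step" rule is the usual local linear approximation, so this only checks the arithmetic, not any particular function.)

      import numpy as np

      grad = np.array([4.0, 3.0])       # partial derivatives at the chosen point
      budget = 5.0                      # we may move 5 units in any direction

      steps = {
          "all 5 units in x": np.array([5.0, 0.0]),
          "all 5 units in y": np.array([0.0, 5.0]),
          "5 units along the gradient": budget * grad / np.linalg.norm(grad),
      }
      for name, step in steps.items():
          # local linear approximation: increase in z ~ gradient . step
          print(name, grad @ step)      # prints 20, 15, and 25 respectively

    Any other 5-unit step gives something strictly smaller than 25, which is exactly the point of the comment.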

    • @rafaemuhammadabdullah6904
      @rafaemuhammadabdullah6904 Před 3 lety

      Why 4*4? 5*4 + 5*3 = 35, so why not 5? (Since 5 is the limit.)

    • @electric_sand
      @electric_sand Před 3 lety +5

      Your explanation is awesome Steve

    • @ahmadizzuddin
      @ahmadizzuddin Před 3 lety +4

      @@rafaemuhammadabdullah6904 I think what Steve meant is five as in the size of a vector v=[a, b]. So [4, 3].[a, b] where ||v||=sqrt(a^2 + b^2) = 5. In this case 4a +3b

    • @ujjwal2912
      @ujjwal2912 Před 3 lety +3

      BAM !!!

    • @rajinish0
      @rajinish0 Před 3 lety +7

      You could make it easy by starting with two-dimensional calculus: a positive slope tells you to move to the right to ascend, whereas a negative slope tells you to move to the left to ascend. Now in three dimensions, the partial with respect to x gives you the steepest ascent in the x direction, and likewise for y. So you can say generally that the steepest ascent = partial(x)i + partial(y)j, where i and j are the unit vectors in the x and y directions.

  • @alvapeng1474
    @alvapeng1474 Před 5 lety +70

    the gradient is not just a vector, it's a vector that loves to be dotted with other things. /best

    • @coopercarr9407
      @coopercarr9407 Před 3 lety +5

      bruh that sentence had me laughing, what a thought lol

    • @instinct736
      @instinct736 Před 3 lety

      @@coopercarr9407 😀

    • @Olivia-by2vm
      @Olivia-by2vm Před 2 lety

      just like EVERY vector!!!!

    • @dalisabe62
      @dalisabe62 Před 2 lety

      @@coopercarr9407 ya that was pretty intriguing expression.

  • @leonhardolaye-felix8811
    @leonhardolaye-felix8811 Před rokem +5

    For anyone confused, This is how I see it, if it helps at all:
    If you’re at a point (a, b) then the directional derivative at that point looking in the direction v (where ||v||=1), is given by ∇f(a, b) • v.
    When we say,”What is the direction of steepest ascent?”, what we are really asking is,”In what direction do I move in to produce the largest directional derivative?” In other words, we want to maximise ∇f(a, b) • v.
    Given that we are at a single point, we can then say that ∇f(a, b) is a constant, since it is evaluated using only the particular values a and b. v is the only variable here, and so the only way to maximise ∇f(a, b) • v is to alter the vector v. We know that the dot product of 2 vectors is maximised when they are pointing in the same direction as each other (see proof of this at bottom). Using this and the fact that we are not varying ∇f(a, b), we can conclude that to maximise the directional derivative (given by ∇f(a, b) • v) we must vary the vector v so that it points in the same direction as ∇f(a, b). And that's it - we've shown that when the vector v is in the same direction as the gradient ∇f(a, b), its output (the directional derivative) is maximised. Saying that v is in the same direction as ∇f(a, b) is to say that v = (1/k) × ∇f(a, b), where k is the magnitude of ∇f(a, b). This is because v is a unit vector, as previously stated. These 2 vectors (v and ∇f(a, b)) point in the same direction, so the conclusion can be drawn that ∇f(a, b) also points in the direction of steepest ascent.
    Proof to maximise dot product:
    Consider two vectors a and b. The angle between a and b is given by cosθ = (a • b) / (|a| × |b|)
    We can rearrange to say that a • b = |a||b|cosθ. To maximise the left hand side, which is what we want to prove, we must maximise cosθ. This achieves its maximum value of 1, which occurs when θ = 0. When θ = 0, we can visualise this by saying the vectors are parallel and overlapping each other. So we can conclude that a • b is maximised when a is parallel to and overlapping b, and vice versa.
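    (A minimal numeric check of the argument above, as a Python/NumPy sketch with a made-up gradient; nothing here comes from the video. The claim being checked: among all unit vectors v, ∇f • v is largest when v points along ∇f, and that largest value is ‖∇f‖.)

      import numpy as np

      grad = np.array([2.0, -1.0])                  # hypothetical ∇f(a, b)
      angles = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
      dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # unit vectors v

      values = dirs @ grad                          # directional derivatives ∇f • v
      best = dirs[np.argmax(values)]
      print(best, grad / np.linalg.norm(grad))      # nearly the same unit vector
      print(values.max(), np.linalg.norm(grad))     # both about 2.236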

  • @priyankkharat7407
    @priyankkharat7407 Před 5 lety +72

    Thank you so much Grant!
    Simplicity is the most difficult thing to achieve.

  • @jacobvandijk6525
    @jacobvandijk6525 Před 5 lety +37

    You can't climb a mountain f(x,y) in the fastest way possible by moving just in the x- or the y-direction (using partial derivatives). Most of the time you have to go in a direction that's a combination of the x- and the y-direction (using directional derivatives)!

  • @ahmadizzuddin
    @ahmadizzuddin Před 3 lety +6

    My takeaway from this is to try to reduce it to one dimension to understand what each element is doing to increase the "steepness" of the gradient.
    Say *f(x)=-x^3*, then *df/dx=-3x^2*.
    Since this derivative is negative (its coefficient is -3), *x* needs to move in the negative direction along the number line to increase the output of *f(x)*.
    Once you know which direction along the number line to go for each element, the amount you move for each element is proportional to the size of that element relative to the whole gradient vector.
    Anyways thanks, great explanation Grant :)
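    (A tiny Python sketch of the one-dimensional idea above, using the f(x) = -x^3 from the comment; the starting point 2.0 and step size 0.01 are arbitrary.)

      f = lambda x: -x**3
      df = lambda x: -3 * x**2                 # derivative of f

      x, h = 2.0, 0.01
      direction = 1.0 if df(x) > 0 else -1.0   # df(2) = -12 < 0, so step in -x
      print(f(x), f(x + h * direction))        # -8.0 -> about -7.88, so f increased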

  • @pratibhas2468
    @pratibhas2468 Před 11 měsíci +1

    Lucky that I found these intuitive explanations.. it truly feels great when you understand what's actually going on when we use a formula

  • @TranquilKr
    @TranquilKr Před 8 lety +8

    Beautiful!
    Didn't think of it that way. Thanks a lot!

  • @allyourcode
    @allyourcode Před 2 lety +12

    Thanks! Here is how I would very concisely explain it: the problem of finding the direction of steepest ascent is exactly the problem of maximizing the directional derivative. The directional derivative is a dot product. When you are trying to maximize a dot product, you choose the direction to make it parallel to the other vector. Since in this case the other vector is given to be gradient(f)(v), THAT is the direction of steepest ascent.
    For me, the basic intuition comes from the dot product. The part that is not so obvious to me is that gradient(f) dot v is the "right" formula for the (definition of) the directional derivative.

  • @arijitdas4504
    @arijitdas4504 Před 3 lety +1

    Learning this concept was no less than a sense of accomplishment itself! Grant is Grand! Cheers!

  • @brandonquintanilla411
    @brandonquintanilla411 Před 6 lety +14

    As soon as I heard the voice of 3B1B, I knew this was going to be a great video.

  • @NzeDede
    @NzeDede Před 2 lety

    It's like my mind just got illuminated!!
    I've always underestimated the power of the Del operator.
    Not only does this operator give you the slope of a scalar field in the direction of a vector, it also points in the direction of the unit vector with the maximum slope, and its magnitude tells you the size of that maximum slope.
    It's crazy how this new revelation changes your understanding of vector calculus.
    Thanks a lot 🙏🏽🙏🏽🙏🏽🙏🏽🙏🏽

  • @charusingh2159
    @charusingh2159 Před 3 lety +3

    I always wonder how Grant developed such a great understanding of maths; he does magic with maths!!!

  • @wontpower
    @wontpower Před 6 lety +1

    This helps so much, thank you!

  • @blyatmanmarkeson708
    @blyatmanmarkeson708 Před 6 lety +1

    This is so easy to follow! I love it.

  • @liabraga4641
    @liabraga4641 Před 6 lety +6

    Beautiful and elucidating

  • @jatinsaini7790
    @jatinsaini7790 Před 3 lety +1

    The best explanation of a gradient on Internet!

  • @AbDmitry
    @AbDmitry Před 4 lety

    Thanks a lot Grant! It was a great pleasure to see you here. I am a big fan of 3B1B.

  • @gustavomello2207
    @gustavomello2207 Před 7 lety +75

    Amazing video. Matches perfectly with your Linear Algebra series.

  • @daniloespinozapino4865

    that last explanation kinda blew my mind a bit, nice!

  • @mireazma
    @mireazma Před 7 lety +26

    I'd like to add my two cents on this, as I couldn't relate some things at the beginning but after some reflection, I figured them out:
    1. The gradient is the direction of the steepest ascent because the gradient encompasses all of the possible d (change) for the function. This is how:
    - The ubiquitous one-dimensional-input derivative - the ordinary, regular derivative - captures the change of the function in its entirety, i.e. the maximum possible change;
    - The gradient "owns" the derivatives at all possible "angles". It suffices to have snapshots of the change from the orthogonal directions only (2 in our case).
    As a note, the dot product is known for measuring how much of one vector lies along another (roughly speaking). So dotting a direction vector with the gradient merely measures how much of the gradient - the entire change - that vector captures. And of course, to get a maximum you want to dot two parallel vectors.
    2. Question: is the gradient with partials of x and y, the only possible vector to have the steepest ascent direction? Well I thought why not make one by taking any 2 orthogonal vectors (on xy plane) and get the directional derivatives of these. The two resulted derivatives can be the components of another gradient. I feel I'm missing something here but I'll get to its bottom.

    • @airraidsiren
      @airraidsiren Před 7 lety +6

      If you do as you propose, and take any 2 orthogonal vectors in the XY plane, it is just a change of basis. If they are unit vectors, you're just rotating your reference frame around the Z axis. Your new vector of two directional derivative coefficients will be the same vector as before, just represented in your new basis. You don't need orthogonal vectors either, they can be linearly dependent, they just can't be colinear as they need to span the XY plane.
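      (A small numeric illustration of this change-of-basis point, as a Python/NumPy sketch with a made-up gradient: the directional derivatives along any rotated orthonormal pair are just the coordinates of the same gradient vector in that rotated basis.)

        import numpy as np

        grad = np.array([4.0, 3.0])                  # hypothetical gradient at a point
        theta = 0.7                                  # arbitrary rotation of the basis
        u1 = np.array([np.cos(theta), np.sin(theta)])
        u2 = np.array([-np.sin(theta), np.cos(theta)])

        d1, d2 = grad @ u1, grad @ u2                # directional derivatives along u1, u2
        print(d1 * u1 + d2 * u2)                     # ~[4. 3.] -- the original gradient, rebuilt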

    • @Raikaska
      @Raikaska Před 7 lety +4

      Wow, I still don't get it, but I think you people's comments should be included in the video. I often think about taking derivatives in two orthogonal directions, but the thing is, the function's output is determined by "x" and "y", that is, already the two directions the gradient looks at...

    • @dereksmetzer2039
      @dereksmetzer2039 Před 6 lety +2

      Adam Smith little late to the party, but i think you mean you'd need 2 linearly independent vectors. any two vectors which span a plane would suffice and would be necessarily linearly independent. choosing an orthogonal basis just makes the computations prettier.

    • @dereksmetzer2039
      @dereksmetzer2039 Před 6 lety +3

      additionally, if you visualize the gradient as vectors on a contour map of the function, perpendicular vectors to the contour lines are oriented in the direction of greatest increase - i.e., small 'nudges' along these directions result in the largest changes in the function. vectors which are orthogonal to the gradient vectors point along contour lines, thus a change in this direction keeps the function at a constant value and therefore the gradient along these lines is zero.

  • @b_rz
    @b_rz Před 2 lety +2

    Thanks man. That was the best explanation ever. Simple and sweet. I was very confused but you saved me :)
    Thanks

  • @desavera
    @desavera Před 2 lety

    Excellent exposition ... thanks a lot !

  • @senri-
    @senri- Před 8 lety +1

    Great video helps a lot thanks :)

  • @muratcan__22
    @muratcan__22 Před 5 lety

    most critical video in understanding the gradient's relation to the steepest ascent.

  • @chrismarklowitz1001
    @chrismarklowitz1001 Před 5 lety +5

    Think about the gradient in one dimension. It is the biggest rate of change since it's the only rate of change in one direction. Think about the gradient in two dimensions. It combines the greatest rate of change if you could only go in y with the greatest rate of change if you could only go in x, to create the greatest rate of change overall.

    • @AA-tm3ew
      @AA-tm3ew Před 5 lety

      great way to think about it

  • @poiuwnwang7109
    @poiuwnwang7109 Před 4 lety

    The derivative in the direction of del being equal to the magnitude of del gives a lot of intuition. Nice!

  • @foerfoer
    @foerfoer Před 5 lety +1

    Honestly, thank you

  • @niroshas1790
    @niroshas1790 Před 5 lety

    I liked it. I would request some lectures on the Riemann-Stieltjes integral and the difference between it and the Riemann integral.

  • @meghan______669
    @meghan______669 Před 5 měsíci

    I’m still processing everything (I’m not going to ace an exam any time soon) but I’m excited that I’ve been able to follow this logic. Thank you!

  • @dalisabe62
    @dalisabe62 Před 2 lety +1

    @steve manus, I like the way you broke this concept down, almost like a Lagrange multiplier problem, where we are asked to find the optimal value of some function f(x,y) subject to the constraint of another function g(x,y) in two dimensions. Of course, as you may know already or expect, the concept of the gradient is incorporated into the solution. It is typically a scenario that involves balancing the independent variables so as to produce the maximum output of the function of those variables. Usually, the optimal value lies in between the extreme choices for the variables. Extreme-x or extreme-y choices, as you noted, don't produce the maximum output of f(x,y). I was hoping that the video maker would stay away from the concept of the directional derivative to explain the geometrical meaning of the gradient. In fact, I liked the mapping-to-a-straight-line explanation he started with at the beginning of the video. I wish he had finished that up.

  • @grinfacelaxu
    @grinfacelaxu Před 25 dny

    ThankYou!

  • @Niharika-uz6xl
    @Niharika-uz6xl Před 5 lety

    SIMPLY AWESOME.

  • @Postermaestro
    @Postermaestro Před 6 lety

    so good!

  • @trivialstuff2384
    @trivialstuff2384 Před 5 lety

    Thank you

  • @kaustubhpandey1395
    @kaustubhpandey1395 Před 9 měsíci

    When Grant first told us about the gradient giving the steepest ascent, I instantly imagined a graph where you have +ve partial derivatives in the x and y directions, but a -ve one in between them (i.e. the vector (1,1), etc.). This would make the gradient vector not be the steepest ascent; rather, the pure x or y direction (whichever has the maximum slope) would be.
    But after this I realised there must be a concept of multivariable differentiability, because in this case there would be a sharp point at that location!

  • @tinkuefu09
    @tinkuefu09 Před 5 měsíci

    Thanks grant ❤

  • @scholar-mj3om
    @scholar-mj3om Před 4 měsíci

    Marvellous💯

  • @danielyoo828
    @danielyoo828 Před 5 měsíci

    Slope =/= Gradient
    There can be only one gradient (vector) that's mapped by a given point (a,b). So, the gradient is the same regardless of the applied vector at a given point.
    However, there can be multiple slopes (scalar) at a given point. The slope depends on the applied vector. We can slice the graph with a plane in the same direction of the applied vector, and we can do this in infinite ways, all resulting in a different slope value.
    Think of it as climbing a hill sideways (arbitrary applied vector) instead of directly up (following the gradient).

  • @danieljaszczyszczykoeczews2616

    Thank you very much for the video!!! :D
    cheers from Ukraine

  • @mohdzikrya5396
    @mohdzikrya5396 Před rokem

    Thanks

  • @farhanhyder7304
    @farhanhyder7304 Před 2 lety

    Thank you. It's been bothering me for a long time

  • @BedrockBlocker
    @BedrockBlocker Před 4 lety +2

    You just explained the Cauchy-Schwarz inequality at the end, didn't you?

  • @himouryassine
    @himouryassine Před 8 měsíci

    Hello, can you please tell me what software you use to illustrate the functions?

  • @kurrennischal235
    @kurrennischal235 Před rokem

    For me the easiest way is to think of a basic function
    f : R -> R
    The derivative of f at a point a tells you which direction to walk (left or right along the x axis) for the steepest ascent. The same idea carries over to 2 dimensions.

  • @anonymoustraveller2254

    Beauty man ! Beauty.

  • @avadhoothede8392
    @avadhoothede8392 Před 3 lety

    Great

  • @xoppa09
    @xoppa09 Před 6 lety +1

    Great video. My only quibble is with the notation for the directional derivative. You have ∇_v f. I have seen the directional derivative written as D_v f, and ∂f/∂v, which seem to make sense.
    But the use of ∇_v f seems non standard and a bit confusing. How do we interpret ∇_v f? The "gradient in the direction of unit vector v" does not make sense, since the gradient is independent of v and is fixed for all intents and purposes.

  • @ImaybeaPlatypus
    @ImaybeaPlatypus Před 7 lety

    Why isn't this linked to the video on the website?

  • @matheosxenakis8978
    @matheosxenakis8978 Před 5 lety +40

    So if I'm understanding this correctly, the argument he makes after he draws in the gradient line is that the vector dotted with the gradient that gives the max value for the gradient is the vector that is parallel to the gradient itself. But doesn't this argument only work if we already take it as true that the gradient *is* actually already in the direction of max increase, so that a vector parallel to it is also in the direction of max increase? I still don't get why the gradient definition inherently points in the direction of max increase??

    • @JaSamZaljubljen
      @JaSamZaljubljen Před 5 lety +11

      I'm on your side buddy

    • @BigNWide
      @BigNWide Před 5 lety +9

      The reasoning does feel circular.

    • @abdullahyasin9221
      @abdullahyasin9221 Před 4 lety +11

      No, it's not circular. He does not assume in this argument that the gradient is in the direction of steepest ascent. The starting points, or premises, of this argument are the definition of the directional derivative and the definition of the dot product. The directional derivative is just the rate of change of the function in the direction considered. There is no concept of a maximum rate of change in the concept of the directional derivative, unlike the gradient. In the concept of the dot product, the dot product of two vectors is a maximum when they are parallel. Combine these two concepts and you can see a beautiful proof emerge! 🙂

    • @adityaprasad465
      @adityaprasad465 Před 4 lety +7

      It helps to take a few steps back. Suppose I know that, for each unit I were to walk in the x direction, my function would increase by some amount x' (and for each unit of y, y'). Now suppose I *actually* walk *a* units in the x-dir and *b* units in the y-dir. How much does f increase? It increases by the weighted sum x'*a + y'*b. How do we find the (a, b) that maximizes this weighted sum (where (a, b) must be a unit vector -- no fair walking further in some direction than others)? One way is to notice that it's the dot product of vectors v=(x', y') and w=(a, b). We know that v dot w = |v||w|cos theta, and since v is fixed and |w|=1, this is maximized for theta=0 (so cos theta = 1).

    • @BigNWide
      @BigNWide Před 4 lety +5

      @@adityaprasad465 Yes, when two vectors point in the same direction, their dot product is maximized, but that's not the issue of concern. The issue is that this argument is being used to justify the gradient being the maximum of all possible vectors, which is an invalid argument.

  • @joaquincastillo4824
    @joaquincastillo4824 Před 4 lety +1

    I'm not nearly as advanced as you guys, but I'm a little bit unsure about the logic here. If we let nabla_f = [a,b], then there exists another vector "-nabla_f" = -[a,b] (the direction of fastest descent) such that dot(-nabla_f, -nabla_f) = dot(nabla_f, nabla_f) = max(nabla_f, V), even though "-nabla_f" points in the exact opposite direction.
    Would it be possible that the condition "dot(nabla_f, nabla_f) = max(nabla_f, V)" is a necessary but NOT sufficient condition to prove that nabla_f is the direction of fastest ascent?

  • @marat61
    @marat61 Před 6 lety

    How to extend this conclusion to complex space?

  • @ashita1130
    @ashita1130 Před 4 lety

    Wish you were my Prof.!!!!!

  • @chainesanbuenaventura2874

    Best video!

  • @CREEPYassassin1
    @CREEPYassassin1 Před 3 lety

    I'm 5 videos in and my brain is on fire

  • @vasundarakrishnan4093
    @vasundarakrishnan4093 Před 3 lety

    To those who are confused: the direction of steepest ascent is the direction in which the directional derivative is maximum.
    The directional derivative for any vector v = gradient · v,
    so we maximise (gradient · v) to maximise the directional derivative.

  • @yashawasthi242
    @yashawasthi242 Před 5 lety

    My question is: if the function is differentiable, shouldn't the change in the function be the same from every direction, like when we do complex analysis?

  • @LolForFun422
    @LolForFun422 Před 5 lety

    Thank you!

  • @Ayah_95
    @Ayah_95 Před 2 lety +1

    I've never had such difficulty understanding something in maths as I did with this 😂

  • @nijatshukurov9022
    @nijatshukurov9022 Před 4 lety

    Thank you 3blue1brown

  • @andrei-un3yr
    @andrei-un3yr Před 4 lety +4

    Could you provide us a video explaining why a dot b = |a| |b| cos(a,b)? I understand it for geometric vectors, but it's unclear to me how this scales to n-dimensional vectors.

    • @sriyansh1729
      @sriyansh1729 Před 2 lety

      I think he made a video on his channel 3 blue 1 brown explaining this

  • @frankzhang105
    @frankzhang105 Před 4 lety

    Thanks much, but I am still not clear on why the gradient direction gives the function f its steepest change. How does the gradient relate to the steepest change in the output of f? Thanks very much.

  • @danaworks
    @danaworks Před 2 měsíci

    Correct me if I'm wrong, but the "proof" here seems to be a circular argument.
    Consider this:
    1) The directional derivative could also be = (another vector that is NOT the gradient) dot (direction vector), isn't it?
    2) Then with the argument presented here, wouldn't MAX(direction derivative) = (another vector) * (direction vector)?
    So the question remains: how do we know that projecting along the "gradient vector" gives a larger value than projecting along "another non-gradient vector"?

    • @ryderb.845
      @ryderb.845 Před měsícem

      I disagree with your first point. The directional derivative does have to be multiplied by the gradient because those are the actual slopes at that point. The direction vector just says we want to go more in the y or x or whatever direction, but it has to stay along that slope

  • @46pi26
    @46pi26 Před 6 lety +33

    Terribly sorry to Sal, but I'm just too fond of Grant's voice to watch any of Sal's videos.

    • @syedrizvi597
      @syedrizvi597 Před 5 lety +12

      Then you're missing out

    • @CrankinIt43
      @CrankinIt43 Před 3 lety +1

      Sal has a pretty god-like voice too though

  • @mathalysisworld
    @mathalysisworld Před 25 dny

    Wow

  • @cooper7655
    @cooper7655 Před 5 lety +4

    TLDW: The directional derivative represents the rate of change of the function in that direction. If you try all possible directions centered at that point, it happens that the magnitude of the directional derivative is largest when taken in the direction of the gradient. Therefore, we can conclude that the gradient points in the direction of steepest ascent.

    • @DougMamilor
      @DougMamilor Před 4 lety +2

      This comment is by far clearer than the entire video. Thank you.

    • @Julie-ts9gi
      @Julie-ts9gi Před 2 lety +2

      so, it seems purely coincidental to me. Is there any sort of explanation why? The video didn't really explain it.

  • @Festus2022
    @Festus2022 Před měsícem

    Why is the magnitude of the gradient vector said to be the RATE of maximum ascent? When I see "rate", I think slope. Why isn't the rate of ascent simply the partial of y divided by the partial of x? Isn't this the slope of the gradient, i.e. change in y over change in x? What am I missing? Thanks

  • @robertwilsoniii2048
    @robertwilsoniii2048 Před rokem

    The way I've always seen it is every sum of derivatives will be a combo of the partials. Therefore, the purest and least inefficient path is the least scaled up linear combination of the bases so path with least or minimized resistance and drag is the combo of just the two partial derivatives or the gradient vector.
    I'm pretty sure you could prove this with the triangle inequality. Any sum of multiples of the bases vectors will have a longer hypotenuse than the sum of just the partial derivatives holding one side, like the height, constant on both. In other words, you'll waste energy traveling farther than necessary horizontally for the same movement vertically compared to the path of the pure partial derivatives. But you can't move faster than those, because you're limited by the physical shape of the surface you're on. You have no other choice, the constraints knock down other paths physically or hypothetically in the case of imagined scenarios.

  • @wajidali-oi1wo
    @wajidali-oi1wo Před 11 měsíci

    Kindly tell me: as we know, the gradient is (n-1)-dimensional compared to a scalar function of n dimensions. Keeping this in mind, does the gradient at a point mean a vector that locates the global maximum, in one dimension less than the scalar function?
    For example, if phi is a 3-d function, then del phi is a 2-d vector at a point, perpendicular to the level surface. Does del then mean a vector in 2-d that locates the maximum value of phi?

  • @trendypie5375
    @trendypie5375 Před 4 lety +1

    I am wondering: if a vector V is dotted with any vector A other than the gradient vector, it still gives the max value when V is parallel to A. So the video still doesn't prove that the gradient is the steepest ascent. Correct me if I am wrong.

  • @winstonvpeloso
    @winstonvpeloso Před 3 lety +2

    I think it's hilarious how when Grant does videos for KA he repeats out loud what he's writing on the screen like Sal does. Makes me laugh every time

  • @ivanluthfi8832
    @ivanluthfi8832 Před rokem

    I think for steepest descent you need to put a "-", which comes from cos theta with theta = pi, giving the minimum of the objective function. CMIIW

  • @jameslow5738
    @jameslow5738 Před 7 lety +1

    Can anyone explain to me, at 6:55, whether the projected vector could have a value larger than 1? I mean, it depends on the direction of projection too, right?

    • @euromicelli5970
      @euromicelli5970 Před 7 lety +9

      Y Low, no, the vector is length one already. Projecting it can only make it shorter. You can also see it from the more algebraic definition of the dot product, "(U dot V) = ||U|| * ||V|| * cos(theta)", where theta is the angle between the vectors. The only thing that changes is the angle, and the largest dot product happens when cos(theta) is largest, that is 1 (meaning the vectors are parallel).

    • @523101997
      @523101997 Před 7 lety +1

      Tilt your head 90 degrees to the right. You'll see it's a right-angled triangle with the directional vector being the hypotenuse. Therefore the other 2 sides must be smaller than one.

  • @andrei-un3yr
    @andrei-un3yr Před 4 lety

    I don't understand why the directional derivative gives the slope. Firstly, if I have the slope of a function = df/dx, then I can only multiply it with dx if I expect the resulting change df to match the function graph. Otherwise it matches only the slope line. For directional gradients, you mentioned the vector length should be 1 instead of infinitely small. That means that a gradient component df/dx multiplied with the corresponding directional vector x-component will result in a change that will align with the slope line, but not with the graph. Can somebody cast light into this issue?

  • @NoName-tj8dm
    @NoName-tj8dm Před 2 lety

    Why is the length less than 1 at 5:50?

  • @abcdef2069
    @abcdef2069 Před 7 lety +1

    Let x^2 + y^2 + z^2 = 1, so that z = f(x,y):
    z = (1 - x^2 - y^2)^(1/2) for z > 0
    z = -(1 - x^2 - y^2)^(1/2) for z < 0
    (for z = 0, use anything that makes it continuous)
    1. prove the max value of the gradient at (x,y,z) = (0,0,1) when the initial point is (x,y,z) = (0,0,-1)
    2. find the gradient at (x,y,z) = (1, 0, 0) from problem 1, where the gradient becomes infinite
    if you will, change my questions to make it happen "gradiently"
    is it possible to do a gradient on a closed surface?

  • @abcdef2069
    @abcdef2069 Před 7 lety

    at 2:10 i thought the same, the combination of derivatives gives you the steepest ascent and not the steepest descent.
    f(x,y)= x (x-1) = x^2 -x , this function has the min at x= 0.5 and max at infinity
    del f = ( 2x - 1 ) i + 0 j
    when x= -1 del f = -3 i , this is correct. -3 direction will lead you the max
    when x= 1 del f = 1 i, this is correct. +1 direction will lead you the max
    when x =0.5 del f=0 , does this fail?, because it gives no directions, it doesnt know if this is a max or min.

    • @Raikaska
      @Raikaska Před 7 lety

      SO THAT's WHY
      A VECTOR CAN ONLY POINT IN ONE DIRECTION
      WOW, THANK YOU!! don't know how i didnt realize so before

  • @Festus2022
    @Festus2022 Před 3 lety

    I don't think the narrator ever really explained why the Gradient vector is ALWAYS in the direction of the steepest slope. As far as I could tell, he only explained how the directional unit vector interacts with the Gradient to reduce it or maintain it at its maximum.
    If every point on a 3D-surface has an infinite number of tangent lines, all with potentially different slopes, how can taking partial derivatives from just 2 directions (x and y) and combining them into a vector always point in the direction of maximum steepness?

    • @98danielray
      @98danielray Před 2 lety

      He did. The inner product is largest when parallel to the vector.
      The partial derivatives are just the derivatives in the directions of the basis vectors.
      The basis vectors generate all vectors in your vector space by linear combinations, hence that's all the information you need. That is why the directional derivative is brought up in the first place: linear combinations of partial derivatives correspond to derivatives in the directions of the vectors that are precisely those linear combinations of basis vectors (since the derivative is linear). E.g., say you want the change in the direction (1,2), which is 1(1,0) + 2(0,1) written in the canonical basis; that'd correspond to 1·df/dx + 2·df/dy, or grad f at the point dotted with (1,2).
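      (A rough numeric check of this linearity point, as a Python/NumPy sketch; the function f(x, y) = x^2 * y and the point (1, 1) are made up, while the direction (1, 2) is the one from the comment.)

        import numpy as np

        f = lambda x, y: x**2 * y
        p = np.array([1.0, 1.0])
        grad = np.array([2 * p[0] * p[1], p[0]**2])   # (df/dx, df/dy) = (2, 1) at p

        v = np.array([1.0, 2.0])                      # the direction (1, 2)
        h = 1e-6
        print((f(*(p + h * v)) - f(*p)) / h)          # about 4, the derivative along (1, 2)
        print(grad @ v)                               # 1*df/dx + 2*df/dy = 4 as well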

  • @bakeqamza8907
    @bakeqamza8907 Před 5 lety +1

    It is rather a consequence than a reason.

  • @moseslocke2084
    @moseslocke2084 Před měsícem

    It seems like we are saying that the gradient is not always the direction of steepest ascent?
    What if f = -(x^2) - (y^2)?

  • @jeffgalef121
    @jeffgalef121 Před 7 lety

    I'm having trouble reconciling the two views of f(x,y). On one hand, you show them as mapping a 2D space to a 1D number line. On the other hand, you show them as mapping a 2D space to a 3D space, as when you show a 3D graph. But, it is not really 3D, is it? The height, f, is just an interpretation of the dependent variable, correct? You could show a 2D graph with color instead of height, right? To me, that makes the gradient easier to understand why it's on a plane below a 3D shape. Thanks.

    • @airraidsiren
      @airraidsiren Před 7 lety +2

      f(x,y) in the examples shown is only ever a mapping from 2D to 1D. It just happens, that when you have 3 values, it's nice to visualize it in 3D as the points [x, y, f(x,y)]. You could certainly use color as the 3rd dimension, you just need to provide a key, since it's not as clear what is meant as when you use a 3rd spatial dimension.

    • @jeffgalef121
      @jeffgalef121 Před 7 lety

      Thank you for the confirmation, Adam.

    • @airraidsiren
      @airraidsiren Před 7 lety

      You can also think about the surface as the solution to z=f(x,y)

    • @jeffgalef121
      @jeffgalef121 Před 7 lety

      Wouldn't that be the case if you integrated f(x,y)?

  • @eclipse-xl4ze
    @eclipse-xl4ze Před 7 lety

    kekek you are a god

  • @williambudd2850
    @williambudd2850 Před 5 lety +4

    Help!!! I think this guy just claimed that the direction of maximum change is in the direction of the gradient because it is in the direction of the gradient, further confusing me.

    • @98danielray
      @98danielray Před 2 lety

      dude
      no
      the justification is using how the directional derivative was defined previously. pay attention

  • @guidogaggl4020
    @guidogaggl4020 Před 4 lety

    Is this grant from 3b1b?

  • @davidiswhat
    @davidiswhat Před 6 lety

    I'm still confused about why it is. I can see from the dot product formula that since the Directional Derivative is greatest and a positive number(due to taking absolute value of the gradient) that the gradient must represent the steepest ascent. I'm having trouble imagining the ascent part. Let's say the partial derivatives in respect to both x and y were both a negative value at a point. Wouldn't the Directional Derivative end up as a positive value and be referencing ascent still?

  • @Eng.Hamza-Kuwait
    @Eng.Hamza-Kuwait Před rokem

    👌👌👌👌👌👌

  • @dominicellis1867
    @dominicellis1867 Před 4 lety

    Does another change w

  • @cauchyschwarz3295
    @cauchyschwarz3295 Před 2 lety

    I find this fact so confusing. If the gradient is the direction of steepest ascent, what is the direction of greatest net change? I always assumed the gradient points in the direction where the function changes the most.

  • @timgoppelsroeder121
    @timgoppelsroeder121 Před 4 lety

    How can the gradient, which is a vector, dotted with the vector v equal the normalized version of the gradient vector???

    • @andrewmacarthur6063
      @andrewmacarthur6063 Před 3 lety +1

      I think this is a notational error at around 7:23 onwards. As written, the RHS of that equation should be a _number_ (the maximum value of grad f DOT v) rather than a _vector_ .
      What Grant has written is the unit vector v which gives rise to that maximum, i.e. the one that points in the same direction as the gradient but is of length 1.
      This has happened because Grant wants to emphasise this fact as the main point of the video. Some precision has been lost in the notation. (Using 'argmax' rather than 'max' on the LHS would make this precise but that might be less familiar and require explanation too.)
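      (To spell the distinction out in the notation used in this thread - this is just the standard max/argmax split, not something stated in the video: the maximum over unit vectors v of ∇f • v is the number ‖∇f‖, while the v that achieves it is the unit vector ∇f / ‖∇f‖. The video writes the second quantity on the right-hand side of an equation whose left-hand side says "max", which is the imprecision described above.)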

  • @jonathandobrowolski6941
    @jonathandobrowolski6941 Před 4 lety +1

    Yeah but why does the gradient point in the direction of steepest ascent? @ 5:18

  • @JoaoVitorBRgomes
    @JoaoVitorBRgomes Před 3 lety

    As a hiker I don't want the steepest ascent!

  • @niroshas1790
    @niroshas1790 Před 5 lety

    If possible, real analysis too.

  • @zes7215
    @zes7215 Před 6 lety

    ts not frix or not, can telx anyx by anyx nmw. no such thing as howx telx

  • @rebeccap6609
    @rebeccap6609 Před 6 lety

    I understand why the gradient has the steepest slope of all the directional derivatives, but why can't it be in the direction of steepest DESCENT? Shouldn't there be a case where a pure step in x + a pure step in y lowers the value of the function?

    • @antonofka9018
      @antonofka9018 Před 6 lety +3

      Rebecca Peyser, it's a little bit deeper. I didn't get it at first either. Now I'll try to explain (hopefully I got it right).
      Suppose you have a little nudge in X that changes the value of the function in the negative direction. The gradient then encodes just that (the change to the function) as the first component, which is negative. See? You fed it a positive nudge and its output is negative. It means that you need to step in the negative direction in order to change the function the most. Ponder it for a moment.
      Now for the Y component. Suppose it changes the output more than the nudge in the X direction does. So the second component of your gradient is going to be higher, since it reflects the change caused by your nudge.
      What you'd finally get as your gradient is a direction vector that tells you to move in the negative direction for the X component and in the positive direction for Y, and the change in X would be lower than the change in Y, since the Y change affects the function more strongly.
      It holds the information not of the direction of the first nudges, but of the changes that those first nudges made to the function, in the form of a new direction (a vector). If you get it now, then I'm jubilant. If you don't, try to ponder it and write me a message. I'm open to explaining it again in more detail.

    • @98danielray
      @98danielray Před 2 lety +1

      No, because the derivative being positive means the function is increasing. You could even say it is an arbitrary choice/convention to define the derivative as the limit of (f(x+h)-f(x))/h. It may as well have been (f(x)-f(x+h))/h, which would make the derivative being positive mean the function is decreasing.

  • @abdijabesa8544
    @abdijabesa8544 Před 3 lety

    7:40
    Isn't he supposed to say "multiplying it" rather than "dividing"?

  • @curtpiazza1688
    @curtpiazza1688 Před 9 měsíci

    😊

  • @hanju3250
    @hanju3250 Před 5 lety

    Is this video part of some course?

  • @actualBIAS
    @actualBIAS Před 6 měsíci

    This made it click for me.

  • @samirelzein1978
    @samirelzein1978 Před 4 lety

    At 7:41, you are dividing it down by "2" and not by half; actually, you get half of it.

    • @andrewramos5619
      @andrewramos5619 Před 3 lety

      Good thing you corrected him. Was very confused until now🙏🙏

  • @csmole1231
    @csmole1231 Před 4 lety

    I was initially confused because I was thinking of a situation where:
    along x axis and y axis the graph is kinda stable and mildly changing but in quadrant one there is a freaking big valley and ma poor little point is at the origin point😂
    I was worried that no info about that valley is shown in gradient and ma point don't know where to go😂
    then i realize i was outside the scope of this discussion

    • @csmole1231
      @csmole1231 Před 4 lety

      and at first i even totally ignored the fact that they are tiiiiiiiiiny steps, which means those steps happened in a little plane, not curvy at all

    • @csmole1231
      @csmole1231 Před 4 lety

      and ma point should follow the diagonal line hence the gradient direction