What is Automatic Differentiation?

  • Added on 6 June 2024
  • This short tutorial covers the basics of automatic differentiation, a set of techniques that allow us to efficiently compute derivatives of functions implemented as programs. It is based in part on Baydin et al., 2018: Automatic Differentiation in Machine Learning: A Survey (arxiv.org/abs/1502.05767).
    Errata:
    At 6:23 in bottom right, it should be v̇6 = v̇5*v4 + v̇4*v5 (instead of "-").
    Additional references:
    Griewank & Walther, 2008: Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation (dl.acm.org/doi/book/10.5555/1...)
    Adams, 2018: COS 324 - Computing Gradients with Backpropagation (www.cs.princeton.edu/courses/...)
    Grosse, 2018: CSC 321 - Lecture 10: Automatic Differentiation (www.cs.toronto.edu/~rgrosse/c...)
    Pearlmutter, 1994: Fast exact multiplication by the Hessian (www.bcl.hamilton.ie/~barak/pap...)
    Alleviating memory requirements of reverse mode:
    Griewank & Walther, 2000: Algorithm 799: revolve: an implementation of checkpointing for the reverse or adjoint mode of computational differentiation (dl.acm.org/doi/10.1145/347837...)
    Dauvergne & Hascoët, 2006: The data-flow equations of checkpointing in reverse automatic differentiation (link.springer.com/chapter/10....)
    Chen, T et al., 2016: Training Deep Nets with Sublinear Memory Cost (arxiv.org/abs/1604.06174)
    Gruslys et al., 2016: Memory-efficient Backpropagation Through Time (arxiv.org/abs/1606.03401)
    Siskind & Pearlmutter: Divide-and-conquer checkpointing for arbitrary programs with no user annotation (arxiv.org/abs/1708.06799)
    Oktay et al., 2020: Randomized Automatic Differentiation (arxiv.org/abs/2007.10412)
    Example software libraries using various implementation routes:
    Source code transformation:
    Tangent - github.com/google/tangent
    Zygote - github.com/FluxML/Zygote.jl
    Operator overloading:
    Autograd - github.com/HIPS/autograd
    Jax - github.com/google/jax
    PyTorch - pytorch.org/
    Graph-based w/ embedded mini-language:
    TensorFlow - www.tensorflow.org
    Special thanks to Ryan Adams, Alex Beatson, Geoffrey Roeder, Greg Gundersen, and Deniz Oktay for feedback on this video.
    Some of the animations in this video were created with 3Blue1Brown's manim library (github.com/3b1b/manim).
    Music: Trinkets by Vincent Rubinetti
    Links:
    CZcams: / ariseffai
    Twitter: / ari_seff
    Homepage: www.ariseff.com
    If you'd like to help support the channel (completely optional), you can donate a cup of coffee via the following:
    Venmo: venmo.com/ariseff
    PayPal: www.paypal.me/ariseff
  • Science & Technology

Comments • 104

  • @anjelpatel7918
    @anjelpatel7918 2 years ago +202

    I like how more and more people are adopting 3b1b's style. Makes the content much better and easier to understand. This slowly converts a lot of the more complicated topics into easy-to-digest modules.

    • @Artaxerxes.
      @Artaxerxes. 2 years ago +5

      It literally uses manim

    • @tomerzilbershtein849
      @tomerzilbershtein849 2 years ago +11

      3B1B’s creator Grant Sanderson created an animation library for himself to use to make videos. People forked that library (made a copy of it) and now there is a community-supported version of it for creators, while he continues to use his own (as well as the community one). Pretty cool stuff!

    • @atotoole21
      @atotoole21 1 year ago

      @Artaxerxes. Nice! I didn't know about manim or that 3B1B's animation technique was Python-based. I assumed it was done by hand using Illustrator or something.

    • @umbraemilitos
      @umbraemilitos 10 months ago +1

      Yes, though I don't think 3B1B wants their videos to be a template to copy. I think he's happy to inspire, but doesn't think that his Manim program is the right tool for most cases. He released a video explaining the SOME criteria, and it allows for lots of creative expression in teaching.

    • @andreypopov6166
      @andreypopov6166 1 month ago

      3b1b or any other style on its own doesn't mean that the content is easier to understand.

  • @stathius
    @stathius 1 year ago +5

    Class act, being concise and clear at the same time is no easy feat. Thank you.

  • @andrewbeatty5912
    @andrewbeatty5912 3 years ago +25

    Best summary I've ever seen !

  • @arkasaha4412
    @arkasaha4412 3 years ago +41

    Man this is pure gold. We all use this stuff but hardly have a clear idea about its nitty-gritty. Thanks for the awesome content and presentation, keep it up! :)

  • @raminbohlouli1969
    @raminbohlouli1969 9 months ago +5

    I knew basically 0 about AD and didn't know where to start, since all the articles, websites, books etc. that I have looked into explained everything in a really complicated way. I would like to thank you immensely for this very informative yet simple video! Now I know enough to dive deeper into the concept. This video was all I needed. Keep up the great work! You got yourself a new follower.

  • @abhishek.shenoy
    @abhishek.shenoy 3 years ago +7

    This is so well explained! I love the quality of your videos!

  • @jaf7979
    @jaf7979 1 year ago +2

    Well done, superbly explained in context of other differentiation methods. Exactly what I needed!

  • @koushik7604
    @koushik7604 1 year ago

    This is highly motivated by Andrej Karpathy's lecture, but very clear explanation. It is indeed a good addition to my resource list.

  • @TheLokiGT
    @TheLokiGT 1 year ago

    Very good job. One of the very few good videos I've seen around about autodiff.

  • @esaliya
    @esaliya 3 years ago

    This is a neat summary that's hard to find in a single place!

  • @pandatory1108
    @pandatory1108 3 years ago +6

    Excellent video Ari. Thanks for such a great explanation!
    Also, your animations were really well done. I suspected you might be using manim based on the style and then I read the description :)

  • @pulusound
    @pulusound 3 years ago

    very well explained video with lovely calm background music. i need to brush up on my vector calculus and come back but this gave me a good intuition. hope you make more of these!

  • @jorgeanicama8625
    @jorgeanicama8625 1 year ago +2

    Thank you Ari. I used symbolic computation in the past but this novel way of calculating derivatives is quite interesting. Learnt lots by watching your video. For sure, I will follow up with the recommended literature

  • @BrianAmedee
    @BrianAmedee 3 years ago +2

    Excellent presentation mate. That was an awesome explanation and a nice trip down memory lane (university days).

  • @chandank5266
    @chandank5266 3 years ago +7

    Your way of explanation is outstanding.....love from india sir♥️

  • @YorkiePP
    @YorkiePP 3 years ago

    Fantastic video on autodiff, really cleared up a lot of things I wasn't sure about.

  • @SohailKhan-zb5td
    @SohailKhan-zb5td 1 year ago

    Thanks a lot. Videos like this take a lot of hard work to produce. Thanks a lot

  • @prydt
    @prydt 3 years ago

    Amazing explanation of Autograd and wonderful visualizations!!! Thank you so much.

  • @jkkang9666
    @jkkang9666 3 years ago +2

    Thanks for the great summary and the nice video.

  • @aldaszarnauskas27
    @aldaszarnauskas27 1 year ago

    Great video, well presented, clearly explained, nice visualisation... Thank you!

  • @Roshan-xd5tl
    @Roshan-xd5tl 2 years ago

    Brilliant video, Ari. Thank you!

  • @asdf56790
    @asdf56790 1 year ago

    Exactly what I was looking for! Thank you :)

  • @weinansun9321
    @weinansun9321 3 years ago +2

    more videos please, this is amazing!

  • @user-kl1xv8in2q
    @user-kl1xv8in2q 2 years ago

    Thank you so much. This video really helps me understand a little more about what automatic differentiation is.

  • @arnold-pdev
    @arnold-pdev 2 years ago

    Went from complete ignorance to understanding in 15 min. Thank you!

  • @ccgarciab
    @ccgarciab 3 years ago +1

    Looking forward to your future videos

  • @AJ-et3vf
    @AJ-et3vf 2 years ago

    Awesome presentation! I understand autodiff a little bit more. I'll rewatch several more times in the future to understand it better till I completely understand it :)

  • @Vaporizer41
    @Vaporizer41 3 years ago

    Great video!, I love your content, hope you will keep making many more :)

  • @halneufmille
    @halneufmille 3 years ago

    Thanks! I never understood this before, but it became obvious in one second.

  • @KulvinderSingh-pm7cr
    @KulvinderSingh-pm7cr 1 year ago

    This is exceptionally well explained.

  • @VHenrik007
    @VHenrik007 12 days ago

    Just as a note for anyone wondering, the arxiv link doesn't work because it includes the closing parenthesis. Otherwise great video!

  • @BrianBin
    @BrianBin 7 months ago

    I like your tutorial video because it is short
    and good

  • @datamike7457
    @datamike7457 3 years ago +8

    Ari, this is great content! I used to call symbolic differentiation 'analytical'. It is obnoxious to track all of the coefficients.

  • @setsunakevin6861
    @setsunakevin6861 3 years ago

    Amazing video! Very well explained.

  • @thivinanandh4430
    @thivinanandh4430 2 years ago

    Awesome Explanation..!!!!!
    Keep rocking..!!!

  • @amadlover
    @amadlover 1 year ago

    Timely information about source code manipulation and Google Tangent. It was a kind of confirmation for me that it was indeed possible.
    I started to learn metaprogramming hoping to generate code for the differentials, based on the function, without actually knowing if it was possible; basically a shot in the dark.
    cheers

  • @sandropollastrini2707
    @sandropollastrini2707 2 years ago

    Beautiful and clear!

  • @nathanielscreativecollecti6392

    Bravo! I have a final today and now I get it!

  • @jishnuak3000
    @jishnuak3000 1 year ago

    Very intuitive explanation, thanks

  • @vijaymaraviya9443
    @vijaymaraviya9443 3 years ago

    Awesome summary👌

  • @andersgadlauridsen1533

    This is such great content, please keep making more :)

  • @hadik4497
    @hadik4497 3 years ago

    Thanks! This is phenomenal!

  • @tom_verlaine_again
    @tom_verlaine_again 2 years ago

    Great lesson! Thank you.

  • @kong1397
    @kong1397 3 years ago

    Wow, that's a great explanation.

  • @SuperDonalByrne
    @SuperDonalByrne 4 months ago

    Great video!

  • @stansilverman1901
    @stansilverman1901 3 years ago +1

    In order to explain this to my wife, I differentiated voter rights-the analog process humans decide who should be allowed to vote, someone who looks like me, or everyone?. I think she got it. Brilliant Ari

  • @juandavidnavarro
    @juandavidnavarro 11 months ago

    Excellent video!! thank you so much. I have a question: is there any AD reverse mode based on dual numbers?

  • @manumerous
    @manumerous 2 years ago

    This video is genius! love it.

  • @tom-sz
    @tom-sz 1 month ago

    Great video! Where can I learn more about the rounding and truncation errors plot at 2:06? I need to make an analysis of these errors for a project. Thanks :)
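
The trade-off behind that kind of plot can be reproduced with a few lines of Python. This is only a sketch under assumptions: the test function (`math.exp` at x = 1), the forward-difference formula, and the step sizes are chosen here for illustration and are not taken from the video. For large `h` the truncation error of the approximation dominates; for very small `h` floating-point rounding dominates.

```python
import math

def forward_difference(f, x, h):
    """Approximate f'(x) with the one-sided difference (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

def abs_error(h, x=1.0):
    """Absolute error of the approximation for f = exp, whose exact derivative is exp(x)."""
    return abs(forward_difference(math.exp, x, h) - math.exp(x))

# Illustrative step sizes: the error first shrinks with h, then grows again.
errors = {h: abs_error(h) for h in (1e-1, 1e-4, 1e-8, 1e-12, 1e-15)}

for h, err in errors.items():
    print(f"h = {h:.0e}   |error| = {err:.3e}")
```

Plotting `errors` on log-log axes gives the familiar V shape; the section on numerical differentiation in Baydin et al. discusses the same two error sources.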

  • @bitahasheminezhad2887
    @bitahasheminezhad2887 3 years ago

    That was awesome, thank you

  • @amirrezarezayan8121
    @amirrezarezayan8121 18 days ago

    great great great , Thanks a million 😃

  • @softerseltzer
    @softerseltzer 3 years ago +1

    Love it!

  • @jianwang7433
    @jianwang7433 2 years ago

    thanks for sharing

  • @bryanbischof4351
    @bryanbischof4351 3 years ago +3

    This is quite good. I’m wondering if a part 2 digging deeper yet into how the implementation takes advantage of the concept you introduce here would be possible?

    • @ariseffai
      @ariseffai 3 years ago +1

      Thanks Bryan. That's a possibility. It would certainly be interesting to dig deeper into the implementation schemes, which were only briefly described here. In the meantime, check out some of the links for further information on implementations.

  • @jbl4174
    @jbl4174 2 years ago +1

    Thanks for putting out such a great video. I'm still a bit confused about why forward mode AD requires a separate forward pass for each input variable. Baydin et al. say: "Conversely, in the other extreme of f : R^n → R, forward mode AD requires n evaluations to compute the gradient". But I don't see why you couldn't compute the primal table once and then the tangent table for each of the n variables, unless "n evaluations" means n evaluations of the tangent table and not n forward passes.
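
A minimal sketch may help with this question. Everything in it (the `Dual` class, the function `f`, the helper `forward_gradient`) is hypothetical and written only for illustration. Each forward-mode sweep seeds one input with tangent 1 and the others with 0, so it yields one partial derivative. The primal values could be cached and shared between sweeps, but the tangent propagation still has to be done once per input (or carried as a length-n vector), so the work grows with n either way.

```python
class Dual:
    """A value paired with its tangent (derivative w.r.t. the seeded input)."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def f(x1, x2, x3):
    # Hypothetical scalar function of three inputs.
    return x1 * x2 + x2 * x3

def forward_gradient(func, point):
    """Build the gradient with one forward-mode sweep per input variable."""
    grad = []
    for i in range(len(point)):
        # Seed input i with tangent 1.0 and every other input with 0.0.
        args = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(point)]
        grad.append(func(*args).dot)
    return grad

print(forward_gradient(f, [2.0, 3.0, 4.0]))  # three sweeps for three inputs
```

Reverse mode would instead recover all three partials from a single backward sweep, which is why it is preferred when there are many inputs and one output.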

  • @sofa33
    @sofa33 2 years ago

    Thank you so much!

  • @alfcnz
    @alfcnz 2 months ago

    @Ari, this is really great! 🤩🤩🤩

  • @newbie8051
    @newbie8051 1 year ago

    Beautiful video, but I lost track quite a few times. Are there any prerequisite topics I should know before trying to understand this?

  • @rtcoffee1235
    @rtcoffee1235 3 years ago

    thanks for this!

  • @superagucova
    @superagucova 3 years ago

    Loved this video! Are you using 3b1b's Manim?

  • @sirallen2591
    @sirallen2591 1 year ago

    Thanks!

  • @dullyvampir83
    @dullyvampir83 5 months ago

    Great video, thank you!
    Just a question, you said a main problem with symbolic differentiation is that no control flow operations can be part of the function. Is that in any way different for Automatic differentiation?
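
One way to see the difference is a small sketch like the following, where the function and helper names are made up for illustration. AD does not need a closed-form expression for the whole program: it differentiates the sequence of elementary operations that actually executes, so a data-dependent loop is handled by propagating derivatives through whichever iterations run.

```python
def mul(a, b):
    """Multiply two (value, derivative) pairs using the product rule."""
    return (a[0] * b[0], a[1] * b[0] + a[0] * b[1])

def grow_until_large(x):
    """Keep multiplying by x until the value reaches 100 (a data-dependent loop)."""
    y = x
    while y[0] < 100.0:      # the branch looks only at the primal value
        y = mul(y, x)
    return y

# Seed the input with derivative 1.0. For x = 3 the loop runs four times,
# so this particular execution computes x**5 and its derivative 5 * x**4.
value, derivative = grow_until_large((3.0, 1.0))
print(value, derivative)
```

The result is the derivative of the path taken for this input; at inputs where the number of iterations changes, the function itself may not be differentiable.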

  • @ktugee
    @ktugee 1 year ago

    Slight typo at 6:29: v6' = v5'v4 + v4'v5 (there should be a + instead of a -).

  • @GordonWade-kw2gj
    @GordonWade-kw2gj 1 month ago

    Wonderful video. The detailed example helps tremendously.
    And I think there's an error: at 6:24, since $v_6 = v_5\times v_4$, in $\dot{v}_6$ shouldn't there be a plus sign where you've got a minus sign?

  • @germangonzalez3063
    @germangonzalez3063 3 years ago

    Very useful

  • @gabrielmccartney7975
    @gabrielmccartney7975 2 years ago

    Hello! Can we use dual numbers for integration?

  • @UnnamedThe
    @UnnamedThe 3 years ago

    12:26
    May I ask where you got that c

    • @ariseffai
      @ariseffai 3 years ago +1

      Baydin (arxiv.org/abs/1502.05767) references this bound in Sec. 3.2. I don't have the exact location for it in Griewank and Walther.

    • @UnnamedThe
      @UnnamedThe 3 years ago +1

      @ariseffai Thank you a lot! That is already very helpful.

  • @PahenPWNZ
    @PahenPWNZ 3 years ago +1

    Awesome explanation, thanks!
    But I still have one question, can someone please explain: at 12:05, right column (Adjoints),
    I don't understand how we got these values (e.g. v̄5 = v4 * v̄6, etc.). Where did they come from?
    If I use the formula on the previous slide with the sum over child nodes, I get different values.

    • @MarkKrebs
      @MarkKrebs 2 years ago

      Hi I have same Q. The moment when adjoints are defined is a break to me. vbar5 = v4 * vbar6 seems "backwards." I see it matches the formula given on the prior graph page, but not the intuition for it. "The sum of the output values, weighted by my leverage in creating them," is as close as I can get.

    • @abhaysolanki9284
      @abhaysolanki9284 2 years ago +1

      I know, when he said children I automatically thought of v3 and v4. But the only child of v5 is v6, and the children of v4 are v5 and v6. Children are the nodes that a node points to.
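
The adjoint rule discussed in this thread can be written out as a short sketch. The computational graph below is an assumption: it is one graph consistent with the relations mentioned here (v6 = v5 * v4, with v4 feeding both v5 and v6) and may not match the video's example exactly. Each adjoint is the sum, over the node's children, of the child's adjoint times the partial derivative of that child with respect to the node.

```python
import math

def f_and_grad(x1, x2):
    """Evaluate an assumed example graph and its gradient with reverse-mode AD."""
    # Forward sweep: compute and store every intermediate (primal) value.
    v1 = x1 / x2
    v2 = math.sin(v1)
    v3 = math.exp(x2)
    v4 = v1 - v3
    v5 = v2 + v4
    v6 = v5 * v4                      # output

    # Reverse sweep: visit nodes from output to inputs.
    v6_bar = 1.0                      # d(output)/d(v6)
    v5_bar = v6_bar * v4              # only child of v5 is v6, and dv6/dv5 = v4
    v4_bar = v5_bar * 1.0 + v6_bar * v5   # children of v4: v5 and v6
    v3_bar = v4_bar * -1.0            # child of v3: v4 = v1 - v3
    v2_bar = v5_bar * 1.0             # child of v2: v5 = v2 + v4
    v1_bar = v2_bar * math.cos(v1) + v4_bar * 1.0   # children of v1: v2 and v4
    x1_bar = v1_bar / x2              # child of x1: v1 = x1 / x2
    x2_bar = v1_bar * (-x1 / x2**2) + v3_bar * v3   # children of x2: v1 and v3
    return v6, (x1_bar, x2_bar)

value, (dx1, dx2) = f_and_grad(1.5, 0.5)
print(value, dx1, dx2)
```

Read this way, v̄5 = v4 * v̄6 is just the chain rule applied to the single edge from v5 to v6.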

  • @chnlior
    @chnlior 3 years ago

    Great summary, Ari. Thank you.
    I think there is a small error at 6:23: v6' = v5'v4 + v4'v5, not "-".

    • @ariseffai
      @ariseffai 3 years ago

      Thanks Lior, good catch; placed this under errata.

  • @user-vm9hl3gl5h
    @user-vm9hl3gl5h 1 year ago

    Anyway, the point is that it does not store everything in closed form and derive the gradient from that each time. At every computation step, the gradient value is computed alongside the output value, to be used later in the forward / backward pass.

  • @jorgeanicama8625
    @jorgeanicama8625 1 year ago

    One more note ARI. I think there is a small typo. From minute 7:36 until 7:46 the derivative of V6 should be a "+" instead of a "-".

  • @9888622400
    @9888622400 2 years ago

    thanks bro!

  • @paulpassek6118
    @paulpassek6118 3 years ago +2

    Thanks for the superb video. I think you made a little mistake in the forward mode example at 6:24. Shouldn't it be v̇_6 = v̇_5*v_4 + v̇_4*v_5?

    • @ariseffai
      @ariseffai 3 years ago

      Thanks Paul, good catch; placed this under errata.

  • @deepanshuchoudhary4598
    @deepanshuchoudhary4598 3 years ago +1

    Please reply to my question.
    Where do you learn these things, and how are you able to grasp them completely? I'm a data science student and I need to know this badly. Please share insights.

    • @ariseffai
      @ariseffai 3 years ago +1

      I found the survey by Baydin et al. to be particularly helpful. See the description for links!

  • @rachelellis6655
    @rachelellis6655 1 year ago

    Derivative at 0:43 would actually be: f' (x) = (2x)e^(2x-1)- 3x^2 ... would it not?
    Great video.. I've subscribed! I'm just learning derivative and chain rule so I want to be sure I'm understanding the concept/rules/procedures correctly. I'm probably wrong though, that's why I'm asking for verification... thanks!

  • @proweiqi
    @proweiqi 3 years ago +2

    This is very good, but some of the material moves too fast, and things like the primal part are not explained clearly enough.

  • @diodin8587
    @diodin8587 2 years ago

    No mention of *dual numbers*?
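
For readers wondering about the connection: forward-mode AD can be implemented with dual numbers, where each value carries a second component that follows the usual derivative rules. The sketch below is a minimal illustration under assumptions; the `Dual` class and the `derivative` helper are hypothetical names, not an API from any of the libraries listed in the description.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    val: float   # primal value
    dot: float   # tangent: derivative with respect to the seeded input

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + v'u
        return Dual(self.val * other.val,
                    self.dot * other.val + other.dot * self.val)
    __rmul__ = __mul__

def _lift(x):
    """Treat plain numbers as constants, i.e. duals with zero tangent."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)

def sin(x):
    """Chain rule for sine: (sin u)' = cos(u) * u'."""
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

def derivative(f, x):
    """Evaluate f at x with tangent seed 1.0 and read off the derivative."""
    return f(Dual(x, 1.0)).dot

print(derivative(lambda x: x * x + 3 * x, 2.0))  # derivative of x^2 + 3x at x = 2
```

The multiplication rule here is the same product rule as in the errata (v̇6 = v̇5*v4 + v̇4*v5).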

  • @zappist751
    @zappist751 1 year ago

    THANK YOU LORD THANK YOU JESUS AND THANK YOU SIR

  • @bokibogi
    @bokibogi 1 year ago

    4:27 automatic differentiation ...

  • @yavarjn2055
    @yavarjn2055 1 year ago

    Wooow

  • @Rems766
    @Rems766 2 years ago

    chain rule rules

  • @Manishsingh-dl6ho
    @Manishsingh-dl6ho 3 years ago

    Fking Great!!!

  • @sarvasvarora
    @sarvasvarora 3 years ago +1

    Reddit gang?

  • @user-rr7uz9hd4m
    @user-rr7uz9hd4m 2 years ago

    Do you get paid to make such videos? Definitely should

  • @MariaFernandez-pv9hn
    @MariaFernandez-pv9hn 3 years ago

    You should point on the screen what you are talking about when doing examples.

  • @maxyazhbin826
    @maxyazhbin826 3 years ago +1

    please no music, fantastic otherwise

  • @ollllj
    @ollllj 6 months ago

    On expression swell:
    one of my proudest computations (and hardest code to debug) is the automatic differentiation of the general quotient rule up to the 3rd derivative within [shadertoy ... /WdGfRw ReTrAdUi39], with identical parts already pre-multiplied out by how often they are repeated.
    WebGL code:
    struct d000{float a;float b;float c;float d;};//1 domains t,dt,dt²,dt³ , sure, this could just be a vec4, but i REALLY needed my custom labels for debugging.
    d000 di(d000 a,d000 b){return d000( //autodiff up to 3 derivatives for division (up to 3 iterations of the quotient rule within the chain rule)
    a.a/b.a //0th derivative, simple division
    ,(a.b*b.a-a.a*b.b)/(b.a*b.a) //dx first derivative
    ,((a.c*b.a+a.b*b.b-a.b*b.b-a.a*b.c)*(b.a*b.a)-2.*(a.b*b.a-a.a*b.b)*(b.a*b.b))/(b.a*b.a*b.a*b.a) //dxdx second derivative
    ,((((a.d*b.a+a.c*b.b+a.c*b.b+a.b*b.c-a.c*b.b-a.b*b.c-a.b*b.c-a.a*b.d)*(b.a*b.a)
    +(a.c*b.a+a.b*b.b-a.b*b.b-a.a*b.c)*(2.*b.a*b.b))
    -2.*((a.c*b.a+a.b*b.b-a.b*b.b-a.a*b.c)*(b.a*b.b)
    +(a.b*b.a-a.a*b.b)*(b.b*b.b+b.a*b.c)))*(b.a*b.a*b.a*b.a)
    -((a.c*b.a+a.b*b.b-a.b*b.b-a.a*b.c)*(b.a*b.a)
    -2.*(a.b*b.a-a.a*b.b)*(b.a*b.b))
    *4.*(b.b*b.a*b.a*b.a))/(b.a*b.a*b.a*b.a*b.a*b.a*b.a*b.a)) //dxdxdx third derivative, quotient rule sure is something
    ;}
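
As a point of comparison for the expression swell in the snippet above, here is a sketch of a different technique (not the commenter's method): instead of expanding the quotient rule symbolically, the derivatives of q = a/b can be computed numerically one order at a time from the Leibniz rule applied to a = q·b. The function name and the list-based representation are assumptions made for this illustration.

```python
from math import comb

def quotient_derivatives(a, b):
    """Derivatives of q = a / b at a point.

    a and b are lists [f, f', f'', ...] holding the value and derivatives of
    the numerator and denominator at that point. Since a = q * b, the Leibniz
    rule gives a^(n) = sum_k C(n, k) q^(k) b^(n-k); solving for q^(n) yields
    a recurrence that only needs the lower-order q^(k).
    """
    q = []
    for n in range(len(a)):
        lower = sum(comb(n, k) * q[k] * b[n - k] for k in range(n))
        q.append((a[n] - lower) / b[0])
    return q

# Check against 1/t at t = 2: numerator 1 (all derivatives 0), denominator t.
# Expected: 1/t, -1/t^2, 2/t^3, -6/t^4.
print(quotient_derivatives([1.0, 0.0, 0.0, 0.0], [2.0, 1.0, 0.0, 0.0]))
```

Because each order reuses the previously computed values, the amount of code stays constant no matter how many derivatives are requested.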

  • @a.osethkin55
    @a.osethkin55 2 years ago

    Thanks!!!