Deep RL Bootcamp Lecture 4A: Policy Gradients

  • Published: 28 May 2024
  • Instructor: Pieter Abbeel
Lecture 4A, Deep RL Bootcamp, Berkeley, August 2017
    Policy Gradients

Comments • 43

  • @naeemajilforoushan5784 • 1 month ago +2

After 5 years, this lecture is still great. Thank you a lot.

  • @jony7779 • 4 years ago +16

    Every time I forget how policy gradients work exactly, I just come back here and watch starting at 9:30

    • @andreasf.3930 • 3 years ago +3

And every time you visit this video, you forget where to start watching. That's why you posted this comment. Smart guy!

  • @bhargav975 • 6 years ago +34

    This is the best lecture I have seen on policy gradient methods. Thanks a lot.

  • @marloncajamarca2793 • 6 years ago +3

    Great Lecture!!!! Pieter's explanations are just a gem!

  • @auggiewilliams3565 • 4 years ago +1

I must say that in more than 6 months, this is by far the best lecture/material I have come across that was able to make me understand what the policy gradient method actually is. I really praise this work. :) Thank you.

  • @ericsteinberger4101 • 6 years ago +9

Amazing lecture! Love how Pieter explains the math. Super easy to understand.

  • @synthetic_paul • 4 years ago +5

    Honestly I can’t keep up without seeing what he’s pointing at. Gotta pause and search around the screen each time he says “this over here”

    • @akarshrastogi3682 • 3 years ago +2

      Exactly. "This over here" has got to be the most uttered phrase in this lecture. So frustrating.

  • @johnnylima1337 • 6 years ago +5

It's such a good lecture that I stopped to ask myself why it was so easy to cover such significant information with full understanding.

  • @ashishj2358 • 3 years ago

Best lecture on policy gradients, hands down. It also covers some noteworthy high-level details from many papers.

  • @faizanintech1909 • 6 years ago

    Awesome instructor.

  • @dustinandrews89019 • 6 years ago +1

    I got a lot out of this lecture in particular. Thank you.

  • @user-wp6lp3ec9q • 4 years ago

Very good lecture about the policy gradient method. I have looked through a lot of articles and understood almost everything, but your derivation explanation is really the best. It just opened my eyes and showed me the whole picture. Thank you very much!!

  • @bobsmithy3103 • 1 year ago

Amazing work. Super understandable, concise, and information-dense.

  • @DhruvMetha • 3 years ago

    Wow, this is beautiful!

  • @nathanbittner8307 • 6 years ago

Excellent lecture. Thank you for sharing.

  • @ethanjyx • 5 years ago

Wow, damn, this is so well explained, and the last video is very entertaining.

  • @norabelrose198 • 2 years ago

The explanation of the policy gradient derivation here is really nice and understandable.

  • @JadtheProdigy • 5 years ago

Best lecturer in the series.

  • @suertem1 • 4 years ago

    Great lecture, thanks

  • @sharmakartikeya • 5 months ago

I might be missing a simple concept here, but how are we increasing/decreasing the grad log probability of the actions using the gradient of U(theta)? I get that a positive return for a trajectory will make the gradient of U positive, so theta will be adjusted in favour of those trajectories, but how does that increase grad log prob?
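
    A sketch of the quantities involved, in the notation of the question above (U(θ) is the expected return, τ a sampled trajectory, R(τ) its return); this is the standard likelihood-ratio estimator, not a quote from the slides:

        \hat{g} \;=\; \frac{1}{m}\sum_{i=1}^{m} \nabla_\theta \log P\bigl(\tau^{(i)};\theta\bigr)\, R\bigl(\tau^{(i)}\bigr),
        \qquad \theta \leftarrow \theta + \alpha\,\hat{g}

    Each ascent step moves θ along ∇_θ log P(τ^(i); θ) scaled by R(τ^(i)), so a trajectory with positive return has its log probability (and with it the log probabilities of the actions taken along it) increased, while a negative return decreases it. It is the log probability that goes up or down, not its gradient.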

  • @Diablothegeek • 6 years ago

    Awesome!! Thanks

  • @ProfessionalTycoons • 5 years ago

    great talk!

  • @biggeraaron • 5 years ago +1

Where can I buy his T-shirt?

  • @JyoPari • 4 years ago +1

    Instead of having a baseline, why not make your reward function be negative for undesired scenarios and positive for good ones? Great lecture!
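
    A note on the baseline question: subtracting a baseline b (possibly state-dependent) from the return does not bias the gradient estimate, because for any b(s) that does not depend on the action,

        \mathbb{E}_{a \sim \pi_\theta(\cdot\mid s)}\bigl[\nabla_\theta \log \pi_\theta(a\mid s)\, b(s)\bigr]
          \;=\; b(s)\,\nabla_\theta \sum_{a} \pi_\theta(a\mid s)
          \;=\; b(s)\,\nabla_\theta 1 \;=\; 0,

    so the baseline only reduces the variance of the estimate. Redesigning the reward to be negative for bad outcomes changes the objective U(θ) itself, whereas a baseline leaves it untouched, so the two are not interchangeable.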

  • @keqiaoli4617 • 3 years ago +1

Why would a good "R" increase the probability of the path??? Please help me.
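
    A minimal, self-contained sketch (toy 3-armed bandit, numpy only; the setup and names are illustrative, not taken from the lecture or its labs) of why a positive return raises the probability of the sampled path:

        import numpy as np

        rng = np.random.default_rng(0)
        theta = np.zeros(3)        # softmax preferences for 3 actions
        alpha = 0.1                # step size

        def softmax(x):
            z = np.exp(x - x.max())
            return z / z.sum()

        for step in range(300):
            probs = softmax(theta)
            a = rng.choice(3, p=probs)       # sample an action from pi_theta
            R = 1.0 if a == 2 else -1.0      # toy return: action 2 is the "good" path
            # grad of log pi_theta(a) for a softmax policy: one_hot(a) - probs
            grad_log_pi = -probs
            grad_log_pi[a] += 1.0
            # likelihood-ratio step: move theta along grad log pi, scaled by the return
            theta += alpha * R * grad_log_pi

        print(softmax(theta))  # probability mass concentrates on action 2

    Because each step is θ ← θ + α R ∇_θ log π_θ(a), a positive R pushes θ toward the sampled action and a negative R pushes it away, so paths with good returns become more probable under the updated policy.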

  • @muratcan__22 • 3 years ago +3

Nice, but hard to follow without knowing what "this" refers to. I hope my guesses were right :)

  • @emilterman6924 • 5 years ago

It would be nice to see what labs they had (what exercises).

    • @Procuste34iOSh • 3 years ago

Don't know if you're still interested, but the labs are on the bootcamp website.

  • @karthik-ex4dm • 5 years ago

    PG is awesome!!!
    It really doesn't depend on the environment dynamics?? Wow.
    All the pain and stress just goes away when we see our algorithms working 😇😇

  • @ishfaqhaque1993 • 4 years ago

23:20 - The gradient of an expectation is the expectation of the gradient "under mild assumptions". What are those assumptions? (See the note after this thread.)

    • @joaogui1 • 4 years ago +2

      math.stackexchange.com/questions/12909/will-moving-differentiation-from-inside-to-outside-an-integral-change-the-resu
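
      On the "mild assumptions" in the question above: the swap is differentiation under the integral sign (Leibniz rule / dominated convergence). Roughly, it suffices that P(τ; θ) is differentiable in θ and that |∇_θ P(τ; θ) R(τ)| is bounded by an integrable function independent of θ; then

          \nabla_\theta U(\theta)
            \;=\; \nabla_\theta \int P(\tau;\theta)\,R(\tau)\,d\tau
            \;=\; \int \nabla_\theta P(\tau;\theta)\,R(\tau)\,d\tau
            \;=\; \int P(\tau;\theta)\,\nabla_\theta \log P(\tau;\theta)\,R(\tau)\,d\tau
            \;=\; \mathbb{E}_{\tau}\bigl[\nabla_\theta \log P(\tau;\theta)\,R(\tau)\bigr],

      where the log-derivative step additionally needs P(τ; θ) > 0 wherever the gradient is taken.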

  • @shaz7163 • 6 years ago

    very nice :)

  • @isupeene • 3 years ago +2

    The guy in the background at 51:30

  • @elzilcho222 • 5 years ago +1

Could you train a robot for 2 weeks in the real world, then use those trained parameters to optimize a virtual environment? You know... making the virtual environment very close to the real world?

    • @OfficialYunas • 5 years ago +1

      Of course you could. It's the opposite of what OpenAI does when they train a model in a virtual environment and deploy it in reality.

    • @soutrikband • 5 years ago

      The real world is very complicated, with model uncertainties, friction, wear and tear, and what have you...
      Simulators can come close, but we cannot expect them to fully mimic real-world phenomena.

  • @richardteubner7364 • 6 years ago +1

1:11 Why are DQNs and friends dynamic programming methods? I mean, the neural network works as a function approximator to satisfy the Bellman equation, but backprop is still the workhorse. In my opinion, DQNs are much more similar to PG methods than to Bellman updates??! And another issue with the RL landscape slide: where the heck are the model-based RL algos?? It should be renamed the model-free RL landscape.

  • @arpitgarg5172 • 5 years ago +11

If you can't explain it like Pieter Abbeel or Andrew Ng, then you don't understand it well enough.

  • @piyushjaininventor • 5 years ago

    Can you share ppt??

    • @luxorska5143 • 5 years ago +3

      You can find all the slides and the other lectures here:
      sites.google.com/view/deep-rl-bootcamp/lectures

  • @MarkoTintor • 3 years ago

    ... you can use "a", and the math will be the same. :)