Markov Decision Processes 1 - Value Iteration | Stanford CS221: AI (Autumn 2019)

Stanford Online

428,282 views

Added June 9, 2024
For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: stanford.io/3pUNqG7
Topics: MDP1, Search review, Project
Percy Liang, Associate Professor & Dorsa Sadigh, Assistant Professor - Stanford University
onlinehub.stanford.edu/
Associate Professor Percy Liang
Associate Professor of Computer Science and Statistics (courtesy)
profiles.stanford.edu/percy-l...
Assistant Professor Dorsa Sadigh
Assistant Professor in the Computer Science Department & Electrical Engineering Department
profiles.stanford.edu/dorsa-s...
To follow along with the course schedule and syllabus, visit:
stanford-cs221.github.io/autu...
Chapters:
0:00 Intro
2:12 Course Plan
3:45 Applications
10:48 Rewards
18:46 Markov Decision Process
19:33 Transitions
20:45 Transportation Example
29:28 What is a Solution?
30:58 Roadmap
36:36 Evaluating a policy: volcano crossing
37:38 Discounting
53:21 Policy evaluation computation
55:23 Complexity
57:10 Summary so far
#artificialintelligencecourse

Comments • 128

@nsubugakasozi7101 1 month ago ⁺⁴
This lecturer is world class... and this is also the most confident live coding I have seen in a while... she is really, really good. Universities are made by the lecturers... not so much the name.
@vishalsunkapaka7247 2 years ago
professor is so talented can’t say anything just feared over her, can’t take anymore
@pirouzaan 8 months ago ⁺³
This was by far the most impressive lecture with live coding that I have seen! I am leaving this virtual lecture room with awe and respect...
@iiilllii140 1 year ago ⁺⁵
Thank you for this lecture and the course order. The past lectures about search problems really help you to better understand MDPs.
@foufayyy 2 years ago ⁺²⁸
Thank you for posting this. MDPs were really confusing, and this lecture really helped me understand them clearly.
@-isotope_k 2 years ago
Yes, this is a very, very confusing topic.
@chanliang5725 6 months ago
I was lost on MDPs. Glad I found this awesome lecture, which clears up all the concepts in MDPs! Very helpful!
@kazimsyed7367 2 years ago ⁺⁹
I want to express my appreciation for this lecture; it's good. I had a difficult time and a mental block with this topic. Thanks for all your efforts.
@muheedmir7385 1 year ago ⁺⁵
Amazing lecture, loved every bit of it
@meharjeetsingh5256 6 months ago ⁺¹
This teacher is really, really good. I wish you were at my uni so that I could enjoy machine learning.
@yesodabhargava8776 2 years ago ⁺²
This is an awesome lecture! Thank you so much.
@quannmtt3110 1 year ago ⁺¹
Thanks for the awesome lecture. Very good job at explanation by the lecturer.
@joshuat6124 1 month ago
Thank you, professor! I learnt so much from this, especially the live coding bits.
@user-bn3zw9sd1p 1 year ago ⁺¹
It was my n-th iteration of MDPs, where n > 10, but, to use MDP terminology, my knowledge finally started to converge in the right direction. Thank you for the lecture 🙂
@sukhjinderkumar2723 2 years ago ⁺²
Great Lecture, Thank you Professor :)
@adityanjsg99 1 year ago ⁺²
A thorough lecture!!
@ammaraboklam2487 2 years ago ⁺³
Thank you very much.
This is a really great lecture; it's really helpful.
@stanfordonline 2 years ago
Hi Ammar, glad it was helpful! Thanks for your feedback
@vimukthirandika872 2 years ago ⁺⁶
Thanks for the amazing lecture!
@HarshvardhanKanthode 2 years ago
Where are all the comments?
@snsacharya1737 1 year ago
At 29:36, a policy is defined as a deterministic mapping from each state to a single action; for example, the policy when we are in station-4 is to walk. This definition differs from the one in the classic RL book by Sutton and Barto; they define a policy as "a mapping from states to probabilities of selecting each possible action." For example, the policy when we are in station-4 is a 40% chance of walking and a 60% chance of taking the train. The policy evaluation algorithm presented in this lecture also ends up being slightly different, since it does not loop over the possible actions. It is nice of the instructor to highlight that point at 55:45.
@aojing 2 months ago ⁺¹
In this class, the action is determined from the beginning, independent of states... This will mislead beginners into confusing Q and V, as with the definition at 47:20. In RL, we take actions according to a policy, which is random and can be learned/optimized by iterating through episodes, i.e., parallel worlds.
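The distinction raised in this thread can be made concrete with a small sketch. The Python below is not the lecture's code; it evaluates a policy given as action probabilities per state, so a deterministic policy is simply the special case where one action has probability 1. The toy MDP (stations 1 to 4, walking costs 1, the tram costs 2 and fails half the time) and every name in it, such as `policy_evaluation` and `transitions`, are assumptions made for illustration.

```python
# Hypothetical toy MDP for illustration; all numbers are assumptions,
# not values taken from the lecture page.
N = 4  # station N is the end state


def transitions(s, a):
    """Return a list of (next_state, probability, reward) for action a in state s."""
    if a == "walk":
        return [(s + 1, 1.0, -1.0)]
    if a == "tram":
        # The tram succeeds half the time; on failure we stay put but still pay.
        return [(2 * s, 0.5, -2.0), (s, 0.5, -2.0)]
    raise ValueError(f"unknown action: {a}")


def policy_evaluation(policy, gamma=1.0, iters=200):
    """Iteratively compute V_pi for a policy given as {state: {action: probability}}."""
    V = {s: 0.0 for s in range(1, N + 1)}
    for _ in range(iters):
        new_V = {N: 0.0}  # the end state has value 0
        for s in range(1, N):
            # Weight each action's Q-value by the probability of choosing it.
            new_V[s] = sum(
                p_a * sum(p * (r + gamma * V[s2]) for s2, p, r in transitions(s, a))
                for a, p_a in policy[s].items()
            )
        V = new_V
    return V


# Deterministic policy: one action with probability 1 in every state.
always_walk = {1: {"walk": 1.0}, 2: {"walk": 1.0}, 3: {"walk": 1.0}}
# Stochastic policy: 40% walk, 60% tram in state 2.
mixed = {1: {"walk": 1.0}, 2: {"walk": 0.4, "tram": 0.6}, 3: {"walk": 1.0}}

V_det = policy_evaluation(always_walk)
V_mix = policy_evaluation(mixed)
print(V_det[1], V_mix[2])
```

With the mixed policy, the update sums over actions weighted by their probabilities; for a deterministic policy that sum has a single term, which is why the loop over actions can be dropped.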
@marzmohammadi8739 2 years ago
I enjoyed it, Ms. Sadigh. I had a great time... thaaank you!
@alphatensor 6 months ago
Thanks for the good lecture
@alemayehutesfaye463 1 year ago
Thank you for your interesting lecture; it really helped me understand the topic well.
@stanfordonline 1 year ago
Hi Alemayehu, thanks for your comment! Nice to hear you enjoyed this lecture.
@alemayehutesfaye463 1 year ago
@stanfordonline Thanks for your reply. I am following you from Ethiopia and am interested in the subject area. Would you mind suggesting the best texts and supporting videos for gaining in-depth knowledge of Markov processes and decision making, especially as related to manufacturing industries?
@RojinaPanta1 9 months ago
Wouldn't removing the constraint increase the search space, making it computationally inefficient?
@seaotterlabs1685 1 year ago ⁺⁷
Amazing lecture! I was having trouble finding my footing on this topic and now I feel I have a good starting point of the concepts and notations! I hope Professor Sadigh teaches many more AI topics!
@stanfordonline 1 year ago
Excellent, thanks for your feedback!
@ibenlhafid 1 year ago
Mm
@ibenlhafid 1 year ago
Mmmm
@ibenlhafid 1 year ago
Pp
@ibenlhafid 1 year ago
09
@carlosloria-saenz6760 5 months ago
Great videos, thanks! At 47:20 there is a small typo on the board; I guess it should be: V_{\pi}(s) = Q_{\pi}(s, \pi(s)) if s is not the end state.
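Written out in full, the correction suggested here would read roughly as follows, assuming the end state is assigned value zero (a common convention, not something stated on this page):

```latex
V_{\pi}(s) =
\begin{cases}
0 & \text{if } s \text{ is an end state},\\
Q_{\pi}(s, \pi(s)) & \text{otherwise}.
\end{cases}
```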
@thalaivarda 2 years ago ⁺⁴
I will be conducting a test for those watching the video.
@farzanzeinali7398 1 year ago
The transportation example has a problem. The states are discrete. If you take the tram starting from state 1, the next state is state*2, so you will never end up in state 3. Suppose the first action succeeds, so the next state is 2. If the second action succeeds too, you end up in state 4. You will never end up in state 3.
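The observation above holds if the tram is the only action. Another comment on this page mentions walking as an action as well, so a quick reachability check may help. The sketch below assumes walking moves from state s to s + 1 and the tram from s to 2s; the walking dynamics, the bound `n = 10`, and the helper name `reachable_states` are assumptions for illustration, not details taken from this page.

```python
from collections import deque


def reachable_states(start, n, actions):
    """Breadth-first search over states 1..n using the given successor functions."""
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for step in actions:
            s2 = step(s)
            if s2 <= n and s2 not in seen:
                seen.add(s2)
                queue.append(s2)
    return sorted(seen)


def walk(s):
    return s + 1  # assumed walking dynamics


def tram(s):
    return 2 * s  # tram dynamics as described in the comment (when it succeeds)


print(reachable_states(1, 10, [tram]))        # [1, 2, 4, 8]
print(reachable_states(1, 10, [walk, tram]))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Under these assumptions state 3 is unreachable by tram alone but is reached by walking from state 2.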
@eigenfeynman9890 2 years ago ⁺⁷
FYI I'm a theoretical physics major, and I have no business in CS whatsoever
@camerashysd7165 22 days ago
Wow this account crazy 😮
@msfallah 1 year ago
I think the given definition of the action-value function (Q(s, action)) is not correct. In fact, the value function is the summation of action-value functions over all actions.
@vikasshukla831 1 year ago
In the dice game, if I choose to stay in step 1 and then quit in the second stage, will I get 10 dollars when I quit in stage 2? If I am lucky enough to reach the second stage, i.e., the dice doesn't roll 1 or 2, then I am in the "In" state, and by the diagram I have the option to quit, which might give me 10 dollars, but for that I need to succeed in stage 1. Then the best strategy might change. Let me know what you think.
@fahimullahkhan775 1 year ago
You are right according to the figure and the flow of the states, but from the scenario one gets the impression that one has a chance to either quit at the start or stay in the game.
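The question in this thread can be checked numerically. The sketch below compares "always stay" with "stay once, then quit", assuming that quitting pays 10 and ends the game (as the comment describes), that a roll of 1 or 2 ends the game, and that each stay pays 4. The stay payout of 4 is a hypothetical value that does not appear on this page, and the function names are illustrative.

```python
P_END = 2 / 6  # the die shows 1 or 2 and the game ends


def value_always_stay(stay_reward=4.0, iters=200):
    """Iterative policy evaluation of the 'always stay' policy from the 'in' state."""
    v = 0.0
    for _ in range(iters):
        # With probability P_END the game ends (value 0 afterwards),
        # otherwise we collect the reward and remain in the 'in' state.
        v = P_END * (stay_reward + 0.0) + (1 - P_END) * (stay_reward + v)
    return v


def value_stay_once_then_quit(stay_reward=4.0, quit_reward=10.0):
    """Stay for one round, then quit if the game has not ended."""
    return stay_reward + (1 - P_END) * quit_reward


print(value_always_stay())          # close to 12
print(value_stay_once_then_quit())  # about 10.67
```

Under these assumed payouts, staying once and then quitting beats quitting immediately (10) but still falls short of always staying, so the option to quit later would not change the best strategy.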
@aojing 2 months ago
At 47:20 the definition of the Q function is not right and is confused with the value function. Specifically, take the immediate reward R out of the summation. The reason is that the Q function estimates the value of a specific action starting from the current state.
@aojing 2 months ago
Or we may say the value function here is not properly defined without considering the policy, i.e., by taking actions independent of states.
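Whether the reward sits inside or outside the summation depends on whether it is allowed to depend on the next state. A sketch in standard notation, where T(s, a, s') is the transition probability, R the reward and \gamma the discount factor (symbols assumed here, not copied from the board):

```latex
Q_{\pi}(s,a)
  = \sum_{s'} T(s,a,s')\,\bigl[R(s,a,s') + \gamma V_{\pi}(s')\bigr]
  = \underbrace{\sum_{s'} T(s,a,s')\,R(s,a,s')}_{\text{expected immediate reward}}
    + \gamma \sum_{s'} T(s,a,s')\,V_{\pi}(s')
```

If the reward depends only on (s, a), the first term reduces to R(s, a) because the transition probabilities sum to 1, so the two ways of writing Q agree.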
@pythonmini7054 1 year ago ⁺²
Is it me or she looks like callie torres from grays anatomy 🤔
@henkjekel4081 1 year ago
U should look at andrew ng's lecture, he explains it way better
@rahulkelkar1246 2 years ago
Does anyone think she look like Zoe Kazan?
@dungeon1163 2 years ago ⁺⁵²
Only watching for educational purposes
@-isotope_k 2 years ago ⁺⁴
😂😂
@divyanshuy007 1 year ago ⁺³
16:42 thumbnail
@aswinbiju4038 2 years ago ⁺¹¹
Only watching for educational purposes.
@soham4741 2 years ago
yes me too
@vikranthrana3019 2 years ago
Me too
@radheshyamshaw8672 2 years ago
Me too
@HolyRamanRajya 1 year ago ⁺¹
Beauty and brainy.
@md.naimul8544 5 months ago
why is she so beautiful 😳😳
@ameerhamza4816 4 months ago
Why not?
@chamangupta4624 2 years ago
637
@harshraj3344 1 year ago
My man
@saisriteja5290 1 year ago ⁺¹
i love you
@sachinfulsunge9977 1 year ago ⁺²
Hell naw bruh
@vikranthrana3019 2 years ago ⁺¹⁵
Professor is quite cute ❤️
@buchhibabu7 1 year ago ⁺²
Cute lecture by cute lady
@asastudent682 1 year ago ⁺¹
I'm Indian and belongs to Bihar State 🇮🇳🇮🇳
