Q Learning Explained (tutorial)

Siraj Raval

330,842 views

Added: 11 Sep 2024
Can we train an AI to complete its objective in a video game world without needing to build a model of the world beforehand? The answer is yes, using Q learning! I'll go through several use cases and show some Python code for how Q learning works.
Code for this video:
github.com/llS...
Adnan's Winning code:
github.com/Adn...
Alberto's runner up code:
github.com/alb...
Please Subscribe! And like. And comment. That's what keeps me going.
Want more inspiration & education? Connect with me:
Twitter: / sirajraval
Facebook: / sirajology
More learning resources:
mnemstudio.org/...
ocw.mit.edu/co...
uhaweb.hartford...
/ deep-reinforcement-lea...
www.cs.cmu.edu...
cs.stanford.edu...
www.quora.com/...
www0.cs.ucl.ac....
Join us in the Wizards Slack channel:
wizards.herokua...
And please support me on Patreon:
www.patreon.co... Instagram: / sirajraval
Sign up for my newsletter for exciting updates in the field of AI:
goo.gl/FZzJ5w
Hit the Join button above to become a member of my channel for access to exclusive content!
Join my AI community: chatgptschool.io/
Sign up for my AI Sports betting Bot, WagerGPT! (500 spots available):
www.wagergpt.co

Comments • 260

@Sluppie 5 years ago ⁺³³
Always love it when a programming-related video starts with "Hello, World."
Not even kidding.
@ImtithalSaeed 3 years ago
exactly
@amidhmi5243 6 years ago ⁺¹¹
It's a delicate balance between efficiency and curiosity.
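One common way to strike that balance is ε-greedy action selection with a decaying ε: explore at random with probability ε, otherwise exploit the best-known action. A minimal sketch, not the code from the video; the function names and the decay schedule (start 1.0, floor 0.05, factor 0.99 per episode) are illustrative assumptions:

```python
import random

def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick an action index from one row of a Q-table.

    With probability epsilon, explore (uniformly random action);
    otherwise exploit (highest Q-value; max() keeps the lowest index on ties).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

def decayed_epsilon(episode, start=1.0, end=0.05, decay=0.99):
    """Exponentially shrink exploration from `start` toward a floor of `end`."""
    return max(end, start * decay ** episode)
```

Early episodes are almost pure curiosity (ε near 1); later ones mostly exploit, while the floor keeps a little exploration alive.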
@KunwarPratapSingh41951 6 years ago ⁺⁸
Hey Siraj, I want to thank you for bringing such epic videos on AI and mathematics that instill a hunger for learning. *A good teacher is one who ignites a spark in the student* ... Love from India. 😂😢🤓❤
@SirajRaval 6 years ago ⁺¹
sending hugs
@adityapatil325 6 years ago ⁺²⁵⁶
I think I should unsubscribe until I learn enough Deep Learning, as these videos are giving me an existential crisis.
@stefan-ls7yd 6 years ago
Aditya Patil 😂
@greenbillugaming2781 6 years ago
Aditya Patil same here bro :(
@SirajRaval 6 years ago ⁺⁵³
nah, don't worry, I'm just going to get better at explaining
@stefan-ls7yd 6 years ago
Siraj Raval +100 IOTA
@hdef6602 6 years ago
yup
@CodeEmporium 6 years ago
I have an AI test next week. It's like you uploaded this for me. Thanks!
@manuelkarner8746 5 years ago ⁺³
Love your videos, great work buddy! I am starting to study AI next week and I am freaking excited as well as confident, because your channel (and Brilliant and other stuff) is incredibly helpful :D
@waltzofthestars2078 2 years ago
OMG, besides the video being really well made and offering a quality explanation (kudos!), you are the first guy from India with a totally nice pronunciation!
@slugfiller 6 years ago ⁺²
This video misses an important issue with Q learning. The Q function is based on possibly getting a reward in the future, even if a reward is not available right away. If the algorithm keeps getting into new states, it might posit that the lack of rewards is simply a case of it getting closer to a large reward in the far future. It won't know to "correct the value down" until it loops back to (and finds itself unable to escape) a previous state. It can become "stuck" with that bias as long as it's not sure it's in a closed state list. The more states there are in the system, the larger this bias can become.
@smtabatabaie 6 years ago ⁺¹
Man, this reinforcement learning series is amazing!
@talkohavy 4 years ago ⁺¹⁰
I watched this video TWICE!!!
My first time: with zero background, I understood nothing!
I rage quit in the middle.
*A full semester passes by in which I took a Deep Reinforcement Learning class*
My second time: Oh...! So that's what he was talking about!
I guess there's a special lingo that needs to be learnt, and making a 10-minute YouTube video about it is absolutely pointless, 'cause the ones with zero background won't understand jack shit, and the ones with background already know this shit. So... you know.
@saveerjain8168 4 years ago
I have a background in other types of AI, yet I want to learn Q learning. I know the lingo but don't know this. Boom, it's helpful.
@siriwessberg7460 4 years ago ⁺¹
My study partner and I literally refer to you as "our friend Siraj" when we are studying ML, because you never fail to help us understand all the concepts that were unclear before.
@aaronsilver-pell411 6 years ago ⁺²
This is helping me to understand life strategies, lmao. Great video Siraj.
@WesKeppy 6 years ago ⁺¹⁵
"Of course you are, you beautiful wizard."
@SirajRaval 6 years ago ⁺⁴
haha it's true
@philrowlands1087 1 year ago
Except for the use of ‘less’ rather than ‘fewer’, but I still think you're amazing
@yabincheng4171 9 months ago
I think the key to Q-learning is that it mimics human learning: we always learn under some motivation, otherwise we couldn't get very good at anything. The algorithm feeds the result of an action back in to shape decision making.
@GameCasters 6 years ago ⁺⁴
Where can I find a simpler video? Most of the terms he uses, I don't understand.
@sadeghshaikhi5950 3 years ago ⁺⁴
I wasn't able to get anything from this.
@tallwaters9708 6 years ago ⁺¹
With all this talk about agents, you should perhaps consider doing an extensive video on Multi-Agent Systems. Jason etc...
@nfcopier1 6 years ago
Siraj, you understand computer science far more deeply than I do. But I think you need to review clean coding practices. For developing an algorithm, it might not be a big deal. But if you want to share your code with others - for distribution, review, or learning - readable code will make the process much smoother.
@rhejamphi 6 years ago ⁺²⁴
Someone has a supply of NZT.
@DriftyG 6 years ago ⁺²⁷
Great video Siraj, thanks for bringing your knowledge into the video game world! :)
@SirajRaval 6 years ago ⁺²
no problem driftwood, this is fun!
@dsuryas 6 years ago ⁺⁵³
Man, how do you gather so much information so quickly???😱
@dan-garden 6 years ago ⁺¹²
Deepak Surya AI
@dan-garden 6 years ago ⁺³
Def Tank Not sure but probably :P
@mooe20 6 years ago ⁺⁶
Deep learning man...
@dsuryas 6 years ago
AJ J that'd be crazy fast 😂😂
@g0d182 6 years ago
Internet
@bauwndule 6 years ago ⁺⁷
Man, the AI community can't thank you enough!
@SirajRaval 6 years ago ⁺²
thanks for watching!
@prinkle12 6 years ago ⁺⁵
Hi Siraj, thanks for introducing and teaching such great stuff every week. These are of great help. Recently I was studying actor-critic reinforcement learning and I found its methodology quite similar to GANs, with the actor playing the role of the generator and the critic the role of the discriminator. Can you please share your thoughts on this?
@AhmadM-on-Google 6 years ago
Okay Siraj, these explanations were amazing... intuitive and easily absorbed!
Nice effects too xD
@AhmadM-on-Google 6 years ago
Damn, why are you getting dislikes though? Anyway, what do you think about creating a general game bot to wreck all online scores?
@cash4laughs71 6 years ago
Best teacher ever.
@kalebbruwer 6 years ago ⁺²
I get the concepts of quite a few different models now (RNN, CNN, normal feed-forward), but I have trouble putting them into code. Can you please point me to a good resource to learn TensorFlow itself from?
@trevorgustavgreen8148 5 years ago
Awesome, I found a hidden gem on YouTube
@saurabhiim 6 years ago
Hi Siraj, I request that you make your videos in such a manner that any layman can understand the logic behind them... I know it's sometimes very tough, but that is what knowledge is: making complex things easy.... cheers
@yabincheng4171 9 months ago
What's 'Q'? The 'Q' in Q-learning stands for quality. Quality in this case represents how useful a given action is in gaining some future reward.
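To make "quality" concrete, here are the standard textbook forms, with notation assumed here (discount factor γ, learning rate α, reward r_{t+1} received after taking action a_t in state s_t): the expected discounted return of an action under a policy π, the Bellman optimality equation the optimal Q-function satisfies, and the Q-learning update that moves a table entry toward it.

```latex
Q^{\pi}(s,a) = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s,\ a_t = a \right]

Q^{*}(s,a) = \mathbb{E}\left[ r_{t+1} + \gamma \max_{a'} Q^{*}(s_{t+1}, a') \,\middle|\, s_t = s,\ a_t = a \right]

Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1},a') - Q(s_t,a_t) \right]
```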
@paviad 6 years ago ⁺²
Hey Siraj, great video (and your other videos are also great). I have a question: what happens if it takes a long time to complete an "episode"? How do you efficiently train the network in that case?
@BigDvsRL 5 years ago ⁺⁸
Nice :) Hope this will help me create an AI which solves "Plague Inc" xDDD
@Sohlstyce 4 years ago
Nah, teach it to infect Greenland first
@CTimmerman 6 years ago
TD is nice for Mario, because jumping a Goomba has little influence on the rest of a level, but MC is great for Go, where early moves are important later.
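The TD-versus-Monte-Carlo distinction can be sketched for a single finished episode. This assumes (hypothetically) that `rewards[t]` is the reward received after step t, and `next_values[t]` is the current value estimate of the state reached after step t:

```python
def mc_targets(rewards, gamma):
    """Monte Carlo target for every step: the full discounted return G_t,
    computed backwards once the episode has finished."""
    targets = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        targets[t] = g
    return targets

def td0_targets(rewards, next_values, dones, gamma):
    """One-step TD target r + gamma * V(next state): bootstraps from the current
    estimate of the next state instead of waiting for the episode to end."""
    return [r + (0.0 if done else gamma * v)
            for r, v, done in zip(rewards, next_values, dones)]
```

With rewards `[0.0, 0.0, 1.0]` and γ = 0.9, the MC targets are about `[0.81, 0.9, 1.0]` immediately, while TD(0) starting from zero estimates gives `[0.0, 0.0, 1.0]`: the final reward has only propagated one step back and needs further updates to reach early states. Q-learning uses the TD-style one-step target.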
@enobil 6 years ago
For Mario, think about the state space. Then think about a Markov chain approach that needs an exact match of the state. I hope you get my point: no one is going to have enough RAM and time to train it for Mario. You can downsample as much as you can, but the approach still doesn't allow a reasonably sized state space.
@jinxblaze 6 years ago
I did this for my college project :) thanks Siraj
@vitulus_ 5 years ago ⁺²
4:05 "it's called the fuck you function" - that's what I heard
@AlexeyKravets 4 years ago ⁺¹
Why are there no subtitles, at least in English? Are you not interested in foreign listeners?
@UnboxingSve 6 years ago
Amazing work Siraj!
@otonanoC 5 years ago ⁺¹
Why is Siraj in the kitchen? Is he about to show us how to cook something?
@aliazizi129 6 years ago ⁺¹⁴
You do a really great job Siraj, but your videos look like just a lecture, with no effective learning, just some information that is flooded all over the internet. It would be better to explain exactly what you do on example code and walk through the entire code step by step. I hope you really do it; I have so many unsolved questions that no one can answer, and if you explain your code for this video and other videos like it, you will help us so much.
Thx a lot man.
@keanzoe 6 years ago
The title said "Q Learning Explained", not how to make Q Learning... you have to know the difference.
@bobsmithy3103 6 years ago ⁺⁶
I've found that for the 50 or so videos of his I've watched over the past year and a half, I have learnt absolutely nothing. I've probably wasted 50-100 hours on his channel, just following along, rewatching stuff, trying to understand stuff, but never actually understanding. Most of the time, watching his videos gave me the impression that I was learning when in actuality I wasn't learning crap. I can't even recall anything that I learnt from watching any of his videos, this video included. Most of what he says is just lost in the jargon, and at times I even feel like he's not even trying to teach but just to give the illusion of teaching others. His videos will probably be helpful for those who already know what he's talking about or are already deep in the field, but for individuals who are just starting off on this ML journey, I doubt they'll learn much from him.
@subjord 5 years ago ⁺¹
@@bobsmithy3103 You need to do the coding. You won't learn programming by watching stuff.
@Ludens93 9 months ago
I know Q learning is a type of reinforcement learning. But I'm wondering if adding human feedback in the loop makes the model more accurate and less prone to mistakes.
@zakarie 6 years ago
Great Siraj, keep them coming
@khiljichand 6 years ago ⁺²
Love your videos!
Thanks so much for making Machine Learning interesting. :D
@SirajRaval 6 years ago
thanks!
@alberjumper 6 years ago
Great video! Q Learning FTW!
And thanks for the shoutout :D
@SirajRaval 6 years ago
np Alberto
@nuadathesilverhand3563 5 years ago ⁺²
Dude, can you slow down a little so that I don't have to hear you gasping for air? Or so that I can figure out what you're saying?
@zachwhelpley661 4 years ago ⁺¹
Dude, you're allowed to blink haha
@aksjhdbaksjhdbNotASpam 7 months ago
Great, easily understandable video!
@Palamdrone 6 years ago ⁺¹
Hey Siraj, thanks for the video. I have a probably naive question. With Q learning, an observation and a score are given to the agent by the environment. Does this mean that Q learning requires an environment that is perfectly informed? In Gym, goals are clearly defined to the world, such as getting to a destination. What if the goal is not so well defined?
Example: what if I want to use Q learning to explore a fixed-geometry environment with the goal of finding resources whose locations are not known to the environment in advance? Then it's unclear to me how to define the score for each frame, since the environment would not know where the resources are until the agent is close enough to spot them.
Sorry for the lengthy comment!!! I understand you are very busy and my comment might be extremely dumb, so any help is greatly appreciated! Thanks again!
@AfdalWahyu 6 years ago
Hi Siraj, great video.
BTW, you should switch to a higher-quality mic. As a headphone user I'm not comfortable watching your videos, because I can still hear noise in the background. Anyway, keep up the good work.
@prateek6502-y4p 5 years ago ⁺²
How do you expect anyone to grasp everything if you explain at light speed?
@BillBaxter 4 years ago
Playback speed 0.75x. I'm serious.
@EickSternhagen 4 years ago
Quality of a certain action in a certain state. Bellman equation. Algorithm.
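Those three pieces fit together in a minimal tabular Q-learning sketch. The 6-cell corridor environment, the reward of 1 at the goal, and the hyperparameters (α = 0.5, γ = 0.9, ε = 0.2) are illustrative assumptions, not the setup from the video:

```python
import random

# Hypothetical toy environment: a 1-D corridor of cells 0..5; cell 5 is the goal.
N_STATES = 6
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right

def step(state, action_idx):
    """Return (next_state, reward, done). The learner never inspects this
    function; it only uses the samples it returns (that is what model-free means)."""
    nxt = min(max(state + ACTIONS[action_idx], 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]  # the Q-table
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:                      # explore
                a = rng.randrange(len(ACTIONS))
            else:                                           # exploit, random tie-break
                best = max(q[s])
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            s2, r, done = step(s, a)
            # Bellman target: reward now plus the discounted best value of the next state.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])         # move Q(s, a) toward the target
            s = s2
    return q
```

After `q = q_learning()`, `q[s][1] > q[s][0]` for every non-goal cell, i.e. the greedy policy walks right, learned purely from sampled `(next_state, reward, done)` tuples.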
@LawrenceDCodes. 9 months ago ⁺⁹
Here in 2023 because ... reasons
@Nick_With_A_Stick 4 months ago
Just so happens we all have the same reason 😉
@andriibogomazov7863 6 years ago
Nice kitchen background, but the plain background with memes is easier to watch and focus on... btw, did you slow down the video by 10-20%?)
@Yannoux2000 6 years ago
That helped me understand my issue with exploration, thx.
@alljiang 6 years ago ⁺⁹⁰
27 liberal arts majors watched this video
@debayondharchowdhury2680 5 years ago
163 now.
@nizamuddinahmed8913 4 years ago
204 man
@JohnDoe-uq2qd 4 years ago
237
@Cyphlix 6 years ago
0:49 this is why I subbed
@daesoolee1083 6 years ago ⁺¹
Wow. You're really really good at explaining things in a super easy and fun way :) Amazing video! I love it!
@adamwespiser9209 6 years ago
Surprisingly good....hmmmmm. Great job!
@samacumen 6 years ago
Hi Siraj. Thanks for the video. Can you tell me what tools you use to edit those awesome videos? Thanks.
@Mirandorl 5 years ago
No one ever seems to talk about how the agent knows which actions are available to it. Where are the options "left and right" defined?
@precogtyrant 6 years ago
Much better than your earlier ones. The pace is good and there are fewer memes and gimmicks. Good job!
@shantomathew-fh3hv 6 years ago
Thanks for doing this Siraj. But I am running this on Ubuntu and I am not able to see anything, though the code runs fine, as I can see the iterations. Any idea how to fix it?
@dmarsub 6 years ago
4:20 Q functions seem great for speedrunning, but I wonder: if there is only limited computing power, a TD algorithm could learn quicker, and if you finish the learning process with a Q algorithm it might have some cornerstones for finding the best route in a better manner.
@Zohbie 6 years ago ⁺²
0:50 For that joke you really deserve a subscription! :D
@grantstenger6182 6 years ago
Why is the q_table initialized as np.zeros((n_states, n_states, 3))? 3 is the number of actions, right (i.e. drive left, drive right, do nothing)? Why would we need two dimensions for the number of states?
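A likely reading, assuming the video's environment is Gym's MountainCar (two continuous observation variables, position and velocity, plus three actions): each continuous variable is cut into `n_states` bins, so the table needs one axis per observation variable and one for the actions. A sketch under those assumptions; the bounds and the helper name `obs_to_state` are assumptions for illustration, and in real code you would read the bounds from `env.observation_space.low` / `.high` instead of hard-coding them:

```python
import numpy as np

# Assumed MountainCar-style observation bounds: (position, velocity).
OBS_LOW = np.array([-1.2, -0.07])
OBS_HIGH = np.array([0.6, 0.07])
N_STATES = 40    # bins per observation variable (assumed, matches the 40 asked about below)
N_ACTIONS = 3    # push left, no push, push right

def obs_to_state(obs):
    """Map a continuous (position, velocity) observation to a pair of bin indices."""
    ratios = (np.asarray(obs, dtype=float) - OBS_LOW) / (OBS_HIGH - OBS_LOW)
    idx = (ratios * N_STATES).astype(int)
    return tuple(int(i) for i in np.clip(idx, 0, N_STATES - 1))

# Axis 0: position bin, axis 1: velocity bin, axis 2: action.
q_table = np.zeros((N_STATES, N_STATES, N_ACTIONS))
```

So `q_table[obs_to_state(obs)]` is a length-3 vector of action values for the current observation; "40" is the number of bins per variable, not the number of positions in the environment.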
@TummalaAnvesh 6 years ago ⁺¹
Very clean explanation and summary. Definitely a great quality improvement in your reinforcement.
@daggawagga 6 years ago
What kinds of approaches can you take when there isn't an obvious reward metric to feed to your algorithm? Let's say you wanted to make an AI to begin and finish a game that doesn't seem very linear, such as Zelda or Metroid, or an analogous but unknown game. Do you just cram in as many item counters as you can for measuring rewards?
@boffo25 6 years ago ⁺²
What were you trying to cook with Q learning?
@SirajRaval 6 years ago ⁺¹
haha needed my green screen
@TheCrashman16 6 years ago
Thanks for the videos man. However, it seems that the Q matrix cannot be used with a large number of states.
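The arithmetic behind that limit is easy to check. Once every observation variable is discretized into the same number of bins, a tabular method stores bins^dims × actions values, so the table grows exponentially with the number of variables. A small sketch, assuming 40 bins and 3 actions (the setting from the q_table question above) and 8-byte float64 entries:

```python
def q_table_entries(bins_per_dim, n_dims, n_actions):
    """Q-values a tabular method must store: one per (discretized state, action)."""
    return bins_per_dim ** n_dims * n_actions

def q_table_megabytes(entries, bytes_per_value=8):
    """Memory for the table, assuming 8-byte float64 entries (NumPy's default dtype)."""
    return entries * bytes_per_value / 1e6

for dims in (2, 4, 6):
    n = q_table_entries(40, dims, 3)
    print(f"{dims} observation variables -> {n:,} entries, {q_table_megabytes(n):,.1f} MB")
```

Going from 2 to 6 observation variables grows the table from 4,800 entries to 12,288,000,000 (about 98 GB), which is why larger problems swap the table for a function approximator.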
@brianmvukwe5506 5 months ago
This is top tier content man. Thank you so much!
@sarangs8441 6 years ago ⁺⁷
Can you make a video on setting up Python with all the libraries needed for your videos? I am having a hard time figuring out which libs I need for your older videos.
Which version of Python do you use: 32-bit or 64-bit?
@VictorGallagherCarvings 6 years ago ⁺⁶
You need to check out the YouTube channel 'sendex'. Also, I am 99% confident that he is using the 64-bit version.
@sarangs8441 6 years ago ⁺¹
Victor Gallagher thanks a lot
@bauwndule 6 years ago ⁺²
senTdex right?
@VictorGallagherCarvings 6 years ago
Sorry, yes sentex is right.
@VictorGallagherCarvings 6 years ago ⁺¹
got it wrong again, 'sentdex'
@eagleswildcard 6 years ago
Great work man
@Nik-dz1yc 4 years ago
This was pretty decent, but you shouldn't have titled the video the way you did, because I would not consider it that beginner-friendly.
@matthewdaly8879 6 years ago ⁺¹
Does this mean that the player has to have already been in a state and taken some actions to make optimal decisions, or is there a technique to use past results to estimate future rewards, i.e. a neural network? With the state consisting of two different variables in this case, it seems like it would take a while for the car to find the best action for each state it encounters. I'm a little confused.
@dustinandrews89019 6 years ago ⁺²
"a technique to use past results to estimate future rewards i.e. a neural network", yes. Q learning is exactly that. Start with an untrained agent that knows nothing of the environment. Also strongly bias that agent toward random actions at first in order to gather data. Next, allow the agent to take some number of actions in the environment, while recording the entire session. At the end, take the reward; in this case it could be "units to the right of the start." Now go back over the recording and apply a share of that score to every move (perhaps with a decay for the older actions). Finally, feed each instance of the replay, one at a time, into the network. You have to provide the state of the world + action and train it toward the score. If you keep doing this, your network will start to converge on an understanding of what moves will create what score. Once you gain some data, stop purely randomly sampling actions. Start using predictions from your model to inform the next move (the rate at which you go from pure random to pure agent is an important hyper-parameter). If all goes well, your agent learns better and better actions to take in each situation until it knows how to get very good scores. At least, that's how it should work! I'm struggling to get my model to converge on a similar toy example.
@SirajRaval 6 years ago ⁺²
what dustin said
@matthewdaly8879 6 years ago ⁺²
Thanks
@dustinandrews89019 6 years ago ⁺¹
Well that made my day. Thanks Siraj!
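A common refinement of the recording idea described in this thread is an experience replay buffer, as used in DQN-style training: store individual transitions and train on random mini-batches of them, using the one-step Q-learning target rather than a share of the final score. A minimal sketch; the class and method names are hypothetical:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions.

    Random mini-batches break up the correlation between consecutive steps of
    one episode before the transitions are used for training.
    """

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop off automatically
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Each sampled transition is then trained toward `r + gamma * max_a' Q(next_state, a')` (or just `r` when `done`), so learning can start before an episode ends.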
@cheungtyrone3615 4 years ago
This explains the algorithm neatly and vividly. I happened to encounter Q learning when reading a paper, and I had been consulting blogs and posts on it for quite a while, but I always felt like something was missing. Actually, the Bellman equation is the only ingredient that matters in this recipe, but with "rigorously" formatted text only, it can be hard to figure out what it is doing.
@johnmelendez8829 5 years ago ⁺⁸
"So easy, a liberal arts major can do it" lmaooo🤣🤣🤣🤣🤣🤣
@Madlion 6 years ago
Does "model" here refer to a model of the environment/world?
@aniseedus 6 years ago
Must the reward be only a discrete 1 or 0? Can it be an intermediate fraction or decimal?
@diegoosorio7752 3 years ago
Great video!
@messiklauf928 4 years ago ⁺¹
so easy a liberal arts major can do it - "Oh, that's about me" *moves closer to screen, sits up straight*
@joakim69 5 years ago ⁺¹
You are damn right, I am a beautiful wizard!
@RendallRen 6 years ago
I didn't get the 'liberal arts major' reference. Who is the bearded man in the inset at 0:52?
@hardikajmani5088 6 years ago
Explanation 👌🏻🔥
@ronstubed 6 years ago
How do we check the convergence of the Q matrix?
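One simple answer: stop when a full pass of updates no longer changes any Q-value by more than a small tolerance. The sketch below assumes a hypothetical 5-cell corridor and, to keep the stopping rule easy to see, applies one Q-learning update to every (state, action) pair per sweep through the environment's `step` function instead of following ε-greedy episodes. With exploration-driven episodes you would typically track the same max-change statistic over a window of episodes, or check whether the greedy policy has stopped changing.

```python
# Hypothetical toy environment: corridor cells 0..4, cell 4 is the terminal goal.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # left, right

def step(s, a):
    s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def train_until_converged(alpha=0.5, gamma=0.9, tol=1e-6, max_sweeps=10_000):
    """Apply one sampled Q-learning update to every non-terminal (state, action)
    pair per sweep; stop when no Q-value moved by more than `tol` in a whole sweep."""
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for sweep in range(1, max_sweeps + 1):
        max_delta = 0.0
        for s in range(N_STATES):
            if s == GOAL:
                continue  # terminal state: its Q-values stay 0
            for a in range(len(ACTIONS)):
                s2, r, done = step(s, a)
                target = r if done else r + gamma * max(q[s2])
                delta = alpha * (target - q[s][a])
                q[s][a] += delta
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:  # the table has (numerically) stopped changing
            return q, sweep
    return q, max_sweeps
```

Here the values settle at γ^k for an agent k+1 steps from the goal, e.g. Q(0, right) ≈ 0.9³ = 0.729, well before the sweep limit.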
@yogeshsaini5039 6 years ago
great work sir
@vikramb183 6 years ago
I'm confused about how this is machine learning. It seems to me that the computer just creates a lookup table. Could you please clarify, for I'm sure I am missing something?
@mahirgulzar5403 6 years ago
Well, he only talked about finding the optimal policy. But don't forget you have to generalize the optimal policy, i.e. drop the agent into an environment it doesn't know about. Or let me put it this way: suppose your agent has learned the policy that whenever the distance between the agent and an obstacle is precisely 5 meters, it should apply the brakes. But here's the catch: the state space is not always discrete; it can be continuous. In fact, in real life it is continuous. So machine learning comes into play at this point: it takes the state vector and applies function approximation (which can be a neural network) to spit out an action for it. Hope that helps. :)
PS: ML is nothing but function approximation or curve fitting.
@vikramb183 6 years ago
Thanks! I think I understand now.
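To make the function-approximation point concrete, here is the simplest case: a linear approximator rather than a neural network. Q(s, a) is a dot product between a weight vector and a feature vector, and the tabular update becomes a semi-gradient step on the weights. The feature map (state copied into a per-action slot, plus a bias) is a hypothetical choice for illustration:

```python
import numpy as np

def features(state, action, n_actions):
    """Hypothetical feature map: [state..., 1.0] placed in the slot for `action`,
    zeros elsewhere (effectively one linear model per action)."""
    s = np.append(np.asarray(state, dtype=float), 1.0)
    phi = np.zeros(len(s) * n_actions)
    phi[action * len(s):(action + 1) * len(s)] = s
    return phi

def q_value(w, state, action, n_actions):
    return float(w @ features(state, action, n_actions))

def semi_gradient_update(w, s, a, r, s2, done, n_actions, alpha=0.1, gamma=0.99):
    """One Q-learning step on the weights instead of on a table cell.
    For a linear Q, the gradient with respect to w is just the feature vector."""
    if done:
        target = r
    else:
        target = r + gamma * max(q_value(w, s2, b, n_actions) for b in range(n_actions))
    td_error = target - q_value(w, s, a, n_actions)
    return w + alpha * td_error * features(s, a, n_actions)
```

After a single update from state `[0.5, -0.2]`, the estimate for the never-visited state `[0.6, -0.2]` also moves (0.134 versus 0.129 for the visited one), which is the kind of generalization a lookup table cannot provide.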
@philrowlands1087 1 year ago
You are brilliant. I only dream of having your ease of understanding of these processes!🎉
@Kipsterbro 6 years ago
Hey! I am aiming to create an AI; I was going to use a genetic algorithm.
What do you think the best type of algorithm would be for creating a bipedal balancing/walking robot?
I was thinking of using Unity to simulate the physics.
Thanks
@aey2579 4 years ago
This guy is too smart for me. I need someone on my low-IQ level to explain this.
@Foxhood 4 years ago ⁺¹
I would suggest looking up Code Bullet. He does fun AI stuff in an easier-to-comprehend, sillier manner.
A.I. Learns to DRIVE covers Q-Learning and its bigger brother, the Deep Q-Network.
Not as in-depth, but fun to watch, and it gives a good starting idea of what the AI does.
@chicken6180 6 years ago
Can anyone explain what the value of n_states represents in the program? Does this mean there are only 40 possible positions in the environment, or what? Thanks in advance.
edit - I'm thinking that the "3" in "q_table = np.zeros((n_states, n_states, 3))" represents the fact that there are 3 possible actions for the car? I'm confused.
@manjaecho5909 6 years ago
Great video! Thx
@Perryman1138 6 years ago
One thing I've wondered is how to tackle AI for tasks that can't be parallelized or sped up (i.e. model-free, real-time, task-based AI).
@Yannoux2000 6 years ago
Perryman1138 record some example data so you can pre-train your model. If I remember correctly, it's called imitation learning: you give the agent some interesting action paths with already computed rewards. The more the better.
@Perryman1138 6 years ago
Ah I see. In particular, I was thinking of Dungeon Crawl Stone Soup, a color-graphics terminal roguelike for which recordings of thousands of games exist in text format, but I believe it might only record the game outputs, not the player inputs. Still, a fascinating concept! Thanks!
@crazyoldhippieguy 4 years ago
So is Python the language for deep learning?
@IrateMoogle 6 years ago
You should have many more followers
@Raj_Patel21 6 years ago
I am new here and don't know where to start learning this stuff. Any suggestions?
@arafatullahturjoy5380 5 years ago
Can Q-learning be used to solve a classification problem? If so, how? Could you explain or make a video about this? @Siraj Raval
@st101k 6 years ago ⁺¹
As always, very informative and perfect ☺
@souravjamwal77 6 years ago ⁺¹
Where should I start learning AI and Machine Learning?
Pls help me guys
I am a beginner and I know Python programming
@davidrey6126 6 years ago
First learn Calculus 1 and 2, and if possible Calculus 3. After Calculus 2 you can start learning probability and statistics. Make sure to learn the full details of statistics, not just basic stuff like normal distributions. Learn many different well-known distributions such as chi-square, gamma, etc. Learn how to do inferential statistics such as point estimation and interval estimation. Spend 1 to 2 years learning the math, and at the same time brush up your Python and R skills for data science packages. Finally, you may start looking into machine learning techniques. But first start with good ol' linear regression and logistic regression (more math + linear algebra). Then learn the statistical learning methods from 50 to 100 years ago that have been rebranded as machine learning methods. Once you're done with that, you can start learning some more modern methods like reinforcement learning and neural networks. There is no ONE, BEST tool in machine learning. Every method is suitable for different cases. But yes, neural networks are cool, so everyone talks about them.
@GregorianHunter 4 years ago
Q-learning is model-free learning, not model-based learning, just FYI
@marcel2711 3 years ago
Why does everyone talking about reinforcement learning do it only in Python? I wanna see examples in C++, in C#, in Java... why only Python?
@bezelyesevenordek 3 years ago
Dude, just get the point: you can implement it in any programming language, and converting Python code to C# is easy. C# has all the things Python has. Maybe lots of lines, but it's easy to do.
