Deep Q Learning is Simple with PyTorch | Full Tutorial 2020

  • Added 11. 09. 2024

Comments • 130

  • @MachineLearningwithPhil
    @MachineLearningwithPhil  4 years ago +17

    This content is sponsored by my Udemy courses. Level up your skills by learning to turn papers into code. See the links in the description.

    • @PhilippDominicSiedler
      @PhilippDominicSiedler 3 years ago +2

      Thank you very much for your content! I can't seem to find the "from paper to code" course in the description or directly on your Udemy profile. Is it not out yet?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  3 years ago +1

      Deep Q Learning:
      www.udemy.com/course/deep-q-learning-from-paper-to-code/?couponCode=DQN-AUG-2021
      Actor Critic Methods:
      www.udemy.com/course/actor-critic-methods-from-paper-to-code-with-pytorch/?couponCode=AC-AUG-2021
      Natural Language Processing from First Principles:
      www.udemy.com/course/natural-language-processing-from-first-principles/?couponCode=NLP1-AUG-2021

  • @vijeta268
    @vijeta268 4 years ago +8

    Thanks A LOT for making this tutorial!
    Coming from a non-CS background, coding is always a bottleneck for me, but this video helped me get past that phase with ease.

  • @eliasebner3595
    @eliasebner3595 3 years ago +3

    this guy is so good he doesn't even need autocomplete.

  • @GaetanoFavoino
    @GaetanoFavoino 4 years ago +6

    Thank you for your clean tutorials, hope you'll make a new one on non-stationary environments soon.

  • @shashisuman8302
    @shashisuman8302 4 years ago +9

    Please don't tell people "you don't need any exposure to deep learning etc." This is why people jump from project to project without understanding, as they get excited.

    • @kontra_21
      @kontra_21 4 years ago

      In fairness, you don't need exposure to deep learning in order to follow this tutorial. However, I can agree it may have been a little misleading, as people may have assumed this was a top-down, easy-to-digest intro video where it would all be explained in simple terms.

  • @nathanas64
    @nathanas64 3 years ago +1

    Exceptionally clear presentation!! Pure genius! Will definitely take the course

  • @abhijiths2918
    @abhijiths2918 3 years ago +6

    Good tutorial. But man, if you could just open your mouth when you speak! I had to enable subtitles just to understand what you were saying, and half the time the subtitles were wrong because they couldn't understand you either!

  • @alirezamogharabi8733
    @alirezamogharabi8733 4 years ago +4

    Thanks a lot Dr. Phil, please make some videos about multi agent reinforcement learning ❤️❤️🌹🌹

  • @walterjonathan8947
    @walterjonathan8947 3 days ago

    Hello Phil, I could not find the repo, please direct me where to find it

  • @ahmedgamberli2250
    @ahmedgamberli2250 2 years ago +2

    Thanks for making this tutorial. I just have a tiny question: why do we set q_next[terminal_batch] = 0.0? The question may be a bit stupid. Sorry for being a newbie :)
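
    For context: the TD target for a terminal state is just the immediate reward, since no future return follows it, and zeroing q_next for those rows makes the generic target formula collapse to exactly that. A minimal, self-contained sketch of the idea with toy numbers (the names only loosely follow the video):

        import torch as T

        # toy batch of 4 transitions; in the real agent these come from the replay buffer
        reward_batch   = T.tensor([1.0, 0.0, -1.0, 0.5])
        q_eval         = T.tensor([0.9, 0.2, -0.5, 0.4])        # Q(s, a) for the actions taken
        q_next         = T.rand(4, 2)                            # Q(s', a') for every action a'
        terminal_batch = T.tensor([False, False, True, False])   # True where the episode ended
        gamma = 0.99

        q_next[terminal_batch] = 0.0                   # no future return after a terminal state
        q_target = reward_batch + gamma * q_next.max(dim=1)[0]   # reduces to just r for terminals
        loss = ((q_target - q_eval) ** 2).mean()       # squared error between target and estimate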

  • @burdescualexandru
    @burdescualexandru 4 years ago +2

    Hey Phil! I'm looking forward to seeing a video where you show us how to define our own environment! All the tutorials around use gym, but I'd like to try reinforcement learning on some personal projects!

  • @TIM6266
    @TIM6266 2 years ago

    This is my master's degree savior.

  • @kontra_21
    @kontra_21 4 years ago +5

    I really appreciate this simple agent walkthrough. I find it easy to digest compared to other courses I've seen, and it doesn't try to explain the math behind it TOO much, which for novices is pretty nice.
    My concern though is that because our agent is learning every step of every episode, it is also decaying epsilon every step as well. This leads to a much more rapid and unpredictable descent of epsilon (due to each episode having a varying number of steps) over the lifetime of the agent vs other agents I have seen. (Full decay by episode 15-25.)
    Is this intentional? If so, is there any way you could elaborate on why we would want epsilon to be fully decayed within 5% of the agent's training time?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago +2

      Good question. It turns out that the epsilon decay schedule isn't super critical to learning, at least in my experience. You can get away with a rapid decay as long as epsilon is left sufficiently large to allow exploration. If it were going all the way to zero (which you should never do unless you want to evaluate performance), then such an aggressive schedule would be a problem for sure.
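
      To make the per-step schedule concrete, here is a minimal linear-decay sketch; the eps_min and eps_dec values mirror the ones discussed elsewhere in this thread and are otherwise assumptions:

          epsilon, eps_min, eps_dec = 1.0, 0.01, 5e-4   # start fully exploratory

          def decay(eps):
              # called once per learning step, not once per episode
              return max(eps - eps_dec, eps_min)

          for step in range(3000):
              epsilon = decay(epsilon)
          print(epsilon)   # reaches eps_min after (1.0 - 0.01) / 5e-4 = 1980 steps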

  • @sergiosanchez6377
    @sergiosanchez6377 6 months ago

    Hi Phil! Does this code run on the current versions of torch and gym? Thank you for your work on this video!

  • @thelaconicguy
    @thelaconicguy 4 years ago +2

    Hey Phil! You are a class apart from others in explaining all these topics. I have a request for you: since reinforcement learning takes a lot of time when implemented on real-world problems, wouldn't it be good to move your videos towards newer techniques like imitation learning, GANs, etc.?

    • @user-gr1cg5ep5j
      @user-gr1cg5ep5j 4 years ago

      Thanks a lot Phil, I am a big fan of yours, by the way.
      Can you make some videos about PPO and imitation learning?

  • @noamabadi6482
    @noamabadi6482 2 years ago

    Hi! Why do you use Q_eval.forward(state) instead of Q_eval(state)? I read that it's not good because the hooks aren't deployed, although I have no clue what hooks are.
    Thanks for the tutorial!

  • @user-kc8qb8qf7r
    @user-kc8qb8qf7r 1 year ago

    The video is very good, I hope there will be a version with Chinese subtitles

  • @padraopv
    @padraopv 1 year ago

    Thank you for this amazing content, Phil!

  • @9841580948
    @9841580948 4 years ago +1

    How can we save the Deep Q model after the full run of training episodes? Thank you
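
    Saving isn't shown in this video; a common PyTorch approach is to save the network's state_dict after training and reload it into an identically constructed network later. A minimal sketch (the Sequential stand-in and file name are assumptions, not the video's DeepQNetwork class):

        import torch as T
        import torch.nn as nn

        def build_net():
            # stand-in for Q_eval; must match the architecture used during training
            return nn.Sequential(nn.Linear(8, 256), nn.ReLU(), nn.Linear(256, 4))

        net = build_net()
        T.save(net.state_dict(), 'dqn_lunar_lander.pt')        # after training: persist weights

        net2 = build_net()
        net2.load_state_dict(T.load('dqn_lunar_lander.pt'))    # reload for evaluation
        net2.eval()                                            # greedy play, no learning updates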

  • @happyduck70
    @happyduck70 2 years ago

    A question: is it really necessary to make terminal_batch a tensor? Since we only null the Q-values for terminal states in q_next, could you also use an np.array? Is that correct?

  • @haneulkim4902
    @haneulkim4902 3 years ago

    Thanks Phil for an amazing tutorial!

  • @pratheeps3972
    @pratheeps3972 4 years ago +2

    Amazing, and perfect timing too. I was looking at your older code for my project and you just gave us the better version. My only issue is that my environment returns a matrix (image). How do I modify your code to get it to work?

    • @chunchunmaru3644
      @chunchunmaru3644 3 years ago

      Make the output the shape of an image

    • @juleswombat5309
      @juleswombat5309 2 years ago +1

      Sounds as though you need PyTorch convolutional layers at the front end of the Q network if you have image or video based inputs. I suspect you may need to stack a few observations together if you expect to detect motion from video.
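
      A minimal sketch of such a convolutional front end for a stack of 84x84 grayscale frames; the layer sizes follow the usual Atari DQN setup and are assumptions, not code from this video:

          import torch as T
          import torch.nn as nn

          class ConvDQN(nn.Module):
              def __init__(self, n_actions, frames=4):
                  super().__init__()
                  # `frames` stacked observations let the network see motion
                  self.conv = nn.Sequential(
                      nn.Conv2d(frames, 32, kernel_size=8, stride=4), nn.ReLU(),
                      nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                      nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU())
                  self.fc = nn.Sequential(nn.Flatten(),
                                          nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
                                          nn.Linear(512, n_actions))

              def forward(self, state):                # state: (batch, frames, 84, 84)
                  return self.fc(self.conv(state))

          q = ConvDQN(n_actions=4)
          print(q(T.zeros(1, 4, 84, 84)).shape)        # torch.Size([1, 4])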

  • @yemiyesufu5745
    @yemiyesufu5745 4 years ago +2

    Is the Udemy course done with PyTorch or TensorFlow?

  • @Salehalanazi-7
    @Salehalanazi-7 4 years ago +1

    Genius. Appreciate you 💜

  • @masoncoles402
    @masoncoles402 2 years ago

    Hey, how would I go about saving/loading this model? I adapted your network for a different game.

  • @abolfazlzakeri6822
    @abolfazlzakeri6822 3 years ago +1

    Very well. Thank you.

  • @RabeeQasem
    @RabeeQasem 3 years ago +1

    Is there any possibility of doing a tutorial on multi-agent DQN?
    I know there is a tutorial on A3C, but in some cases DQN is more suitable for grid-world environments than A3C.

  • @0hunnaa74
    @0hunnaa74 2 years ago

    Is it working??? My result is different from yours;
    I got an average of around -300 to -500.
    Does it run well for other people??

  • @Lolnigaaaaaaaaa
    @Lolnigaaaaaaaaa 2 years ago

    How do you autocomplete the commands without actually typing them out completely?

  • @sounakmojumder5689
    @sounakmojumder5689 4 months ago

    Hi, thank you. I just have a request: could you do this in Colab? The model loading and saving part in Colab is a bit messy; perhaps you could guide us.

  • @trenvert123
    @trenvert123 2 years ago

    Thank you for this tutorial!

  • @nonago725
    @nonago725 7 months ago

    The line "self.state_memory[index] = state" in the store_transition() function is giving "ValueError: setting an array element with a sequence. The requested array would exceed the maximum number of dimension of 1."
    My code didn't work, and when I copy-and-pasted your code it still got the same error. Why is this?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  7 months ago

      Because the latest version of gym changed the interface. Reset now returns observation and info, and step returns observation, reward, done, truncated, info.

    • @nonago725
      @nonago725 7 months ago

      @@MachineLearningwithPhil Ah, okay. I changed the env line to "observation, _ = env.reset()" and everything works now. Thank you.
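
      For reference, a minimal episode loop under the newer gym API described above (assuming gym >= 0.26; a random action stands in for the agent's choose_action):

          import gym

          env = gym.make('LunarLander-v2')
          observation, info = env.reset()              # reset now returns (obs, info)
          done = False
          while not done:
              action = env.action_space.sample()       # stand-in for agent.choose_action(observation)
              observation_, reward, terminated, truncated, info = env.step(action)
              done = terminated or truncated           # either condition ends the episode
              observation = observation_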

  • @ahmetfurkanaknc8959
    @ahmetfurkanaknc8959 3 years ago

    Thanks, excellent tutorial !

  • @MrEvilyogurt
    @MrEvilyogurt 3 years ago

    Does anyone have issues trying to load checkpoints after training? When I load the checkpoints, my graph doesn't plot properly; it keeps a score of -21 at all episodes.

  • @anthonysu71
    @anthonysu71 4 years ago +2

    Hi, Dr. Phil. Great work on the deep Q network implementation and demo. I have been following your tutorials for a while. I am currently doing a DQN for a "multi-agent" collection, which means there is more than 1 agent in the system but we consider them all as a collection. Correspondingly, state (agent1, agent2, ...) and action (action1, action2, ...) as collections are used to describe it. But the trick is we don't know the number of agents for sure, which gives me a hard time describing n_actions (if 1 agent has 8 actions, 2 would have 64). Does the DQN framework still apply here? If it does, is it possible for you to give me some suggestions about how to modify this framework? Thanks in advance!!!

  • @CustomDabber360
    @CustomDabber360 2 years ago

    Amazing! I love your video.

  • @rahuldhanasiri
    @rahuldhanasiri 3 years ago +1

    Thank you, Dr. Phil, for an amazing video. When I try to run this on Colab, I get the error "expected scalar type Float but found Double" at either the 18th or 23rd line of main**.py. I am trying it on the CartPole environment and I have also tried changing the observation (line 16) to float32, but it didn't work.

    • @SenselessTalk
      @SenselessTalk 3 years ago +1

      By the way, to answer this: cast the incoming state to float32 inside the network's forward method.

          def forward(self, state):
              # cast the observation to float32 so it matches the weight dtype and fixes the error above
              state = state.to(torch.float32)
              x = F.relu(self.fc1(state))
              x = F.relu(self.fc2(x))
              actions = self.fc3(x)
              return actions

  • @mathmo
    @mathmo 4 years ago

    Hi Phil, any reason you are using the forward() method on your neural net instead of calling it directly as Q_eval(), i.e. using __call__()? I believe calling forward() directly is generally unsafe, since there's potentially some necessary magic involving hooks going on under the surface that you might miss.

  • @alisyedj
    @alisyedj 2 years ago

    Thank you, Prof Phil. Very helpful! Can you expand on what target networks do? I was reading the paper "Human-level control through deep reinforcement learning", where it talks about the target network. It's not clear what it is and what the advantages of creating it are. Thank you in advance.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  2 years ago

      Target networks help to stabilize training. Using the same network to generate data and evaluate data each time step results in chasing a moving target. The target network changes more slowly, so it's a more stable target.
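
      A minimal sketch of that mechanism: a second copy of the network evaluates the targets and is only synced to the online network every so often (the sync interval and names are assumptions, not code from this video):

          import torch.nn as nn

          def make_q():
              return nn.Sequential(nn.Linear(8, 256), nn.ReLU(), nn.Linear(256, 4))

          q_eval = make_q()                              # updated by gradient descent every step
          q_next = make_q()                              # evaluates max_a' Q(s', a') in the TD target
          q_next.load_state_dict(q_eval.state_dict())    # start from identical weights

          replace_every = 500
          for learn_step in range(1, 10_001):
              # ... sample a batch and take a gradient step on q_eval here ...
              if learn_step % replace_every == 0:
                  q_next.load_state_dict(q_eval.state_dict())   # slow-moving copy keeps targets stable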

  • @jose-alberto-salazar-jimenez

    I have a question... Say one trains a model and saves its model state for later use... How would one go about loading the model state and testing the agent? I've tried coding something (following what I've found on the internet; in a nutshell: loading the model state, switching it to eval mode, then, with torch.no_grad, selecting the actions greedily). During training it does pretty well by the end (learning was as expected), but when I try testing (for instance, to show others its performance), it performs horribly... can anybody help me?

  • @2ndgenfsdbetatester315

    life-saving video

  • @mT4945
    @mT4945 4 years ago

    Hi Phil,
    I just found your channel and I really like your content.
    Do you think reinforcement learning has a future compared to text mining and image recognition?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago

      I think we'll see more applications of RL to those other fields. None of them will get us close to AGI.

  • @miriamramstudio3982
    @miriamramstudio3982 4 years ago

    Hi Phil, is it correct that epsilon already reaches the eps_min of 0.01 after only 11 episodes? Does that mean we have almost no exploration anymore after 11 episodes?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago +1

      Mostly correct. Only 1% of actions will be exploratory but that's sufficient for learning.

  • @yigitsevim7741
    @yigitsevim7741 1 year ago

    Great tutorial, thanks. As a small criticism, please move slightly away from the microphone when coughing.

  • @jjschnyder
    @jjschnyder 3 years ago

    Very nice tutorial. Why do you make a memory array for every element (state, new state, reward, etc.)? Couldn't you just make one overall memory array and store named tuples of the form (state, action, reward, new_state, done)?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  3 years ago

      Yup, that's another way to do it. I use the named arrays because it's easier (for me) to keep track of where everything is stored.
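
      For comparison, a minimal sketch of the single-container alternative suggested above, using one deque of named tuples (the capacity and field names are assumptions):

          import random
          from collections import deque, namedtuple

          Transition = namedtuple('Transition', ['state', 'action', 'reward', 'state_', 'done'])

          class ReplayBuffer:
              def __init__(self, max_size=100_000):
                  self.memory = deque(maxlen=max_size)     # one container instead of five arrays

              def store(self, *args):
                  self.memory.append(Transition(*args))

              def sample(self, batch_size):
                  batch = random.sample(self.memory, batch_size)
                  return Transition(*zip(*batch))          # columns of states, actions, ... to batch up

          buf = ReplayBuffer()
          buf.store([0.0] * 8, 1, -0.5, [0.1] * 8, False)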

  • @ImDadidu
    @ImDadidu 3 years ago

    Great video! Helped me a lot with my bachelor thesis. I'm working on a private project now where the agent needs to predict an x_action between -1.0 and 1.0 and a y_action between -1.0 and 1.0. How can I manage the action indices in the learn() method if I have multiple floats which describe one action? Or do I need a completely different model for that? Thanks in advance :)

  • @mickpress6718
    @mickpress6718 4 years ago

    Hi Phil. Just found this channel, nice :) I may be wrong, but I think there may be a problem in the learn process: mem_counter is never reset, so once it has hit batch_size it will learn every time the learn function is called.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago

      Nope, functioning as intended.

    • @kontra_21
      @kontra_21 4 years ago +1

      That is intended. As he explains in the course, this is because at first there is no information in the state memories, since they have just been initialized. So we need the agent to run through X steps at a minimum (where X is your batch size) before the agent can start to properly learn. After that it's never supposed to stop learning :)
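
      A minimal sketch of that guard, assuming counters named like the ones discussed here; mem_cntr is never reset because it only records how many transitions exist and where the next one should be written:

          import numpy as np

          class Agent:
              def __init__(self, mem_size=100_000, batch_size=64):
                  self.mem_size, self.batch_size = mem_size, batch_size
                  self.mem_cntr = 0                 # total transitions ever stored, never reset

              def learn(self):
                  if self.mem_cntr < self.batch_size:
                      return                        # not enough experience for a single batch yet
                  max_mem = min(self.mem_cntr, self.mem_size)
                  batch = np.random.choice(max_mem, self.batch_size, replace=False)
                  # ... gather the stored arrays with `batch` and take a gradient step ...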

  • @hossein_haeri
    @hossein_haeri 3 years ago

    Why did you set the epsilon to 1?

  • @billallen9251
    @billallen9251 1 year ago

    I followed and built the TensorFlow 2 version of this yesterday and it ran great. I haven't been able to get the PyTorch version to ever get above 0. I've scoured the code looking for bugs and I've tried every combination of hyperparameters. Has something changed in PyTorch that needs to be reflected in this code? My version is 1.13.1.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  1 year ago

      Not that I'm aware of. Shoot me an email with a link to your GitHub. phil@neuralnet.ai

  • @n00bxl71
    @n00bxl71 1 year ago

    I tried implementing this, I implemented it exactly, but it just gets worse and worse. It's hovering at around -500 average score, it seems to just press as many buttons as possible and stay up in the air as soon as epsilon reaches minimum. Any thoughts?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  1 year ago

      Are you decaying epsilon and over time?

    • @n00bxl71
      @n00bxl71 1 year ago

      Not entirely sure what you were trying to say, but yes, epsilon is decreasing over time.

    • @n00bxl71
      @n00bxl71 1 year ago

      Could you tell me what version each library is supposed to be at, so that I can better recreate your setup?

  • @mgr1282
    @mgr1282 4 years ago

    Hi Mr Phil, I have some issues with your code from the previous video with TF2. I used it for CartPole-v0 and FrozenLake-v0 from gym. For CartPole it did very well, but for FrozenLake it was very, very weak. I don't know why.
    BTW, in your code, in the body of the build_dqn function, you didn't use input_dims; why?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago

      Regarding the input dims, they're inferred with Keras.
      Define poor performance for FrozenLake? In my course we get a 70% win rate using regular Q learning.

    • @mgr1282
      @mgr1282 4 years ago

      @@MachineLearningwithPhil In which of your courses? I've got a 70% win rate without a neural network. I expected much more with your TF2 code from the previous video but got under 10%. It was great for CartPole.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago

      Why do we use neural networks? What are their use cases and limitations?

    • @mgr1282
      @mgr1282 4 years ago

      @@MachineLearningwithPhil I don't know exactly, I'm a beginner in reinforcement learning. I expected it could help our agent learn better. Deep neural networks need a lot of data; I know that is one of their limitations.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago +1

      Neural nets are designed to work for large / continuous state spaces. They don't handle the small discrete ones very well. Tabular Q learning is far better suited for an environment like the frozen lake.
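
      For a small discrete environment like FrozenLake, the tabular version is just a lookup-table form of the same Bellman target; a minimal sketch (the hyperparameters are assumptions):

          import numpy as np

          n_states, n_actions = 16, 4        # FrozenLake-v0 sized table
          Q = np.zeros((n_states, n_actions))
          alpha, gamma = 0.1, 0.99

          def q_update(s, a, r, s_, done):
              # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
              target = r + gamma * np.max(Q[s_]) * (not done)
              Q[s, a] += alpha * (target - Q[s, a])

          q_update(0, 2, 0.0, 4, False)      # one toy transition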

  • @hackathonhacks4119
    @hackathonhacks4119 3 years ago

    "ValueError: maximum supported dimension for an ndarray is 32, found 10000" ... from writing all the code from here. What might be the issue?

  • @mehuljan26
    @mehuljan26 3 years ago

    Love your videos. I have a question though: if I want to use the same code on games with pixels as the observation space, how do I do that? I am getting multiple errors while trying to implement breakout-V2.

  • @FoxGameing148
    @FoxGameing148 4 years ago

    Thanks for the help.

  • @gabrielvalentim197
    @gabrielvalentim197 1 year ago

    Hey Phil, how can I solve local-minimum problems in PPO?
    I'm trying to solve Lunar Lander with a PPO agent (with and without an entropy bonus), but my agent gets stuck in a local minimum.
    I really appreciate your videos and I'm using them to improve my skills!!
    Thanks!!

  • @bradduy7329
    @bradduy7329 3 years ago

    Can you explain why we don't need to call the forward function in DeepQNetwork ourselves?
    E.g. we define forward() but never explicitly call forward().

    • @marcoss147
      @marcoss147 3 years ago

      PyTorch takes care of calling the function. If you name it anything other than forward it won't work. You should check the PyTorch docs if you want to learn more.
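
      A small illustration of that point: nn.Module defines __call__, which dispatches to your forward method (and also runs any registered hooks), so the usual style is to call the module object directly:

          import torch as T
          import torch.nn as nn

          class TinyQ(nn.Module):
              def __init__(self):
                  super().__init__()
                  self.fc1 = nn.Linear(8, 4)

              def forward(self, state):      # must be named `forward` for nn.Module to dispatch to it
                  return self.fc1(state)

          q = TinyQ()
          state = T.zeros(1, 8)
          print(T.equal(q(state), q.forward(state)))   # same output; q(state) also runs hooks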

  • @IsaacPFranco
    @IsaacPFranco 4 years ago

    Wondering how you got PyTorch to recognize np.bool for self.terminal_memory; it brought up an error for me. I had to change the dtype to np.uint8.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago +1

      Older versions of PyTorch used np.uint8. The newer (1.4) version requires np.bool and throws an error with np.uint8
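
      For anyone hitting this on newer library versions: the np.bool alias was later removed from NumPy (1.24+), but a plain bool dtype works and converts cleanly to a torch.bool tensor for mask indexing. A minimal sketch:

          import numpy as np
          import torch as T

          mem_size = 10
          terminal_memory = np.zeros(mem_size, dtype=bool)   # use bool; np.bool is gone in NumPy 1.24+
          terminal_memory[3] = True

          terminal_batch = T.tensor(terminal_memory)         # dtype torch.bool
          q_next = T.rand(mem_size, 4)
          q_next[terminal_batch] = 0.0                       # boolean mask indexing, as in the video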

  • @andreamassacci7942
    @andreamassacci7942 4 years ago

    Nice video. Well explained.

  • @nandans2506
    @nandans2506 4 years ago

    Great content

  • @haneulkim4902
    @haneulkim4902 3 years ago

    eps_dec = 5e-4, and each time learning happens it subtracts eps_dec from the current epsilon, so starting from 1 it should output epsilon 1, 0.9995, 0.9990, 0.9985, etc. This is not what I see when I run main.py for lunar_lander. Why is that? It shrinks as follows: 0.99, 0.95, 0.89, 0.84, etc. It seems like it decreases by about 0.05.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  3 years ago +1

      The decrement happens each time step; the print is at the end of every episode.

    • @haneulkim4902
      @haneulkim4902 3 years ago

      @@MachineLearningwithPhil Oh hahah my bad, thanks Phil!

  • @jasonpeloquin9950
    @jasonpeloquin9950 1 year ago

    This video is very helpful. Did something change with the store_transition function? I am getting an array mismatch error saying the requested array would exceed the maximum number of dimension of 1.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  1 year ago

      If you're using the latest version of gym, the API has changed. Reset returns observation and info and the step function returns observation, reward, done, truncated, info

    • @jasonpeloquin9950
      @jasonpeloquin9950 1 year ago

      Ah, so can you just take the first element as the observation now with the new API? Also, I just bought your course; this tutorial was very helpful.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  1 year ago

      Yup, you can discard the debug info.

    • @saifal-wahaibi6448
      @saifal-wahaibi6448 1 year ago

      Hey, how did you resolve the error?

    • @jasonpeloquin9950
      @jasonpeloquin9950 1 year ago

      @@saifal-wahaibi6448 you can just take the first element of that output. I can’t remember if I did it by indexing or doing .item

  • @kutilkol
    @kutilkol 2 years ago +1

    dude, start using some ide from this millennium omg

  • @SalvatorePellitteri
    @SalvatorePellitteri 3 years ago +1

    Next time use font size 22 at least.

  • @Penguin134
    @Penguin134 3 years ago

    How did you know to use [8] as input dims?
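
    The [8] comes from the environment rather than the network: LunarLander-v2 has an 8-dimensional observation vector, which you can read off the observation space instead of hard-coding. A small sketch:

        import gym

        env = gym.make('LunarLander-v2')
        print(env.observation_space.shape)        # (8,): position, velocity, angle, leg contacts, ...
        print(env.action_space.n)                 # 4 discrete actions

        input_dims = env.observation_space.shape  # pass this instead of hard-coding [8]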

  • @qhieu195
    @qhieu195 4 years ago

    Great tutorial!
    Can you make a video that builds a DQN from scratch using Numpy?

  • @spinity8468
    @spinity8468 4 years ago

    I thought Q and Q* used different neural networks, but it seems that's not the case here. Am I wrong?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago

      I omit the use of the target network in this tutorial, hence the "simple" part of the title. It's intended to be the simplest implementation that actually works in a non-trivial sense.

    • @spinity8468
      @spinity8468 4 years ago

      @@MachineLearningwithPhil You did a nice job! I am wondering if you have a similar video using two different networks for Q and Q*. Do you have such a thing?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago

      czcams.com/video/a5XbO5Qgy5w/video.html

    • @spinity8468
      @spinity8468 4 years ago

      @@MachineLearningwithPhil I am not familiar at all with Keras or Tensorflow. Do you have the equivalent with Pytorch?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago

      If you check out my github (linked in description), the repo for my course is there. You can see the PyTorch equivalent.

  • @patrickphillips7009
    @patrickphillips7009 4 years ago

    At 33:54 "is our children learn..., is our agent learning" funny

  • @alexandrefournier-ahizoune8098

    What does "fc1" stand for?

  • @abrahamloha3050
    @abrahamloha3050 2 years ago

    best

  • @shivg2519
    @shivg2519 4 years ago

    nice

  • @emanuelepapucci59
    @emanuelepapucci59 2 years ago

    Finally, here I see the plotLearning function for the first time... god... I don't know how many videos I watched without knowing what that function was and why I couldn't use it... now I finally know... you made it... Next time, please remember to ALWAYS put a reference link under your videos for functions you use that are not inside the packages; otherwise there is no sense in following your tutorial. I'm saying this for next time, because I'm a beginner and I can't tell that a function isn't part of a package unless you explain it...
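
    For anyone else looking for it: plotLearning is a small helper from Phil's GitHub repo, not part of gym or PyTorch. A rough, hedged stand-in that plots a running-average score curve with matplotlib (the original also overlays epsilon; this sketch is an assumption, not the repo's code):

        import numpy as np
        import matplotlib.pyplot as plt

        def plot_learning_curve(x, scores, figure_file, window=100):
            # running average of the last `window` scores, like the curve shown in the video
            running_avg = [np.mean(scores[max(0, i - window):i + 1]) for i in range(len(scores))]
            plt.plot(x, running_avg)
            plt.title('Running average of previous %d scores' % window)
            plt.xlabel('Episode')
            plt.ylabel('Score')
            plt.savefig(figure_file)

        # usage: plot_learning_curve(list(range(len(scores))), scores, 'lunar_lander.png')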