Deep Q Learning is Simple with PyTorch | Full Tutorial 2020

  • Added 11. 09. 2024

Comments • 130

  • @MachineLearningwithPhil
    @MachineLearningwithPhil  4 years ago +17

    This content is sponsored by my Udemy courses. Level up your skills by learning to turn papers into code. See the links in the description.

    • @PhilippDominicSiedler
      @PhilippDominicSiedler 3 years ago +2

      Thank you very much for your content! I can't seem to find the "from paper to code" course in the description or directly on your Udemy profile. Is it not out yet?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  3 years ago +1

      Deep Q Learning:
      www.udemy.com/course/deep-q-learning-from-paper-to-code/?couponCode=DQN-AUG-2021
      Actor Critic Methods:
      www.udemy.com/course/actor-critic-methods-from-paper-to-code-with-pytorch/?couponCode=AC-AUG-2021
      Natural Language Processing from First Principles:
      www.udemy.com/course/natural-language-processing-from-first-principles/?couponCode=NLP1-AUG-2021

  • @vijeta268
    @vijeta268 4 years ago +8

    Thanks A LOT for making this tutorial!
    Coming from a non-CS background, coding is always a bottleneck for me, but this video helped me get past that phase with ease.

  • @eliasebner3595
    @eliasebner3595 3 years ago +3

    this guy is so good he doesn't even need autocomplete.

  • @GaetanoFavoino
    @GaetanoFavoino 4 years ago +6

    Thank you for your clean tutorials, hope you'll make a new one on non-stationary environments soon.

  • @shashisuman8302
    @shashisuman8302 4 years ago +9

    Please don't tell people "you don't need any exposure to deep learning etc." This is why people jump from project to project without understanding, as they get excited.

    • @kontra_21
      @kontra_21 4 years ago

      In fairness, you don't need exposure to deep learning in order to follow this tutorial. However, I can agree it may have been a little misleading, as people may have assumed this was a top-down, easy-to-digest intro video where it would all be explained in simple terms.

  • @nathanas64
    @nathanas64 3 years ago +1

    Exceptionally clear presentation!! Pure genius! Will definitely take the course

  • @abhijiths2918
    @abhijiths2918 3 years ago +6

    Good tutorial. But man, if you could just open your mouth when you speak! I had to enable subtitles just to understand what you were saying, and half the time the subtitles were wrong because they couldn't understand you either!

  • @alirezamogharabi8733
    @alirezamogharabi8733 4 years ago +4

    Thanks a lot Dr. Phil, please make some videos about multi agent reinforcement learning ❤️❤️🌹🌹

  • @walterjonathan8947
    @walterjonathan8947 3 days ago

    Hello Phil, I could not find the repo, please direct me where to find it

  • @ahmedgamberli2250
    @ahmedgamberli2250 2 years ago +2

    Thanks for making this tutorial. I just have a tiny question: why do we set q_next[terminal_batch] = 0.0? The question may be a bit stupid. Sorry for being a newbie :)
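
    For context: the TD target for a terminal state is just the immediate reward, since no future return follows it, and zeroing q_next for those rows makes the generic target formula collapse to exactly that. A minimal, self-contained sketch of the idea with toy numbers (the names only loosely follow the video):

        import torch as T

        # toy batch of 4 transitions; in the real agent these come from the replay buffer
        reward_batch   = T.tensor([1.0, 0.0, -1.0, 0.5])
        q_eval         = T.tensor([0.9, 0.2, -0.5, 0.4])        # Q(s, a) for the actions taken
        q_next         = T.rand(4, 2)                            # Q(s', a') for every action a'
        terminal_batch = T.tensor([False, False, True, False])   # True where the episode ended
        gamma = 0.99

        q_next[terminal_batch] = 0.0                   # no future return after a terminal state
        q_target = reward_batch + gamma * q_next.max(dim=1)[0]   # reduces to just r for terminals
        loss = ((q_target - q_eval) ** 2).mean()       # squared error between target and estimate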

  • @burdescualexandru
    @burdescualexandru 4 years ago +2

    Hey Phil! I'm looking forward to seeing a video where you show us how to define our own environment! All the tutorials around use gym, but I'd like to try reinforcement learning on some personal projects!

  • @TIM6266
    @TIM6266 2 years ago

    This is my master's degree savior.

  • @kontra_21
    @kontra_21 4 years ago +5

    I really appreciate this simple agent walkthrough. I find it easy to digest compared to other courses I've seen, and it doesn't try to explain the math behind it TOO much, which for novices is pretty nice.
    My concern though is that because our agent is learning every step of every episode, it is also decaying epsilon every step as well. This leads to a much more rapid and unpredictable descent of epsilon (due to each episode having a varying number of steps) over the lifetime of the agent vs other agents I have seen. (Full decay by episode 15-25.)
    Is this intentional? If so, is there any way you could elaborate on why we would want epsilon to be fully decayed within 5% of the agent's training time?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago +2

      Good question. It turns out that the epsilon decay schedule isn't super critical to learning, at least in my experience. You can get away with a rapid decay as long as epsilon is left sufficiently large to allow exploration. If it were going all the way to zero (which you should never do unless you want to evaluate performance), then such an aggressive schedule would be a problem for sure.
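
      To make the per-step schedule concrete, here is a minimal linear-decay sketch; the eps_min and eps_dec values mirror the ones discussed elsewhere in this thread and are otherwise assumptions:

          epsilon, eps_min, eps_dec = 1.0, 0.01, 5e-4   # start fully exploratory

          def decay(eps):
              # called once per learning step, not once per episode
              return max(eps - eps_dec, eps_min)

          for step in range(3000):
              epsilon = decay(epsilon)
          print(epsilon)   # reaches eps_min after (1.0 - 0.01) / 5e-4 = 1980 steps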

  • @sergiosanchez6377
    @sergiosanchez6377 6 months ago

    Hi Phil! Does this code run on the current versions of torch and gym? Thank you for your work on this video!

  • @thelaconicguy
    @thelaconicguy 4 years ago +2

    Hey Phil! You are a class apart from others in explaining all these topics. I have a request for you: since reinforcement learning takes a lot of time when implemented on real-world problems, wouldn't it be good to move your videos towards newer techniques like imitation learning, GANs, etc.?

    • @user-gr1cg5ep5j
      @user-gr1cg5ep5j 4 years ago

      Thanks a lot Phil, I am a big fan of yours, by the way.
      Can you make some videos about PPO and imitation learning?

  • @noamabadi6482
    @noamabadi6482 2 years ago

    Hi! Why do you use Q_eval.forward(state) instead of Q_eval(state)? I read that it's not good because the hooks aren't deployed, although I have no clue what hooks are.
    Thanks for the tutorial!

  • @user-kc8qb8qf7r
    @user-kc8qb8qf7r 1 year ago

    The video is very good, I hope there will be a version with Chinese subtitles

  • @padraopv
    @padraopv 1 year ago

    Thank you for this amazing content, Phil!

  • @9841580948
    @9841580948 4 years ago +1

    How can we save the Deep Q model after the full run of training episodes? Thank you
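
    Saving isn't shown in this video; a common PyTorch approach is to save the network's state_dict after training and reload it into an identically constructed network later. A minimal sketch (the Sequential stand-in and file name are assumptions, not the video's DeepQNetwork class):

        import torch as T
        import torch.nn as nn

        def build_net():
            # stand-in for Q_eval; must match the architecture used during training
            return nn.Sequential(nn.Linear(8, 256), nn.ReLU(), nn.Linear(256, 4))

        net = build_net()
        T.save(net.state_dict(), 'dqn_lunar_lander.pt')        # after training: persist weights

        net2 = build_net()
        net2.load_state_dict(T.load('dqn_lunar_lander.pt'))    # reload for evaluation
        net2.eval()                                            # greedy play, no learning updates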

  • @happyduck70
    @happyduck70 2 years ago

    A question: is it really necessary to make terminal_batch a tensor? Since we only null the Q-values for terminal states in q_next, could you also use an np.array? Is that correct?

  • @haneulkim4902
    @haneulkim4902 3 years ago

    Thanks Phil for an amazing tutorial!

  • @pratheeps3972
    @pratheeps3972 4 years ago +2

    Amazing, and perfect timing too. I was looking at your older code for my project and you just gave us the better version. My only issue is that my environment returns a matrix (image). How do I modify your code to get it to work?

    • @chunchunmaru3644
      @chunchunmaru3644 3 years ago

      Make the output the shape of an image

    • @juleswombat5309
      @juleswombat5309 2 years ago +1

      Sounds as though you need PyTorch convolutional layers at the front end of the Q network if you have image or video based inputs. I suspect you may need to stack a few observations together if you expect to detect motion from video.
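
      A minimal sketch of such a convolutional front end for a stack of 84x84 grayscale frames; the layer sizes follow the usual Atari DQN setup and are assumptions, not code from this video:

          import torch as T
          import torch.nn as nn

          class ConvDQN(nn.Module):
              def __init__(self, n_actions, frames=4):
                  super().__init__()
                  # `frames` stacked observations let the network see motion
                  self.conv = nn.Sequential(
                      nn.Conv2d(frames, 32, kernel_size=8, stride=4), nn.ReLU(),
                      nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                      nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU())
                  self.fc = nn.Sequential(nn.Flatten(),
                                          nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
                                          nn.Linear(512, n_actions))

              def forward(self, state):                # state: (batch, frames, 84, 84)
                  return self.fc(self.conv(state))

          q = ConvDQN(n_actions=4)
          print(q(T.zeros(1, 4, 84, 84)).shape)        # torch.Size([1, 4])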

  • @yemiyesufu5745
    @yemiyesufu5745 4 years ago +2

    Is the Udemy course done with PyTorch or TensorFlow?

  • @Salehalanazi-7
    @Salehalanazi-7 4 years ago +1

    Genius. Appreciate you 💜

  • @masoncoles402
    @masoncoles402 2 years ago

    Hey, how would I go about saving/loading this model? I adapted your network for a different game.

  • @abolfazlzakeri6822
    @abolfazlzakeri6822 3 years ago +1

    Very well. Thank you.

  • @RabeeQasem
    @RabeeQasem 3 years ago +1

    Is there any possibility of doing a tutorial on multi-agent DQN?
    I know there is a tutorial on A3C, but in some cases DQN is more suitable for grid-world environments than A3C.

  • @0hunnaa74
    @0hunnaa74 2 years ago

    Is it working??? My result is different from yours;
    I got an average of around -300 to -500.
    Does it run well for other people??

  • @Lolnigaaaaaaaaa
    @Lolnigaaaaaaaaa 2 years ago

    How do you autocomplete the commands without actually typing them out completely?

  • @sounakmojumder5689
    @sounakmojumder5689 4 months ago

    Hi, thank you. I just have a request: could you do this in Colab? The model loading and saving part in Colab is a bit messy; perhaps you could guide us.

  • @trenvert123
    @trenvert123 2 years ago

    Thank you for this tutorial!

  • @nonago725
    @nonago725 7 months ago

    The line "self.state_memory[index] = state" in the store_transition() function is giving "ValueError: setting an array element with a sequence. The requested array would exceed the maximum number of dimension of 1."
    My code didn't work, and when I copy-and-pasted your code it still got the same error. Why is this?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  7 months ago

      Because the latest version of gym changed the interface. Reset now returns observation and info, and step returns observation, reward, done, truncated, info.

    • @nonago725
      @nonago725 7 months ago

      @@MachineLearningwithPhil Ah, okay. I changed the env line to "observation, _ = env.reset()" and everything works now. Thank you.
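
      For reference, a minimal episode loop under the newer gym API described above (assuming gym >= 0.26; a random action stands in for the agent's choose_action):

          import gym

          env = gym.make('LunarLander-v2')
          observation, info = env.reset()              # reset now returns (obs, info)
          done = False
          while not done:
              action = env.action_space.sample()       # stand-in for agent.choose_action(observation)
              observation_, reward, terminated, truncated, info = env.step(action)
              done = terminated or truncated           # either condition ends the episode
              observation = observation_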

  • @ahmetfurkanaknc8959
    @ahmetfurkanaknc8959 3 years ago

    Thanks, excellent tutorial !

  • @MrEvilyogurt
    @MrEvilyogurt 3 years ago

    Does anyone have issues trying to load checkpoints after training? When I load the checkpoints, my graph doesn't plot properly; it keeps a score of -21 at all episodes.

  • @anthonysu71
    @anthonysu71 4 years ago +2

    Hi, Dr. Phil. Great work on the deep Q network implementation and demo. I have been following your tutorials for a while. I am currently doing a DQN for a "multi-agent" collection, which means there is more than 1 agent in the system but we consider them all as a collection. Correspondingly, state (agent1, agent2, ...) and action (action1, action2, ...) as collections are used to describe it. But the trick is we don't know the number of agents for sure, which gives me a hard time describing n_actions (if 1 agent has 8 actions, 2 would have 64). Does the DQN framework still apply here? If it does, is it possible for you to give me some suggestions about how to modify this framework? Thanks in advance!!!

  • @CustomDabber360
    @CustomDabber360 2 years ago

    Amazing! I love your video.

  • @rahuldhanasiri
    @rahuldhanasiri 3 years ago +1

    Thank you, Dr. Phil, for an amazing video. When I try to run this on Colab, I get the error "expected scalar type Float but found Double" at either the 18th or 23rd line of main**.py. I am trying it on the CartPole environment and I have also tried changing the observation (line 16) to float32, but it didn't work.

    • @SenselessTalk
      @SenselessTalk 3 years ago +1

      By the way, to answer this: cast the incoming state to float32 inside the network's forward method.

          def forward(self, state):
              # cast the observation to float32 so it matches the weight dtype and fixes the error above
              state = state.to(torch.float32)
              x = F.relu(self.fc1(state))
              x = F.relu(self.fc2(x))
              actions = self.fc3(x)
              return actions

  • @mathmo
    @mathmo 4 years ago

    Hi Phil, any reason you are using the forward() method on your neural net instead of calling it directly as Q_eval(), i.e. using __call__()? I believe calling forward() directly is generally unsafe, since there's potentially some necessary magic involving hooks going on under the surface that you might miss.

  • @alisyedj
    @alisyedj 2 years ago

    Thank you, Prof Phil. Very helpful! Can you expand on what target networks do? I was reading the paper "Human-level control through deep reinforcement learning", where it talks about the target network. It's not clear what it is and what the advantages of creating it are. Thank you in advance.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  2 years ago

      Target networks help to stabilize training. Using the same network to generate data and evaluate data each time step results in chasing a moving target. The target network changes more slowly, so it's a more stable target.
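
      A minimal sketch of that mechanism: a second copy of the network evaluates the targets and is only synced to the online network every so often (the sync interval and names are assumptions, not code from this video):

          import torch.nn as nn

          def make_q():
              return nn.Sequential(nn.Linear(8, 256), nn.ReLU(), nn.Linear(256, 4))

          q_eval = make_q()                              # updated by gradient descent every step
          q_next = make_q()                              # evaluates max_a' Q(s', a') in the TD target
          q_next.load_state_dict(q_eval.state_dict())    # start from identical weights

          replace_every = 500
          for learn_step in range(1, 10_001):
              # ... sample a batch and take a gradient step on q_eval here ...
              if learn_step % replace_every == 0:
                  q_next.load_state_dict(q_eval.state_dict())   # slow-moving copy keeps targets stable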

  • @jose-alberto-salazar-jimenez

    I have a question... Say one trains a model and saves its model state for later use... How would one go about loading the model state and testing the agent? I've tried coding something (following what I've found on the internet; in a nutshell: loading the model state, switching it to eval mode, then, with torch.no_grad, selecting the actions greedily). During training it does pretty well by the end (learning was as expected), but when I try testing (for instance, to show others its performance), it performs horribly... can anybody help me?

  • @2ndgenfsdbetatester315

    life-saving video

  • @mT4945
    @mT4945 4 years ago

    Hi Phil,
    I just found your channel and I really like your content.
    Do you think reinforcement learning has a future compared to text mining and image recognition?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago

      I think we'll see more applications of RL to those other fields. None of them will get us close to AGI.

  • @miriamramstudio3982
    @miriamramstudio3982 4 years ago

    Hi Phil, is it correct that epsilon already reaches the eps_min of 0.01 after only 11 episodes? Does that mean we have almost no exploration anymore after 11 episodes?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago +1

      Mostly correct. Only 1% of actions will be exploratory but that's sufficient for learning.

  • @yigitsevim7741
    @yigitsevim7741 1 year ago

    Great tutorial, thanks. As a small criticism, please move slightly away from the microphone when coughing.

  • @jjschnyder
    @jjschnyder 3 years ago

    Very nice tutorial. Why do you make a memory array for every element (state, new state, reward, etc.)? Couldn't you just make one overall memory array and store named tuples of the form (state, action, reward, new_state, done)?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  3 years ago

      Yup, that's another way to do it. I use the named arrays because it's easier (for me) to keep track of where everything is stored.
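
      For comparison, a minimal sketch of the single-container alternative suggested above, using one deque of named tuples (the capacity and field names are assumptions):

          import random
          from collections import deque, namedtuple

          Transition = namedtuple('Transition', ['state', 'action', 'reward', 'state_', 'done'])

          class ReplayBuffer:
              def __init__(self, max_size=100_000):
                  self.memory = deque(maxlen=max_size)     # one container instead of five arrays

              def store(self, *args):
                  self.memory.append(Transition(*args))

              def sample(self, batch_size):
                  batch = random.sample(self.memory, batch_size)
                  return Transition(*zip(*batch))          # columns of states, actions, ... to batch up

          buf = ReplayBuffer()
          buf.store([0.0] * 8, 1, -0.5, [0.1] * 8, False)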

  • @ImDadidu
    @ImDadidu 3 years ago

    Great video! Helped me a lot with my bachelor thesis. I'm working on a private project now where the agent needs to predict an x_action between -1.0 and 1.0 and a y_action between -1.0 and 1.0. How can I manage the action indices in the learn() method if I have multiple floats which describe one action? Or do I need a completely different model for that? Thanks in advance :)

  • @mickpress6718
    @mickpress6718 4 years ago

    Hi Phil. Just found this channel, nice :) I may be wrong, but I think there may be a problem in the learn process: mem_counter is never reset, so once it has hit batch_size it will learn every time the learn function is called.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago

      Nope, functioning as intended.

    • @kontra_21
      @kontra_21 4 years ago +1

      That is intended. As he explains in the course, this is because at first there is no information in the state memories, since they have just been initialized. So we need the agent to run through X steps at a minimum (where X is your batch size) before the agent can start to properly learn. After that it's never supposed to stop learning :)
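
      A minimal sketch of that guard, assuming counters named like the ones discussed here; mem_cntr is never reset because it only records how many transitions exist and where the next one should be written:

          import numpy as np

          class Agent:
              def __init__(self, mem_size=100_000, batch_size=64):
                  self.mem_size, self.batch_size = mem_size, batch_size
                  self.mem_cntr = 0                 # total transitions ever stored, never reset

              def learn(self):
                  if self.mem_cntr < self.batch_size:
                      return                        # not enough experience for a single batch yet
                  max_mem = min(self.mem_cntr, self.mem_size)
                  batch = np.random.choice(max_mem, self.batch_size, replace=False)
                  # ... gather the stored arrays with `batch` and take a gradient step ...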

  • @hossein_haeri
    @hossein_haeri 3 years ago

    Why did you set the epsilon to 1?

  • @billallen9251
    @billallen9251 1 year ago

    I followed and built the TensorFlow 2 version of this yesterday and it ran great. I haven't been able to get the PyTorch version to ever get above 0. I've scoured the code looking for bugs and I've tried every combination of hyperparameters. Has something changed in PyTorch that needs to be reflected in this code? My version is 1.13.1.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  1 year ago

      Not that I'm aware of. Shoot me an email with a link to your GitHub. phil@neuralnet.ai

  • @n00bxl71
    @n00bxl71 1 year ago

    I tried implementing this, I implemented it exactly, but it just gets worse and worse. It's hovering at around -500 average score, it seems to just press as many buttons as possible and stay up in the air as soon as epsilon reaches minimum. Any thoughts?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  1 year ago

      Are you decaying epsilon and over time?

    • @n00bxl71
      @n00bxl71 1 year ago

      Not entirely sure what you were trying to say, but yes, epsilon is decreasing over time.

    • @n00bxl71
      @n00bxl71 1 year ago

      Could you tell me what version each library is supposed to be at, so that I can better recreate your setup?

  • @mgr1282
    @mgr1282 4 years ago

    Hi Mr Phil, I have some issues with your code from the previous video with TF2. I used it for CartPole-v0 and FrozenLake-v0 from gym. For CartPole it did very well, but for FrozenLake it was very, very weak. I don't know why.
    BTW, in your code, in the body of the build_dqn function, you didn't use input_dims; why?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago

      Regarding the input dims, they're inferred with Keras.
      Define poor performance for FrozenLake? In my course we get a 70% win rate using regular Q learning.

    • @mgr1282
      @mgr1282 4 years ago

      @@MachineLearningwithPhil In which of your courses? I've got a 70% win rate without a neural network. I expected much more with your TF2 code from the previous video but got under 10%. It was great for CartPole.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago

      Why do we use neural networks? What are their use cases and limitations?

    • @mgr1282
      @mgr1282 4 years ago

      @@MachineLearningwithPhil I don't know exactly, I'm a beginner in reinforcement learning. I expected it could help our agent learn better. Deep neural networks need a lot of data; I know that is one of their limitations.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago +1

      Neural nets are designed to work for large / continuous state spaces. They don't handle the small discrete ones very well. Tabular Q learning is far better suited for an environment like the frozen lake.
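
      For a small discrete environment like FrozenLake, the tabular version is just a lookup-table form of the same Bellman target; a minimal sketch (the hyperparameters are assumptions):

          import numpy as np

          n_states, n_actions = 16, 4        # FrozenLake-v0 sized table
          Q = np.zeros((n_states, n_actions))
          alpha, gamma = 0.1, 0.99

          def q_update(s, a, r, s_, done):
              # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
              target = r + gamma * np.max(Q[s_]) * (not done)
              Q[s, a] += alpha * (target - Q[s, a])

          q_update(0, 2, 0.0, 4, False)      # one toy transition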

  • @hackathonhacks4119
    @hackathonhacks4119 3 years ago

    "ValueError: maximum supported dimension for an ndarray is 32, found 10000" ... from writing all the code from here. What might be the issue?

  • @mehuljan26
    @mehuljan26 3 years ago

    Love your videos. I have a question though: if I want to use the same code on games with pixels as the observation space, how do I do that? I am getting multiple errors while trying to implement breakout-V2.

  • @FoxGameing148
    @FoxGameing148 4 years ago

    Thanks for the help.

  • @gabrielvalentim197
    @gabrielvalentim197 1 year ago

    Hey Phil, how can I solve local-minimum problems in PPO?
    I'm trying to solve Lunar Lander with a PPO agent (with and without an entropy bonus), but my agent gets stuck in a local minimum.
    I really appreciate your videos and I'm using them to improve my skills!!
    Thanks!!

  • @bradduy7329
    @bradduy7329 3 years ago

    Can you explain why we don't need to call the forward function in DeepQNetwork ourselves?
    E.g. we define forward() but never explicitly call forward().

    • @marcoss147
      @marcoss147 3 years ago

      PyTorch takes care of calling the function. If you name it anything other than forward it won't work. You should check the PyTorch docs if you want to learn more.
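
      A small illustration of that point: nn.Module defines __call__, which dispatches to your forward method (and also runs any registered hooks), so the usual style is to call the module object directly:

          import torch as T
          import torch.nn as nn

          class TinyQ(nn.Module):
              def __init__(self):
                  super().__init__()
                  self.fc1 = nn.Linear(8, 4)

              def forward(self, state):      # must be named `forward` for nn.Module to dispatch to it
                  return self.fc1(state)

          q = TinyQ()
          state = T.zeros(1, 8)
          print(T.equal(q(state), q.forward(state)))   # same output; q(state) also runs hooks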

  • @IsaacPFranco
    @IsaacPFranco 4 years ago

    Wondering how you got PyTorch to recognize np.bool for self.terminal_memory; it brought up an error for me. I had to change the dtype to np.uint8.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago +1

      Older versions of PyTorch used np.uint8. The newer (1.4) version requires np.bool and throws an error with np.uint8
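
      For anyone hitting this on newer library versions: the np.bool alias was later removed from NumPy (1.24+), but a plain bool dtype works and converts cleanly to a torch.bool tensor for mask indexing. A minimal sketch:

          import numpy as np
          import torch as T

          mem_size = 10
          terminal_memory = np.zeros(mem_size, dtype=bool)   # use bool; np.bool is gone in NumPy 1.24+
          terminal_memory[3] = True

          terminal_batch = T.tensor(terminal_memory)         # dtype torch.bool
          q_next = T.rand(mem_size, 4)
          q_next[terminal_batch] = 0.0                       # boolean mask indexing, as in the video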

  • @andreamassacci7942
    @andreamassacci7942 4 years ago

    Nice video. Well explained.

  • @nandans2506
    @nandans2506 4 years ago

    Great content

  • @haneulkim4902
    @haneulkim4902 3 years ago

    eps_dec = 5e-4, and each time learning happens it subtracts eps_dec from the current epsilon, so starting from 1 it should output epsilon 1, 0.9995, 0.9990, 0.9985, etc. This is not what I see when I run main.py for lunar_lander. Why is that? It shrinks as follows: 0.99, 0.95, 0.89, 0.84, etc. It seems like it decreases by about 0.05.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  3 years ago +1

      The decrement happens each time step; the print is at the end of every episode.

    • @haneulkim4902
      @haneulkim4902 3 years ago

      @@MachineLearningwithPhil Oh hahah my bad, thanks Phil!

  • @jasonpeloquin9950
    @jasonpeloquin9950 1 year ago

    This video is very helpful. Did something change with the store_transition function? I am getting an array mismatch error saying the requested array would exceed the maximum number of dimension of 1.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  1 year ago

      If you're using the latest version of gym, the API has changed. Reset returns observation and info and the step function returns observation, reward, done, truncated, info

    • @jasonpeloquin9950
      @jasonpeloquin9950 1 year ago

      Ah, so can you just take the first element as the observation now with the new API? Also, I just bought your course; this tutorial was very helpful.

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  1 year ago

      Yup, you can discard the debug info.

    • @saifal-wahaibi6448
      @saifal-wahaibi6448 1 year ago

      Hey, how did you resolve the error?

    • @jasonpeloquin9950
      @jasonpeloquin9950 1 year ago

      @@saifal-wahaibi6448 you can just take the first element of that output. I can’t remember if I did it by indexing or doing .item

  • @kutilkol
    @kutilkol 2 years ago +1

    dude, start using some ide from this millennium omg

  • @SalvatorePellitteri
    @SalvatorePellitteri 3 years ago +1

    Next time use font size 22 at least.

  • @Penguin134
    @Penguin134 3 years ago

    How did you know to use [8] as input dims?
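
    The [8] comes from the environment rather than the network: LunarLander-v2 has an 8-dimensional observation vector, which you can read off the observation space instead of hard-coding. A small sketch:

        import gym

        env = gym.make('LunarLander-v2')
        print(env.observation_space.shape)        # (8,): position, velocity, angle, leg contacts, ...
        print(env.action_space.n)                 # 4 discrete actions

        input_dims = env.observation_space.shape  # pass this instead of hard-coding [8]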

  • @qhieu195
    @qhieu195 4 years ago

    Great tutorial!
    Can you make a video that builds a DQN from scratch using Numpy?

  • @spinity8468
    @spinity8468 4 years ago

    I thought Q and Q* used different neural networks, but it seems that's not the case here. Am I wrong?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago

      I omit the use of the target network in this tutorial, hence the "simple" part of the title. It's intended to be the simplest implementation that actually works in a non-trivial sense.

    • @spinity8468
      @spinity8468 4 years ago

      @@MachineLearningwithPhil You did a nice job! I am wondering if you have a similar video using two different networks for Q and Q*. Do you have such a thing?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago

      czcams.com/video/a5XbO5Qgy5w/video.html

    • @spinity8468
      @spinity8468 4 years ago

      @@MachineLearningwithPhil I am not familiar at all with Keras or Tensorflow. Do you have the equivalent with Pytorch?

    • @MachineLearningwithPhil
      @MachineLearningwithPhil  4 years ago

      If you check out my github (linked in description), the repo for my course is there. You can see the PyTorch equivalent.

  • @patrickphillips7009
    @patrickphillips7009 4 years ago

    At 33:54 "is our children learn..., is our agent learning" funny

  • @alexandrefournier-ahizoune8098

    What does "fc1" stand for?

  • @abrahamloha3050
    @abrahamloha3050 2 years ago

    best

  • @shivg2519
    @shivg2519 4 years ago

    nice

  • @emanuelepapucci59
    @emanuelepapucci59 2 years ago

    Finally, here I see the plotLearning function for the first time... god... I don't know how many videos I watched without knowing what that function was and why I couldn't use it... now I finally know... you made it... Next time, please remember to ALWAYS put a reference link under your videos for functions you use that are not inside the packages; otherwise there is no sense in following your tutorial. I'm saying this for next time, because I'm a beginner and I can't tell that a function isn't part of a package unless you explain it...
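
    For anyone else looking for it: plotLearning is a small helper from Phil's GitHub repo, not part of gym or PyTorch. A rough, hedged stand-in that plots a running-average score curve with matplotlib (the original also overlays epsilon; this sketch is an assumption, not the repo's code):

        import numpy as np
        import matplotlib.pyplot as plt

        def plot_learning_curve(x, scores, figure_file, window=100):
            # running average of the last `window` scores, like the curve shown in the video
            running_avg = [np.mean(scores[max(0, i - window):i + 1]) for i in range(len(scores))]
            plt.plot(x, running_avg)
            plt.title('Running average of previous %d scores' % window)
            plt.xlabel('Episode')
            plt.ylabel('Score')
            plt.savefig(figure_file)

        # usage: plot_learning_curve(list(range(len(scores))), scores, 'lunar_lander.png')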