Deep Q-Networks Explained!

  • Added 27 Aug 2024
  • Let's talk about deep Q-learning, a popular reinforcement learning algorithm.

Comments • 50

  • @CodeEmporium
    @CodeEmporium  9 months ago +11

    If you like this video and you think I deserve it, please consider giving this video a like. Subscribe for more!

    • @0xabaki
      @0xabaki 6 months ago

      I second this statement

  • @deviduttanayak2684
    @deviduttanayak2684 8 months ago +14

    quiz 1=A
    q2=B
    q3=C

  • @neetpride5919
    @neetpride5919 9 months ago +22

    Where does the target network come from, and if it's the ideal "conscience" why not just use that? If we already have the ideal network, why bother training a second one?

    • @CodeEmporium
      @CodeEmporium  8 months ago +17

      Good question. Maybe my rhetoric was not super clear here. Essentially, without that target network, the Q-network would compute the loss by comparing against itself. In practice, this can lead to unstable values because it is chasing a moving target. Hence a slightly delayed network is introduced to stabilize training.
      Note that this target network isn't the final iteration of the ideal conscience. Rather, it is an iteration in the direction of the ideal conscience. I say "ideal conscience" in this context to illustrate that the loss is computed based on this target network's value. But this target network also gets better over time.

    • @zerge69
      @zerge69 4 months ago +12

      The target network should be called the "snapshot network". It's simply an older version of the Q-network, over which you improve.
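
The "snapshot" idea in this thread can be sketched in a few lines. This is a toy illustration under stated assumptions, not the video's code: a small lookup table stands in for the neural network, and `GAMMA`, `SYNC_EVERY`, `LEARNING_RATE`, `td_target`, and `train` are hypothetical names and values chosen for the example. The target table starts as an exact copy of the online table and is refreshed only every few updates.

```python
import copy

GAMMA = 0.9          # discount factor (assumed value)
SYNC_EVERY = 3       # refresh the snapshot every N updates (assumed value)
LEARNING_RATE = 0.5  # step size for the toy update (assumed value)


def td_target(target_q, reward, next_state, done):
    """Bootstrapped target computed from the frozen snapshot, not the online table."""
    if done:
        return reward
    return reward + GAMMA * max(target_q[next_state])


def train(transitions, n_states=2, n_actions=2):
    # A table stands in for the Q-network in this sketch.
    online_q = [[0.0] * n_actions for _ in range(n_states)]
    target_q = copy.deepcopy(online_q)  # the snapshot starts as an exact copy
    for step, (s, a, r, s2, done) in enumerate(transitions, start=1):
        y = td_target(target_q, r, s2, done)
        # Move the online estimate toward the target (a step on the squared error).
        online_q[s][a] += LEARNING_RATE * (y - online_q[s][a])
        if step % SYNC_EVERY == 0:
            target_q = copy.deepcopy(online_q)  # refresh the snapshot
    return online_q, target_q


# One transition repeated: state 0, action 1, reward 1.0, next state 1, not terminal.
transition = (0, 1, 1.0, 1, False)
online_after_2, target_after_2 = train([transition] * 2)
online_after_3, target_after_3 = train([transition] * 3)
```

After two updates the online table has moved while the snapshot is still at its initial values; after the third update the snapshot is refreshed and matches the online table again.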

  • @ajaytaneja111
    @ajaytaneja111 6 months ago +6

    Sorry Ajay, I'm not sure I'm getting it. What do you mean by an idealised network (you say Frank's idealised conscience)? Where does it come from? It looks like you are saying that's the actual solution (idealised conscience), but what's its origin?

  • @katnip1917
    @katnip1917 2 months ago

    Great Video!! Thank you for the explanation. My question is, why not use the current state in the target network, instead of the next state?

  • @sotasearcher
    @sotasearcher 9 months ago +2

    A scenario a computer could benefit from learning on its own: I remember Google reporting research on a model that used RL and was able to find more efficient assembly code for a sorting algorithm.

  • @seno3863
    @seno3863 1 month ago

    It would be ideal if the quiz answers were presented in the video instead of being answered in the comments, since comment answers might be inaccurate and a lot of time is lost waiting for a reply.
    The quiz was a cool idea for understanding, though; it really helps.

  • @royvivat113
    @royvivat113 5 months ago

    Great video, this was helpful for me. The only thing that I found pretty confusing was the target network explanation, which I saw you address in another comment. You described it as the ideal conscience, which really made it seem like it's the optimal Q-network that we're comparing to (which would defeat the purpose of training if we had that). In fact, since it only gets updated every few batches, it's less ideal than the Q-network.

  • @rpraver1
    @rpraver1 9 months ago +2

    This was a good video, but I would love to see a deeper dive into your transformer series. That was the best, but I am still missing clarity on some of the steps. Your explanations are the best and I would love to see more.
    I have re-watched your videos at least 10 times and have many questions. We need more of your explanations. Keep it up.

    • @CodeEmporium
      @CodeEmporium  9 months ago +1

      Thanks! Yea, I am trying to get core concept videos out first and would love to soon dive into a series where I implement this system too :)

    • @hamzaali98
      @hamzaali98 8 months ago

      @CodeEmporium Hey! A decision transformer video would be really appreciated

  • @amithapa1994
    @amithapa1994 8 months ago +3

    Quiz 2:
    B. It stores Q for future reference

  • @edro1128
    @edro1128 9 days ago

    Where does the target network come from? Thanks

  • @sharonkevin9906
    @sharonkevin9906 6 months ago

    Love your videos mehn. They’ve really helped me understand the concepts

  • @Trubripes
    @Trubripes 12 days ago

    I don't think a DQN outputs actions; that would make it a policy gradient method.
    It uses MC to collect Q-values and uses them for supervised training, right?
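
For context on this question: in the standard DQN setup the network outputs one Q-value per action, and the action is then chosen from those values, commonly with an epsilon-greedy rule. A minimal sketch follows; the Q-values are made up and `epsilon_greedy` is a hypothetical helper name, not something from the video.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of Q-values (hypothetical helper)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    # Exploit: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


q_values = [0.1, 0.9, 0.3]  # made-up network output for a single state
greedy_action = epsilon_greedy(q_values, epsilon=0.0)   # always exploits
random_action = epsilon_greedy(q_values, epsilon=1.0)   # always explores
```

With `epsilon=0.0` the rule always returns the index of the largest Q-value; with `epsilon=1.0` it always picks uniformly at random.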

  • @eliasblancocastro9677
    @eliasblancocastro9677 3 months ago

    Amazing video and explanation! I have a question: can I use SGD instead of MSE?

    • @CodeEmporium
      @CodeEmporium  3 months ago

      Thanks! SGD is an optimizer (an algorithm that describes HOW a model learns) while MSE is a loss function (a function that describes WHAT to minimize). They serve different purposes. But in general, you can replace loss functions with appropriate counterparts. They may not work exactly as described, but they can work in general.
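
The split between "what to minimize" and "how to minimize it" can be made concrete with a toy one-parameter model. This is a sketch under stated assumptions: the data, the learning rate, and the names `mse`, `mse_grad_w`, and `gradient_step` are invented for illustration. It also uses the full dataset on every step, so strictly speaking it is plain gradient descent; SGD would apply the same update rule to a randomly sampled mini-batch.

```python
def mse(preds, targets):
    """Loss function: WHAT to minimize (mean squared error)."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def mse_grad_w(w, xs, targets):
    """Gradient of the MSE with respect to w for the model pred = w * x."""
    return sum(2 * (w * x - t) * x for x, t in zip(xs, targets)) / len(xs)


def gradient_step(w, grad, lr):
    """Optimizer update: HOW the parameter moves (the rule SGD applies per batch)."""
    return w - lr * grad


xs = [1.0, 2.0, 3.0]
targets = [2.0, 4.0, 6.0]  # generated by the "true" weight 2.0
w = 0.0
losses = []
for _ in range(50):
    losses.append(mse([w * x for x in xs], targets))
    w = gradient_step(w, mse_grad_w(w, xs, targets), lr=0.05)
```

Swapping the loss means replacing `mse` and its gradient; swapping the optimizer means replacing `gradient_step`. The two choices are independent.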

  • @zerge69
    @zerge69 4 months ago

    Awesome explanation, thanks. Except the quizzes.

  • @bean217
    @bean217 6 months ago

    Is the target network also randomly initialized? Is it initialized with the same parameters as the Q-network?
    From what I gather, the Q-network is acting as our behavior policy, and the target network is acting as our target policy. The way you describe it here makes it seem like the target network is already learned, but that would defeat the purpose of the algorithm in the first place.

  • @user-sx1rt2lm6b
    @user-sx1rt2lm6b 8 months ago +1

    I did not understand where the target network comes from. And if it exists, why should a new one be trained?

    • @CodeEmporium
      @CodeEmporium  8 months ago +1

      Good question. From my understanding, the answer is more practical than theoretical.
      The target network ensures the Q-network isn't chasing a moving target. If the network were compared against itself at every iteration, training would not be stable. Hence another slightly delayed network is introduced to ensure this stability.
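
For reference, the "slightly delayed network" corresponds to a second set of parameters in the usual DQN loss. The notation here is an assumption of this note rather than something taken from the video: θ denotes the Q-network's weights, θ⁻ the target network's weights, γ the discount factor, and C the number of steps between copies.

```latex
L(\theta) = \mathbb{E}_{(s, a, r, s')}\Big[\big(r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta)\big)^{2}\Big],
\qquad \theta^{-} \leftarrow \theta \ \text{every } C \text{ steps}
```

Only θ is updated by gradient steps; θ⁻ stays fixed between copies, which is what keeps the target from moving at every iteration.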

  • @ChadHowarth
    @ChadHowarth 1 month ago +2

    If you'd like to see your channel perform better, you might consider that your audience is composed of intelligent adults.

  • @muralidhar40
    @muralidhar40 1 month ago

    QT-1: Option A (by definition)

  • @florentb8578
    @florentb8578 4 months ago

    Brilliant explanation, well done

  • @jenniferdsouza7708
    @jenniferdsouza7708 1 month ago

    Why is it called a Q-network and not just a neural network?

  • @gayatri8728
    @gayatri8728 3 months ago

    Amazing explanation 🎉🎉🎉🎉

  • @johantchassem1553
    @johantchassem1553 6 months ago

    Thanks for the explanation.

  • @axelolafsson7312
    @axelolafsson7312 4 months ago

    this video is great

  • @hakunamatata1o1
    @hakunamatata1o1 4 months ago

    GOOD EXPLANATION

  • @sotasearcher
    @sotasearcher 9 months ago +3

    2:40 A

  • @Christoo228
    @Christoo228 1 month ago

    You're gorgeous.

  • @harshsonar9346
    @harshsonar9346 3 months ago

    ✨Quiiizz Timmmeeee✨

  • @himanshumeena745
    @himanshumeena745 5 months ago

    The answer to quiz time 3 is C. I'm right, aren't I, CodeEmporium bro?

  • @rishukumar4045
    @rishukumar4045 1 month ago

    q1=A

  • @riadhossainbhuiyan4978
    @riadhossainbhuiyan4978 7 months ago +1

    B

  • @NaveenKumar-vn7vx
    @NaveenKumar-vn7vx 9 months ago +2

    A

  • @labreynth
    @labreynth 13 days ago

    On my life, I've got no idea what you're on about, at any stage of the video.
    It feels like you're jumping between concepts and not explaining how they're linked.
    I'd rather you say what a QN is, describe how it works, then give the Frank example

  • @user-vp6fh8gx7z
    @user-vp6fh8gx7z 9 months ago

    QAnon Network.

  • @tomoki-v6o
    @tomoki-v6o 9 months ago

    teach him how to use a pencil

  • @riadhossainbhuiyan4978
    @riadhossainbhuiyan4978 7 months ago

    Q3.A

  • @jongxina3595
    @jongxina3595 7 months ago

    Very cringey but good video nonetheless 👍