Deep Q-Networks Explained!

CodeEmporium

22,850 views


Added 27 Aug 2024
Let's talk about deep Q-learning, a popular reinforcement learning algorithm.
ABOUT ME
⭕ Subscribe: www.youtube.co...
📚 Medium Blog: / dataemporium
💻 Github: github.com/ajh...
👔 LinkedIn: / ajay-halthor-477974bb
PLAYLISTS FROM MY CHANNEL
⭕ Reinforcement Learning: • Reinforcement Learning...
⭕ Natural Language Processing: • Natural Language Proce...
⭕ Transformers from Scratch: • Natural Language Proce...
⭕ ChatGPT Playlist: • ChatGPT
⭕ Convolutional Neural Networks: • Convolution Neural Net...
⭕ The Math You Should Know : • The Math You Should Know
⭕ Probability Theory for Machine Learning: • Probability Theory for...
⭕ Coding Machine Learning: • Code Machine Learning
MATH COURSES (7 day free trial)
📕 Mathematics for Machine Learning: imp.i384100.ne...
📕 Calculus: imp.i384100.ne...
📕 Statistics for Data Science: imp.i384100.ne...
📕 Bayesian Statistics: imp.i384100.ne...
📕 Linear Algebra: imp.i384100.ne...
📕 Probability: imp.i384100.ne...
OTHER RELATED COURSES (7 day free trial)
📕 ⭐ Deep Learning Specialization: imp.i384100.ne...
📕 Python for Everybody: imp.i384100.ne...
📕 MLOps Course: imp.i384100.ne...
📕 Natural Language Processing (NLP): imp.i384100.ne...
📕 Machine Learning in Production: imp.i384100.ne...
📕 Data Science Specialization: imp.i384100.ne...
📕 Tensorflow: imp.i384100.ne...

Comments • 50

@CodeEmporium 9 months ago ⁺¹¹
If you like this video and you think I deserve it, please consider giving this video a like. Subscribe for more!
@0xabaki 6 months ago
I second this statement
@deviduttanayak2684 8 months ago ⁺¹⁴
quiz 1=A
q2=B
q3=C
@neetpride5919 9 months ago ⁺²²
Where does the target network come from, and if it's the ideal "conscience" why not just use that? If we already have the ideal network, why bother training a second one?
@CodeEmporium 8 months ago ⁺¹⁷
Good question. Maybe my rhetoric was not super clear here. Essentially, without that target network, the Q network would compute the loss by comparing to itself. In practice, this can lead to unstable values as it is chasing a moving target. Hence a slightly delayed network is introduced to stabilize training.
Note this target network isn't the final iteration of ideal conscience. It is rather an iteration in the direction of ideal conscience. I say "ideal conscience" in this context to illustrate that the loss is computed based on this target network value. But this target network also gets better over time.
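The "slightly delayed network" idea can be sketched in a few lines. This is a minimal illustration, not the video's implementation: a table of Q-values stands in for the neural network, and `GAMMA`, `SYNC_EVERY`, `LEARNING_RATE`, `td_target` and `train` are hypothetical names and values chosen for the example.

```python
import copy
import random

GAMMA = 0.9          # discount factor (assumed value)
SYNC_EVERY = 4       # hypothetical interval between snapshot refreshes
LEARNING_RATE = 0.1  # assumed step size


def td_target(target_weights, reward, next_state, done):
    """Build the training target from the frozen snapshot, not the live network."""
    if done:
        return reward
    return reward + GAMMA * max(target_weights[next_state])


def train(transitions, n_states=3, n_actions=2, steps=20, seed=0):
    rng = random.Random(seed)
    # Tabular stand-in for the Q-network: one row of action values per state.
    q_weights = [[0.0] * n_actions for _ in range(n_states)]
    # The target network starts as an exact copy of the Q-network.
    target_weights = copy.deepcopy(q_weights)
    sync_count = 0
    for step in range(1, steps + 1):
        state, action, reward, next_state, done = rng.choice(transitions)
        y = td_target(target_weights, reward, next_state, done)
        error = y - q_weights[state][action]
        # Gradient step on 0.5 * error**2 with respect to the live value only.
        q_weights[state][action] += LEARNING_RATE * error
        if step % SYNC_EVERY == 0:
            # Refresh the snapshot: the target network "catches up".
            target_weights = copy.deepcopy(q_weights)
            sync_count += 1
    return q_weights, target_weights, sync_count
```

Between refreshes the targets stay fixed even though `q_weights` keeps changing, which is the stabilizing effect described above.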
@zerge69 4 months ago ⁺¹²
The target network should be called the "snapshot network". It's simply an older version of the Q-network, over which you improve.
@ajaytaneja111 6 months ago ⁺⁶
Sorry Ajay, I'm not sure I'm getting it. What do you mean by an idealised network (you say Frank's idealised conscience)? Where does it come from? It looks like you're saying that's the actual solution (idealised conscience), but what's its origin?
@katnip1917 2 months ago
Great Video!! Thank you for the explanation. My question is, why not use the current state in the target network, instead of the next state?
@sotasearcher 9 months ago ⁺²
A scenario where a computer could benefit from learning on its own: I remember Google reporting research on a model that used RL and was able to find more efficient assembly code for a sorting algorithm
@sotasearcher 9 months ago ⁺²
It was AlphaDev
@seno3863 1 month ago
It would be ideal to have the quiz answers presented in the video instead of in the comments, since those might be inaccurate and there is a long wait for a reply.
The quiz was a cool idea for understanding though, really helps.
@royvivat113 5 months ago
Great video, this was helpful for me. The only thing that I found pretty confusing was the target network explanation, which I saw you address in another comment. You described it as the ideal conscience, which really made it seem like it's the optimal Q-network that we're comparing to (which would defeat the purpose of training if we had that). In fact, since it only gets updated every few batches, it's less ideal than the Q-network.
@rpraver1 9 months ago ⁺²
This was a good video, but I would love to see a deeper dive into your transformer series, that was the best, but I am still missing clarity on some of the steps. Your explanations are the best and would love to see more.
I have re-watched your videos at least 10 times and have many questions, we need more of your explanations. Keep it up.
@CodeEmporium 9 months ago ⁺¹
Thanks! Yea I am trying to get core concept videos out first and will soon love to dive into a series where I implement this system too :)
@hamzaali98 8 months ago
@@CodeEmporium Hey! A decision transformer video would be really appreciated
@amithapa1994 8 months ago ⁺³
Quiz 2:
B. It stores Q for future reference
@edro1128 9 days ago
Where does the target network come from? Thanks
@sharonkevin9906 6 months ago
Love your videos mehn. They’ve really helped me understand the concepts
@Trubripes 12 days ago
I don't think a DQN outputs actions; that would make it a policy gradient method.
It uses MC to collect Q-values and uses them for supervised training, right?
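On the question of what the network outputs: in the usual DQN setup the network produces one Q-value per action, and the action is then picked from those values, commonly with an epsilon-greedy rule. A minimal sketch of that selection step follows; `select_action` and its parameters are hypothetical names for illustration, not taken from the video.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        # Exploration: any action index, chosen uniformly.
        return rng.randrange(len(q_values))
    # Exploitation: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0.0` the call reduces to a plain argmax over the Q-values, so the policy is derived from the values rather than emitted directly by the network.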
@eliasblancocastro9677 3 months ago
Amazing video and explanation! I have a question: can I use SGD instead of MSE?
@CodeEmporium 3 months ago
Thanks! SGD is an optimizer (an algorithm that describes HOW a model learns) while MSE is a loss function (a function that describes WHAT to minimize). They serve different purposes. But in general, you can replace loss functions with appropriate counterparts. They may not work exactly as described, but they can work in general.
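The split between the two roles can be seen in a toy example. This is a sketch under stated assumptions: a one-parameter model `weight * x`, made-up data, and hypothetical helper names (`mse_loss`, `sgd_step`, `fit`). The loss function defines the quantity to minimize; the SGD step defines how the parameter moves.

```python
import random


def mse_loss(weight, inputs, targets):
    """WHAT to minimize: mean squared error of the model weight * x."""
    return sum((weight * x - t) ** 2 for x, t in zip(inputs, targets)) / len(targets)


def sgd_step(weight, x, t, learning_rate):
    """HOW to learn: one gradient step on the squared error of a single example."""
    gradient = 2 * (weight * x - t) * x
    return weight - learning_rate * gradient


def fit(inputs, targets, steps=200, learning_rate=0.05, seed=0):
    rng = random.Random(seed)
    weight = 0.0
    for _ in range(steps):
        # "Stochastic": each step uses one randomly drawn example.
        i = rng.randrange(len(inputs))
        weight = sgd_step(weight, inputs[i], targets[i], learning_rate)
    return weight
```

Swapping `mse_loss` for another loss would change the gradient inside `sgd_step`, while the update rule itself (subtract a scaled gradient) would stay the same.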
@zerge69 4 months ago
Awesome explanation, thanks. Except the quizzes.
@bean217 6 months ago
Is the target network also randomly initialized? Is it initialized with the same parameters as the Q-network?
From what I gather, the Q-network is acting as our behavior policy, and the target network is acting as our target policy. The way you describe it here makes it seem like the target network is already learned, but that would defeat the purpose of the algorithm in the first place.
@user-sx1rt2lm6b 8 months ago ⁺¹
I did not understand where the target network comes from. And if it exists, why should a new one be trained?
@CodeEmporium 8 months ago ⁺¹
Good question. From my understanding the answer is more practical than theoretical.
The target network ensures the Q network isn’t chasing a moving target. If the network was compared against itself for every iteration, training would not be stable. Hence another slightly delayed network is introduced to ensure this stability
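The "moving target" point can be written out as the loss. This is a sketch in commonly used DQN notation; none of these symbols appear in the thread, so treat them as assumptions: $\theta$ are the Q-network's parameters, $\theta^{-}$ those of the delayed copy, $\gamma$ the discount factor, $(s, a, r, s')$ a sampled transition, and $C$ the number of updates between refreshes.

```latex
\begin{aligned}
y &= r + \gamma \max_{a'} Q(s', a'; \theta^{-}) \\
L(\theta) &= \mathbb{E}_{(s, a, r, s')}\left[ \big( y - Q(s, a; \theta) \big)^{2} \right] \\
\theta^{-} &\leftarrow \theta \quad \text{every } C \text{ updates}
\end{aligned}
```

Only $\theta$ receives gradients, and $y$ is treated as a constant. If $y$ were computed with $\theta$ instead of $\theta^{-}$, the target would shift on every update, which is the instability described above.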
@ChadHowarth 1 month ago ⁺²
If you'd like to see your channel perform better, you might consider that your audience is composed of intelligent adults.
@muralidhar40 1 month ago
QT-1: Option A (by definition)
@florentb8578 4 months ago
Brilliant explanation, well done
@jenniferdsouza7708 1 month ago
why is it called a Q network and not just a neural network?
@gayatri8728 3 months ago
Amazing explanation 🎉🎉🎉🎉
@johantchassem1553 6 months ago
Thanks for the explanation.
@axelolafsson7312 4 months ago
this video is great
@hakunamatata1o1 4 months ago
GOOD EXPLANATION
@sotasearcher 9 months ago ⁺³
2:40 A
@CodeEmporium 9 months ago ⁺²
That’s right! Nice!
@Christoo228 1 month ago
You're gorgeous.
@harshsonar9346 3 months ago
✨Quiiizz Timmmeeee✨
@himanshumeena745 5 months ago
The answer to quiz time 3 is C. I'm right, aren't I, CodeEmporium bro?
@rishukumar4045 1 month ago
q1=A
@riadhossainbhuiyan4978 7 months ago ⁺¹
B
@NaveenKumar-vn7vx 9 months ago ⁺²
A
@CodeEmporium 9 months ago ⁺¹
A! Yep that’s right for Quiz 1
@labreynth 13 days ago
On my life, I've got no idea what you're on about, at any stage of the video.
It feels like you're jumping between concepts and not explaining how they're linked.
I'd rather you say what a QN is, describe how it works, then give the Frank example
@user-vp6fh8gx7z 9 months ago
QAnon Network.
@tomoki-v6o 9 months ago
teach him how to use a pencil
@CodeEmporium 9 months ago ⁺⁶
I am a pencil
@riadhossainbhuiyan4978 7 months ago
Q3.A
@jongxina3595 7 months ago
Very cringey but good video nonetheless 👍
@hakunamatata1o1 4 months ago ⁺³
SHUDDAP
HE'S GIVING A GOOD VIBE
