Reinforcement Learning: on-policy vs off-policy algorithms

  • Date added: 28. 08. 2024

Comments • 22

  • @MrFalk358
    @MrFalk358 9 months ago +12

    OK, I will indulge your quiz-time questions since your videos are really great!
    Question 1: A is correct. It would not learn at all, since the target policy is the policy we are trying to learn. Fixing it means it never changes, so it stays random, and therefore we are not learning.
    Question 2: I'm not completely sure, but I would say B is correct, since SARSA uses its target policy both to choose the action and to "look" at the follow-up state (by taking the action according to that same policy). (See the sketch after this thread.)
    Hope more people comment so the algorithm boosts your channel!

    • @CodeEmporium
      @CodeEmporium  9 months ago +10

      Ding ding ding! You have been paying attention :) Also thanks a ton for indulging me here. I am trying new ways to make sure this content is engaging and educational at the same time. So the more people like yourself that participate, the more I see the value in this content.

    • @MrFalk358
      @MrFalk358 9 months ago +1

      @@CodeEmporium I'm taking a course on RL at the moment which is quite disorganized; your content definitely helps a ton with understanding!

    • @0xabaki
      @0xabaki 6 months ago +1

      @@CodeEmporium I love quiz time! It felt best when professors would quiz us on topics so I could re-engage.
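
    For context on these quiz answers, here is a minimal sketch (not from the video; the function names are hypothetical) of the two tabular one-step updates being compared. SARSA bootstraps on the action its own epsilon-greedy policy actually takes next, while Q-learning bootstraps on the greedy action's value regardless of what the behavior policy did:

        # Hypothetical tabular updates; Q maps states to dicts of action-values,
        # alpha is the learning rate, gamma the discount factor.
        def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
            # On-policy: a_next was chosen by the same epsilon-greedy policy being learned.
            td_target = r + gamma * Q[s_next][a_next]
            Q[s][a] += alpha * (td_target - Q[s][a])

        def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
            # Off-policy: the target uses the greedy (max) action, whatever was actually taken.
            td_target = r + gamma * max(Q[s_next].values())
            Q[s][a] += alpha * (td_target - Q[s][a])

        # Quiz question 1: if the target policy were frozen to a random policy, the max above
        # would be replaced by a fixed random choice, so the targets never track an improving
        # policy and the agent never improves.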

  • @aamirbadershah887
    @aamirbadershah887 9 months ago +2

    Great video. I'd like to point out a mistake at 13:59 where you talk about ON-policy but the heading says "Off Policy". I think that needs correction.
    Also, I would love to see content on multi-agent reinforcement learning and Decision Transformers.

    • @CodeEmporium
      @CodeEmporium  9 months ago

      If you are talking about the heading in the algorithm, it is correctly labeled off-policy; that screenshot is from the textbook linked in the description.
      And yeah, still scoping out the best concepts to cover in the reinforcement learning playlist! Thanks for the suggestion!

    • @aamirbadershah887
      @aamirbadershah887 9 months ago +1

      @@CodeEmporium No, I meant the summary slide, bullet no. 6 (the last bullet point).

  • @CharleyTurner
    @CharleyTurner 7 days ago

    Great stuff

  • @marcdelabarreraibardalet4754

    Nice video, well explained. Question: why would I use one or the other? Are there advantages or disadvantages?

  • @aitorgonzalezgonzalez9395
    @aitorgonzalezgonzalez9395 3 months ago

    I think I found an error in the summary: you wrote "Off Policy RL Algorithms" twice. Apart from that, thanks so much for the video; it helped me a lot.

  • @zhezhe3351
    @zhezhe3351 4 months ago

    Good video! There is a small typo on the summary page about on-policy.

  • @Enerdzizer
    @Enerdzizer 1 month ago

    Do we really update the Q-value function at the exploration step in the SARSA method? It seems we would have to skip this update, since we take a random step while exploring.
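
    A minimal sketch of one SARSA step, assuming a tabular Q (e.g. a defaultdict of floats) and an illustrative env_step function; the names are not from the video. The update is applied on every step, including exploratory ones: the random action is itself a sample from the epsilon-greedy policy SARSA is evaluating, so skipping it would bias the estimate of that policy's value.

        import random

        def epsilon_greedy(Q, s, actions, eps):
            # Both exploratory and greedy choices come from the policy being learned.
            if random.random() < eps:
                return random.choice(actions)
            return max(actions, key=lambda a: Q[(s, a)])

        def sarsa_step(Q, s, a, env_step, actions, eps, alpha=0.1, gamma=0.99):
            # env_step is assumed to return (reward, next_state) for the chosen action.
            r, s_next = env_step(s, a)
            a_next = epsilon_greedy(Q, s_next, actions, eps)  # may well be exploratory
            # Not skipped when a_next is exploratory; that is exactly what makes SARSA on-policy.
            Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
            return s_next, a_next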

  • @Trubripes
    @Trubripes 13 days ago

    Where is the normalization term for the state probability in off-policy algorithms?

  • @muralidhar40
    @muralidhar40 1 month ago

    QT-1: "Target policies" are supposed to learn from exploratory actions taken by "behavior policies" in order to set their Q-values right. If the "target policy" were set to be "random" instead of "greedy", then there is no learning at all. Hence the answer should be the first option: the agent does not learn at all.

  • @mumbo2526
    @mumbo2526 8 months ago

    Amazing Video, thank you!

  • @broccoli322
    @broccoli322 9 months ago +1

    Thanks for the video! ☺

  • @kiranbade9481
    @kiranbade9481 4 months ago

    well explained brother

  • @alonsovalderramahickmann940
    @alonsovalderramahickmann940 8 months ago

    Very nice video man

  • @hugeturnip3520
    @hugeturnip3520 5 months ago

    Thank you so much dude

  • @moaaathkhalil
    @moaaathkhalil 8 months ago

    Well explained!

  • @user-xv9qk3iz7b
    @user-xv9qk3iz7b 6 months ago