#49 - Meta-Gradients in RL - Dr. Tom Zahavy (DeepMind)

  • Published 5 Aug 2024
  • The race is on: we are on a collective mission to understand and create artificial general intelligence. Dr. Tom Zahavy, a Research Scientist at DeepMind, thinks that reinforcement learning is the most general learning framework we have today, and in his opinion it could lead to artificial general intelligence. He believes there is no task that could not be solved by simply maximising a reward.
    Back in 2012, when Tom was an undergraduate and before the deep learning revolution, he attended an online lecture on how CNNs automatically discover representations. It was an epiphany: he decided in that very moment that he was going to become an ML researcher. Tom's view is that the ability to recognise patterns and discover structure is the most important aspect of intelligence, and this has been his quest ever since. He is particularly focused on using diversity preservation and meta-gradients to discover this structure.
    In this discussion we dive deep into meta-gradients in reinforcement learning.
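    As a taster for the discussion, here is a deliberately toy sketch of the core idea, in the spirit of Meta-Gradient Reinforcement Learning (Xu et al., linked below): hyperparameters of the inner update, such as the discount factor, are treated as differentiable meta-parameters, and an outer loss evaluated after the inner update is differentiated with respect to them. The linear value function, the hand-picked transitions, and the meta-objective below are assumptions made up purely for illustration.

        # A toy meta-gradient sketch (illustrative only; not the paper's algorithm or Tom's code).
        import jax
        import jax.numpy as jnp

        def td_loss(theta, gamma, s, r, s_next):
            # Squared one-step TD error for a linear value function V(s) = theta . s,
            # with the discount gamma treated as a differentiable meta-parameter.
            target = r + gamma * jnp.dot(theta, s_next)
            return (target - jnp.dot(theta, s)) ** 2

        def inner_update(theta, gamma, batch, lr=0.1):
            # Ordinary gradient step on the inner loss; gamma flows through this update.
            return theta - lr * jax.grad(td_loss)(theta, gamma, *batch)

        def meta_objective(gamma, theta, train_batch, val_batch):
            # Evaluate the *updated* parameters; differentiating this w.r.t. gamma
            # backpropagates through the inner update itself.
            theta_new = inner_update(theta, gamma, train_batch)
            return td_loss(theta_new, 0.99, *val_batch)  # fixed reference discount, an arbitrary choice

        theta = jnp.zeros(4)                                # toy value-function weights
        gamma = 0.9                                         # the meta-parameter being adapted
        train_batch = (jnp.array([1., 0., 0., 0.]), 1.0, jnp.array([0., 1., 0., 0.]))  # (s, r, s')
        val_batch = (jnp.array([0., 1., 0., 0.]), 0.5, jnp.array([1., 0., 0., 0.]))

        meta_grad = jax.grad(meta_objective)(gamma, theta, train_batch, val_batch)
        gamma = gamma - 0.01 * meta_grad                    # outer (meta) update on the discount
        print("meta-gradient:", meta_grad, "updated gamma:", gamma)
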
    Tim Introduction [00:00:00]
    Main show kick off [00:07:15]
    On meta-gradients [00:09:27]
    Taxonomy of meta-gradient methods developed in recent years [00:11:43]
    Why don't you just do one big learning run? [00:13:58]
    Transfer learning / lifelong learning [00:16:01]
    Does the meta algorithm also have hyperparameters? [00:17:55]
    Are monolithic learning architectures bad then? [00:19:45]
    Why not have the learning agent (self-) modify its own parameters? [00:24:44]
    Learning optimizers using evolutionary approaches [00:26:29]
    Which parameters should we leave alone in meta optimization? [00:28:24]
    Evolutionary methods are great in this space! Diversity preservation [00:30:42]
    Approaches to divergence, intrinsic control [00:33:25]
    How to decide on parameters to optimise and build a meta-learning framework [00:35:55]
    Proxy models to move from discrete domain to differentiable domain [00:39:32]
    Multi-lifetime training -- picking environments [00:43:35]
    2016 Minecraft paper [00:46:07]
    Lifelong learning [00:49:54]
    Corporations are real world AIs. Could we recognise non-human AGIs? [00:52:09]
    Tim invokes Francois Chollet, of course! [00:55:09]
    But David Silver says that reward is all you need? [00:56:57]
    Program-centric generalization [00:59:59]
    Sara Hooker -- The hardware lottery, JAX, Bitter Lesson [01:02:10]
    Concerning trends in the community right now? [01:05:15]
    Unexplored areas in ML research? [01:06:47]
    Should PhD students be going into meta-gradient work? [01:08:18]
    Is RL too hard for the average person to embark on? [01:10:45]
    People back in the 80s had a pretty good idea already, concept papers were cool [01:15:16]
    Non-stationary data: do you have to re-train the model all the time? [01:17:36]
    Graying the black box paper and visualizing the structure of DQNs with t-SNE [01:19:16]
    Transcript: docs.google.com/document/d/14...
    Meta-Policy Gradients: A Survey [Robert Lange]
    roberttlange.github.io/posts/...
    A Self-Tuning Actor-Critic Algorithm [Tom Zahavy et al]
    arxiv.org/abs/2002.12928
    Graying the black box: Understanding DQNs [Zahavy et al]
    utstat.toronto.edu/droy/icml1...
    Is a picture worth a thousand words? [Tom Zahavy et al]
    arxiv.org/abs/1611.09534
    Diversity is All You Need: Learning Skills without a Reward Function [Benjamin Eysenbach et al]
    arxiv.org/abs/1802.06070
    Evolutionary principles in self-referential learning [Jürgen Schmidhuber]
    people.idsia.ch//~juergen/dip...
    Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning [Tom Zahavy et al]
    arxiv.org/abs/1809.02121
    Training Learned Optimizers with Randomly Initialized Learned Optimizers [Luke Metz et al]
    arxiv.org/abs/2101.07367
    A Deep Hierarchical Approach to Lifelong Learning in Minecraft [Chen Tessler, Tom Zahavy et al]
    arxiv.org/abs/1604.07255
    Mutual Information State Intrinsic Control [Rui Zhao et al]
    openreview.net/pdf?id=OthEq8I5v1
    Mutual Information-based State-Control for Intrinsically Motivated Reinforcement Learning [Rui Zhao et al]
    arxiv.org/abs/2002.01963
    Rainbow: Combining Improvements in Deep Reinforcement Learning [Matteo Hessel et al]
    arxiv.org/abs/1710.02298
    Variational Intrinsic Control [Karol Gregor et al]
    arxiv.org/abs/1611.07507
    Meta-Gradient Reinforcement Learning [Zhongwen Xu et al]
    arxiv.org/abs/1805.09801
    On Learning Intrinsic Rewards for Policy Gradient Methods [Zeyu Zheng, Junhyuk Oh, Satinder Singh]
    arxiv.org/abs/1804.06459
    Visuals and music: melodysheep
    Please support them on Patreon and buy their soundtrack as we did @ melodysheep.bandcamp.com/albu...
    LIFE BEYOND: Chapter 1
    Keep in mind that MLST is 100% non-monetized, non-commercial and educational

Comments • 30

  • @DavenH · 3 years ago · +14

    Fabulous episode. I loved this part "Karpathy: Deep RL doesn't work, yet [[for me]] / Zahavy: ...It works" :)
    Yeah, I can't help but wonder how these voices would ever be heard without your show. This must be true for 99% of scientists doing great work but with no exposure.

    • @MachineLearningStreetTalk · 3 years ago · +4

      Thanks Daven, always a pleasure to hear from you my friend. We think that scientists and engineers are the heroes of our time.

  • @AICoffeeBreak · 3 years ago · +6

    Wow, really love this episode. I think the guest has great insight and answers the world needs to hear. And the hosts are great at asking the right questions! 👏

  • @PeterOtt · 3 years ago · +17

    I'm really liking the breadth of topics you guys are covering. Also, your video production and the quality of your discussions never disappoint. Looking forward to the next hour and a half!

  • @DelandaBaudLacanian · 2 years ago

    This is amazing work that I keep coming back to over and over again. It's inspired me to get out of boring accounting projects and back into Kaggle projects so I can pretend to start understanding all these pioneering papers and concepts this channel consistently elevates and delivers on. Thank you Dr Scarfe!

  • @machinelearningdojowithtim2898

    Excited about this one!!! Here we go!!! ✌😜😜🙌🙌💥💥

  • @KristoferPettersson · 3 years ago · +4

    Sweet! Dr Zahavy is a north star! Great episode!

  • @abby5493 · 3 years ago · +2

    Love the intro Dr Scarfe 😍

  • @ltrinhmuseum · 2 years ago

    These types of channels are very powerful.

  • @NelsLindahl · 3 years ago · +1

    Wonderful content this week. The modular vs. monolithic discussion was on point.

  • @JousefM · 3 years ago · +1

    That is a damn good episode, thanks guys!

  • @23kl104 · 3 years ago · +1

    Awesome episode. I'm really enjoying the discussion format and appreciate all the work you guys are putting into this. It's perfect for getting a rough overview of some ML topic and learning about the open questions and where the research is headed. I hope you are also gaining a lot from doing this and continue with this podcast!

    • @machinelearningdojowithtim2898 · 3 years ago · +1

      Your question about stationarity seems to have gone, but non-stationarity just means that the statistical properties of the data change over time. Most ML models assume a stationary distribution. On financial datasets, one of the preprocessing tricks is to remove non-stationarity, i.e. a trend or seasonality. As you can imagine, in the RL world we are talking about changing state-action spaces, which is a whole new world of pain 😃 Imagine a simple example where the dog is moving around the world over time.
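      (An illustrative toy sketch of the differencing trick mentioned above, with a made-up series rather than real financial data: a series with a linear trend has a mean that drifts over time, and first-differencing is one standard way to remove that trend.)

          import jax.numpy as jnp

          t = jnp.arange(200.0)
          series = 0.05 * t + 0.5 * jnp.sin(t)   # slow upward trend + a roughly stationary part
          diffed = jnp.diff(series)              # x[t] - x[t-1] removes the linear trend

          # The mean drifts between halves of the raw series, but not of the differenced one.
          print(series[:100].mean(), series[100:].mean())
          print(diffed[:100].mean(), diffed[100:].mean())
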

    • @23kl104 · 3 years ago · +1

      @@machinelearningdojowithtim2898 Haha, thanks for the answer nonetheless. It became clearer to me throughout the video, although I understood it more in the sense that applying the meta-model to a new RL task results in a distribution shift, rather than a distribution shift within one particular task. I guess I shouldn't have deleted my comment. It would be nice to clarify this.

  • @tomerga2 · 2 years ago

    Love the visuals

  • @guillaumewenzek4210 · 3 years ago · +2

    Is there a recommended library to play with meta-learning outside of RL? Preferably something that works with PyTorch.

  • @karimmarbouh2553 · 3 years ago · +1

    Dope

  • @Mutual_Information · 3 years ago · +3

    Oh my god.. Yannic codes on 2 computers!

  • @subjectpoolcoordinator3587

    Where can I find the simplest code implementations or coding tutorials on this? Math symbols make zero sense to me, but I get a clear picture when I code. If anyone has any information on a very simple code implementation, or can help me, please reply. My immense gratitude to thee for the help in advance~ Thanks!!

  • @DavidBick321 · 3 years ago · +2

    I love you Yannic

  • @Bellenchia · 3 years ago · +1

    This got me thinking: perhaps it is utterly ridiculous to use a single simple algorithm for all of training.
    That puts the onus almost entirely on the model, which is likely one of the reasons we create multi-billion-parameter neural networks that are outperformed by children.
    Think about the diversity of things that motivate us, intimidate us, or otherwise influence our behavior. They're mostly emergent from our social structure, right? Or are we just trying to conserve entropy all the way through?
    I'm not convinced either way; entropy might be the fundamental thing underlying intelligence for all we know.
    I've seen something akin to this discussed in the context of AI alignment and mesa-optimizers, which may be an interesting follow-up topic.

  • @Chr0nalis · 3 years ago · +4

    Quite distracting when the camera switches to random people's faces when someone is talking, imo.

    • @MachineLearningStreetTalk · 3 years ago · +2

      Thanks for the feedback, we are noobs at video editing and learning as fast as we can

    • @mh49897 · 3 years ago · +6

      I disagree and really appreciate the production quality.

    • @AICoffeeBreak · 3 years ago

      @@MachineLearningStreetTalk But still, I think the quality is great. You have less noob-score in editing compared to other creators out there. Right, Ms. Coffee Bean? 😅

  • @shankar2chari · 3 years ago · +2

    Guys, I am stressed... Who is a data scientist? Please make an episode to clarify.
    Is it the one who..
    1. Knows the Keras/PyTorch APIs and ensembles with LightGBM for convergence
    2. Got a good Kaggle ranking and a YouTube channel with 1000+ followers
    3. Is aware of, but never understood, the probabilistic nature of everything, and is quite unsettled by it
    3a. Knows the answers to questions like "What is a bijector?" and "When should a covariance matrix be positive semi-definite?"
    4. Wrote SQL queries until 2020, and it is 2021: naturally he is a data scientist
    5. No, there is no such person as a data scientist in the known universe

  • @ericadar · 2 years ago

    I think "Tom" is not short for "Tomas" for Zahavy. you might want to change the title

    • @MachineLearningStreetTalk · 2 years ago

      Thank you so much for pointing this out. I feel sure that I must have seen it written down as "Tomas" somewhere, but I can't find it. I have changed it to "Tom".