Automated Machine Learning - Successive Halving and Hyperband

  • Published 6 Sep 2022
  • In this video, we take a look at Successive Halving, an extension of random search that makes it more efficient, as well as Hyperband, an extension of Successive Halving. Both methods can be used for finding good hyperparameters (and algorithms) for a machine learning problem. A small code sketch of successive halving follows below this description.
    Original Successive Halving paper: arxiv.org/pdf/1502.07943.pdf
    Original Hyperband paper: arxiv.org/pdf/1603.06560.pdf
    If you liked the video, make sure to share it with others!
    Any comments, feedback, or questions? Let me know in the comments section!
  • Science & Technology
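
  To make the method concrete, here is a minimal Python sketch of successive halving; the evaluate function is a hypothetical stand-in for training a configuration with a given budget and returning a validation loss:

      import random

      def evaluate(config, budget):
          # Hypothetical stand-in: train `config` for `budget` epochs
          # and return a validation loss (lower is better).
          return random.random() / budget

      def successive_halving(configs, min_budget=1, eta=3):
          budget = min_budget
          while len(configs) > 1:
              # Evaluate all surviving configurations at the current budget.
              ranked = sorted(configs, key=lambda c: evaluate(c, budget))
              # Keep the best 1/eta fraction; survivors get eta times more budget.
              configs = ranked[: max(1, len(configs) // eta)]
              budget *= eta
          return configs[0]

      # Example: 27 randomly sampled learning rates, starting at budget 1.
      candidates = [{"lr": 10 ** random.uniform(-4, -1)} for _ in range(27)]
      print(successive_halving(candidates))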

Comments • 13

  • @Hoe-ssain • 1 year ago • +2

    At 16:44, why would we violate the maximum R (81)? Wouldn't we be taking n = 3 and r = 27? That doesn't violate the max R. In fact, as per your table, taking 6 * 27 = 162 > 81 violates this rule. I am lost. Can you please explain?

    • @aixplained4763 • 1 year ago

      Good question! This observation is based on the example given by the authors, which is unfortunately wrong. To make sure that you understand the method fully, try to follow their pseudocode (link to their paper in the description); you will end up with different numbers in the table.
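
      For reference, a short Python sketch of the bracket schedule following the paper's pseudocode (R = 81 and eta = 3 as in the video; R is assumed to be a power of eta). Its output differs from the paper's own table, consistent with the remark above:

          import math

          def hyperband_schedule(R=81, eta=3):
              # Print the (n_i, r_i) pairs of every bracket, following
              # the pseudocode in the Hyperband paper.
              s_max = math.floor(math.log(R, eta) + 1e-9)  # epsilon guards float error
              B = (s_max + 1) * R                          # total budget per bracket
              for s in range(s_max, -1, -1):
                  n = math.ceil(B / R * eta ** s / (s + 1))  # initial number of configs
                  r = R // eta ** s                          # initial budget per config
                  rounds = [(n // eta ** i, r * eta ** i) for i in range(s + 1)]
                  print(f"bracket s={s}: {rounds}")

          hyperband_schedule()
          # bracket s=4: [(81, 1), (27, 3), (9, 9), (3, 27), (1, 81)]
          # bracket s=3: [(34, 3), (11, 9), (3, 27), (1, 81)]
          # bracket s=2: [(15, 9), (5, 27), (1, 81)]
          # bracket s=1: [(8, 27), (2, 81)]
          # bracket s=0: [(5, 81)]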

  • @deepsutariya929 • 3 months ago • +1

    Hyperband was like a headache before watching your video; now it is clear. Thank you for such beautiful content and examples.
    You shouldn't stop making videos; it's very unfortunate that you have only a few subscribers.

  • @gowtime • 1 year ago • +1

    Great video, I finally understood Hyperband thanks to you and was able to use it in Keras confidently. Thanks! Do you know other hyperparameter tuning approaches that may be better/worth exploring?

    • @aixplained4763 • 1 year ago • +2

      Glad to hear that it was helpful! :) Hyperband relies on a model-free approach (successive halving) that does not try to learn a predictive model mapping a configuration to a predicted performance. Approaches that do this (called Bayesian optimization), like the Tree Parzen Estimator, can be more efficient and require less trial-and-error. It is even possible to combine them with Hyperband or successive halving, making it more efficient still. If you are interested, there is also a video about the Tree Parzen Estimator.
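
      As an illustration, a minimal sketch using the hyperopt library's TPE implementation (assuming hyperopt is installed; the objective below is a toy stand-in for a real validation loss):

          from hyperopt import fmin, tpe, hp

          def objective(params):
              # Toy stand-in: replace with training a model and
              # returning its validation loss.
              return (params["lr"] - 0.01) ** 2

          # Search space: learning rate sampled on a log scale.
          space = {"lr": hp.loguniform("lr", -9, 0)}

          # TPE models the densities of good and bad configurations
          # and proposes candidates that maximize their ratio, rather
          # than sampling purely at random.
          best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50)
          print(best)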

  • @haneulkim4902 • 1 year ago • +1

    Amazing video! One question: for each bracket in Hyperband, a new set of configurations is sampled from the total hyperparameter space, correct? So there may be duplicate configurations, i.e., the same configuration may appear in both bracket 1 and bracket 2?

    • @aixplained4763 • 1 year ago • +2

      Thank you! Yes, that is absolutely correct :)

    • @haneulkim4902 • 1 year ago • +1

      @@aixplained4763 I'm still unsure about Hyperband's benefit: each consecutive bracket samples a smaller set of hyperparameter configurations than the previous one. Since the sampling is random, the configurations in the final bracket aren't necessarily the best ones, yet they are trained for a long time. What exactly is the benefit over simple successive halving?

    • @aixplained4763 • 1 year ago • +1

      @@haneulkim4902 Good question! In regular successive halving, the halving can be too aggressive, prematurely discarding better configurations that needed more time to yield good performance. Finding the right level of "aggression" is not easy. Hyperband basically runs multiple successive halving brackets with different levels of "aggression" to solve this. In the end, it is indeed sometimes the case that more training time leads to better performance, but not always. Moreover, after performing all brackets in Hyperband, you can post-process the results: e.g., select the best candidate from every bracket and train them all with the same budget.
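
      A sketch of that post-processing step, assuming a list winners with the best configuration from each bracket and a hypothetical evaluate(config, budget) returning a validation loss:

          def rerun_bracket_winners(winners, R, evaluate):
              # Retrain every bracket winner with the full budget R
              # and return the configuration with the lowest loss.
              rescored = [(evaluate(config, R), config) for config in winners]
              return min(rescored, key=lambda pair: pair[0])[1]

          # Usage: best = rerun_bracket_winners(winners, R=81, evaluate=my_evaluate)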

  • @engcaiobarros • 1 year ago

    Thank you so much for this inspiring lesson. We have 5 brackets because we should consider log_η(R) + 1 brackets?

    • @aixplained4763 • 1 year ago

      Good to hear! :) Good question! Indeed, that's correct.
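
      In code, with the values from the video (R = 81, eta = 3):

          import math

          R, eta = 81, 3
          # floor(log_eta(R)) + 1 brackets; the epsilon guards against
          # floating-point error, since R is an exact power of eta here.
          n_brackets = math.floor(math.log(R, eta) + 1e-9) + 1
          print(n_brackets)  # 5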

  • @Thamizhadi • 1 year ago

    Silly Question: Which software do you use for making your slides? The math symbols look so nice.

    • @aixplained4763 • 1 year ago • +3

      Happy to hear that you like the symbols! The slides are created in Google Slides and I copy/paste symbols from a latex2image generator such as latex2image.joeraut.com/