Machine Learning Lecture 27 "Gaussian Processes II / KD-Trees / Ball-Trees" -Cornell CS4780 SP17

  • Added 10. 09. 2024

Comments • 48

  • @bharasiva96
    @bharasiva96 4 years ago +27

    KD-Trees begins at 28:50

  • @yuanchia-hung8613
    @yuanchia-hung8613 4 years ago +16

    The best explanation for Gaussian Process ever!

  • @mlst3rg
    @mlst3rg 4 years ago +34

    This series is a work of art. It needs way more views.

  • @rajupowers
    @rajupowers 4 years ago +10

    The most intuitive explanation of these topics in a classroom.

  • @clementpeng
    @clementpeng 4 years ago +8

    Love this. Probably the clearest explanation of GPs I have seen online.

  • @deltasun
    @deltasun 4 years ago +5

    Thank you very much! I've tried a couple of times to understand GPs, but always gave up. Now I think they're much clearer to me. Very, very grateful!

  • @raedbouslama2263
    @raedbouslama2263 3 years ago +5

    The previous video and the current one are the best material I've watched on Gaussian Processes! Wonderful :)

    • @peterhojnos6705
      @peterhojnos6705 3 years ago

      Definitely! I've seen many, but this one is one of the best.

  • @abhinav9561
    @abhinav9561 3 years ago

    Prof. Kilian killin' it! Thanks, prof, for all the lectures. This course should be the first introduction to the machine learning world for everyone.

  • @chamaleewickrama3276
    @chamaleewickrama3276 3 years ago +1

    Omg. I love this lecture material. To the point, clear and the best!

  • @saikumartadi8494
    @saikumartadi8494 4 years ago +4

    Awesome simulation of a beautiful application!

  • @isaacbuitrago2370
    @isaacbuitrago2370 4 years ago +2

    You make it look easy! Thanks for the clear explanation of GPs.

  • @Illinoise888
    @Illinoise888 4 years ago +3

    This helps me with my exam preparation, thank you.

  • @atagomes_lncc_br
    @atagomes_lncc_br 3 years ago

    Best and simplest explanation of GPR.

  • @AlexPadula
    @AlexPadula 5 years ago +3

    Thank you very much, these lectures are really useful.

  • @vaaal88
    @vaaal88 4 years ago +3

    This is such a great lesson. Thanks!

  • @udiibgui2136
    @udiibgui2136 3 years ago +3

    Thank you for the lecture, very clear! Just one question: how does the Bayesian optimisation already have a mapped surface?

    • @kilianweinberger698
      @kilianweinberger698 3 years ago +7

      Initially that is just a flat surface, which is an uninformed prior.
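
A minimal numpy sketch of that uninformed prior (a zero mean and an RBF kernel are assumed here purely for illustration): before any hyperparameter evaluations the posterior equals the prior, so the predicted surface is flat and the uncertainty is the same everywhere.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel k(a, b) = exp(-||a - b||^2 / (2 l^2))."""
    sq_dists = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq_dists / (2 * lengthscale**2))

X_grid = np.linspace(0, 1, 5)[:, None]   # candidate hyperparameter settings
prior_mean = np.zeros(len(X_grid))       # flat surface everywhere
prior_std = np.sqrt(np.diag(rbf_kernel(X_grid, X_grid)))  # constant uncertainty

print(prior_mean)  # [0. 0. 0. 0. 0.]
print(prior_std)   # [1. 1. 1. 1. 1.]
```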

  • @mertkurttutan2877
    @mertkurttutan2877 2 years ago

    Question: Regarding hyperparameter search via GP, I recall that the earlier steps of a hyperparameter search involve determining the scale of the hyperparameter. How should we determine the scale? Should we use a GP for both the scale and the minimal value at that scale, or use grid search to determine the scale and then use a GP to find the value of the hyperparameter?
    Thanks for both rigorous and enjoyable lectures :)

    • @akshaygrao77
      @akshaygrao77 a year ago

      You keep running Bayesian optimization, which uses Gaussian processes; with more iterations it converges to smaller scales by itself.
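
A toy sketch of why the scale tends to take care of itself (the helper name validation_error is hypothetical, and searching in log10-space is a common convention rather than something prescribed in the lecture): if the candidates live in log-space, one run spans many orders of magnitude and narrows in on the right scale on its own.

```python
import numpy as np

def validation_error(lam):
    # stand-in for "train the model with regularization lam, return validation error"
    return (np.log10(lam) + 2.0) ** 2

# Candidates live in z = log10(lambda), so a single search covers 1e-6 .. 1e2
# without a separate grid search to pin down the scale first.
z_candidates = np.linspace(-6, 2, 9)
errors = [validation_error(10.0 ** z) for z in z_candidates]
best_z = z_candidates[int(np.argmin(errors))]
print("best lambda ~", 10.0 ** best_z)   # ~1e-2 for this toy objective
```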

  • @TeoChristopher
    @TeoChristopher 4 years ago +2

    To clarify, at 26:19, for a Gaussian process, each data point on the x-axis would be a queried test point, the grey region would be the standard deviation, and the points that we have not "queried" would be fitted according to their respective distributions, each of which would itself be a Gaussian with its own mean and s.d.?
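
A short numpy sketch of the picture at 26:19 (the kernel, noise level, and data below are illustrative choices): every test point along the x-axis gets a posterior mean, and the grey band is a multiple of the posterior standard deviation around it.

```python
import numpy as np

def rbf(A, B, l=0.5):
    d = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d / (2 * l**2))

X = np.array([[0.1], [0.4], [0.9]])      # observed inputs
y = np.array([1.0, 0.2, -0.5])           # observed outputs
Xs = np.linspace(0, 1, 50)[:, None]      # test points queried along the x-axis
noise = 1e-2

K = rbf(X, X) + noise * np.eye(len(X))   # train-train covariance (+ label noise)
Ks = rbf(X, Xs)                          # train-test covariances
Kss = rbf(Xs, Xs)                        # test-test covariances

mean = Ks.T @ np.linalg.solve(K, y)                   # posterior mean at each test point
cov = Kss - Ks.T @ np.linalg.solve(K, Ks)             # posterior covariance
std = np.sqrt(np.maximum(np.diag(cov), 0.0))          # per-point standard deviation
band_low, band_high = mean - 2 * std, mean + 2 * std  # the grey region
```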

  • @Biesterable
    @Biesterable 5 years ago +1

    Hm, isn't there maybe a way to do a low-dimensional egg-search (if it's a manifold there should always be some main directions)? So to start, just make it an ellipsoid in one dimension, and for comparing, distort the space so the ellipsoid you're comparing with becomes a sphere...
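
The distortion described here amounts to a linear change of coordinates (a whitening step, sketched below as one way to read the suggestion; the numbers are made up): mapping through the Cholesky factor of the ellipsoid's matrix turns the ellipsoid into a unit sphere, after which ordinary Euclidean comparisons apply.

```python
import numpy as np

# Ellipsoid { x : x^T M x <= 1 }; with M = L L^T (Cholesky), the map x -> L^T x
# sends the ellipsoid onto the unit ball, so plain Euclidean search works there.
M = np.array([[4.0, 0.0],
              [0.0, 1.0]])
L = np.linalg.cholesky(M)

x = np.array([0.5, 0.0])        # on the ellipsoid boundary: x^T M x = 1
z = L.T @ x                     # distorted coordinates
print(np.linalg.norm(z))        # 1.0 -> lands on the unit sphere
```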

  • @ayushmalik7093
    @ayushmalik7093 2 years ago +1

    Hi Prof,
    For the Bayesian optimizer, I assume the algorithm whose best hyperparameters we are trying to find should be costly enough; otherwise it will not make sense to use a GP on top of another algorithm.

  • @kevinshao9148
    @kevinshao9148 8 months ago

    9:30, so my test output y_test has 1) its own variance and 2) n correlations with respect to all observed data y1...yn; how do we then determine the distribution of y_test? How did you get the conclusion at 11:06? Thanks!
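
For reference, the step at 11:06 is the standard Gaussian conditioning identity. Writing K for the train-train kernel matrix, k_* for the n train-test covariances, k_** for the test point's prior variance, and assuming a zero prior mean and label noise σ² (the lecture's setup), the variance of y_* and its n correlations enter jointly and conditioning on the observed y collapses them into a single Gaussian:

```latex
\begin{bmatrix} \mathbf{y} \\ y_* \end{bmatrix}
\sim \mathcal{N}\!\left(\mathbf{0},\,
\begin{bmatrix} K + \sigma^2 I & \mathbf{k}_* \\ \mathbf{k}_*^\top & k_{**} \end{bmatrix}\right)
\quad\Longrightarrow\quad
y_* \mid \mathbf{y} \sim \mathcal{N}\!\left(
\mathbf{k}_*^\top (K + \sigma^2 I)^{-1}\mathbf{y},\;
k_{**} - \mathbf{k}_*^\top (K + \sigma^2 I)^{-1}\mathbf{k}_*\right)
```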

  • @chaowang3093
    @chaowang3093 3 years ago +1

    This guy is brilliantly funny.

  • @ehfo
    @ehfo 5 years ago +6

    Are the homeworks available to the public?

  • @chenwang6684
    @chenwang6684 4 years ago +2

    Awesome lecture! One question: are the projects available to the public? I have found the homeworks but no coding projects.

    • @kilianweinberger698
      @kilianweinberger698 4 years ago +3

      Sorry, I cannot post them. The projects are still used at Cornell University, and if they were public someone would certainly post solutions somewhere and spoil all the fun. :-(

  • @imblera6571
    @imblera6571 4 years ago +1

    For the hyperparameter search, wouldn't the Bayesian optimization approach be more likely to get stuck at a local minimum?

    • @kilianweinberger698
      @kilianweinberger698 4 years ago +4

      No, Bayesian optimization is global. The exploration component makes sure that you don’t get stuck.
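
A tiny sketch of that exploration component (a UCB-style acquisition is used here just as one common example; the posterior numbers are made up): candidates are scored by predicted value plus an uncertainty bonus, so regions that have not been tried keep getting selected and the search does not stall at a local optimum.

```python
import numpy as np

def upper_confidence_bound(mean, std, kappa=2.0):
    """Score for maximization: exploit (mean) + explore (kappa * std)."""
    return mean + kappa * std

# Posterior over 5 candidate hyperparameter settings (illustrative numbers).
mean = np.array([0.80, 0.78, 0.30, 0.10, 0.05])   # predicted accuracy
std = np.array([0.01, 0.02, 0.05, 0.40, 0.45])    # uncertainty (unexplored -> large)

scores = upper_confidence_bound(mean, std)
print(np.argmax(scores))  # picks index 4: low predicted mean, but still unexplored
```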

  • @sarvasvarora
    @sarvasvarora 3 years ago

    Living for that "YAY" 😂😂

  • @LauraJoana
    @LauraJoana 3 years ago

    THANKS!

  • @salahghazisalaheldinataban5632

    It seems from your explanation that the covariance matrix is a simple kernel/distance matrix that does not take variable importance into account. (1) Does that cause any issues if there are variables that have no significant predictive value? (2) Does it mean we have to be careful about variable selection? And (3) is there a way to incorporate feature importance in the kernel?

    • @kilianweinberger698
      @kilianweinberger698 2 years ago +1

      For the linear kernel that's not an issue (as your algorithm becomes identical to linear regression, where you learn a weight for each dimension); for non-linear kernels, however, that can indeed be a problem. One common trick is to multiply each feature dimension by a non-negative weight, and also learn these weights as part of the kernel parameters.
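
A sketch of that trick (this weighted form is often called an ARD kernel; the weights below are fixed by hand for illustration, whereas in practice they would be learned, e.g. by maximizing the GP marginal likelihood):

```python
import numpy as np

def weighted_rbf(A, B, w):
    """RBF kernel on re-scaled inputs: k(a, b) = exp(-||w*a - w*b||^2 / 2)."""
    Aw, Bw = A * w, B * w                    # multiply each feature by its weight
    d = np.sum(Aw**2, 1)[:, None] + np.sum(Bw**2, 1)[None, :] - 2 * Aw @ Bw.T
    return np.exp(-d / 2.0)

X = np.random.rand(4, 3)                     # 4 points, 3 features
w = np.array([1.0, 0.5, 0.0])                # weight 0 switches a useless feature off
print(weighted_rbf(X, X, w))
```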

  • @andreariboni4242
    @andreariboni4242 a month ago

    dead mouse got me

  • @sandeshhegde9143
    @sandeshhegde9143 5 years ago +3

    The KD-Tree part starts at czcams.com/video/BzHJ57QCdVo/video.html

  • @thecelavi
    @thecelavi 5 years ago

    Is it possible to use a B/B+ tree instead of a simple binary tree?

  • @rajupowers
    @rajupowers 4 years ago +1

    Important @8:00

  • @vatsan16
    @vatsan16 4 years ago

    One thing I would like to ask is, "What's the catch?" The algorithm seems great, but where would we not want to use GPR? Is it in situations where we would like to actually know what the function is? Or are there some situations where GPR won't work well?

    • @kilianweinberger698
      @kilianweinberger698 4 years ago +4

      Well, I wouldn't recommend them for data that is very high dimensional (e.g. bag-of-words vectors, or images in pixel space). Also, when features are sparse, splitting along features becomes tedious and too restrictive, as almost all samples have zeros in almost all dimensions.

  • @giraffaelll
    @giraffaelll 4 years ago +2

    He clears his throat a lot

  • @hassanshakeel854
    @hassanshakeel854 5 years ago +1

    Are all these lectures dependent on previous ones?