Lecture 3 "k-nearest neighbors" -Cornell CS4780 SP17

  • Added 27. 08. 2024

Comments • 117

  • @alexenax1109
    @alexenax1109 5 years ago +73

    The lecture starts at 2:00!
    Amazing explanation at 27:10 of how to pick the right algorithm for your dataset, and of how a poor choice leads to bad ML results!
    The lecture gets to the k-NN algorithm at 36:00 (before that it covers the training, validation, and test sets and minimizing the expected error).

  • @ehfo
    @ehfo 5 years ago +64

    your lectures aren't boring at all!!!

  • @sachinpaul2111
    @sachinpaul2111 3 years ago +15

    “When you watch these from bed, they get boring”: sorry Professor, I’m rewatching this class for the fifth time and it has NEVER bored me.
    Every time I rewatch, I gain a new appreciation for some subtlety in what you say.
    It’s gotten to the point that I kind of imitate you when I’m interviewing with companies. It takes the pressure off when I just think of the interview as your class, with me explaining what was taught in it.

  • @omalve9454
    @omalve9454 A year ago +4

    I raise my hand unconsciously when you say "Raise your hand if you understood." Best lectures ever!

  • @jachawkvr
    @jachawkvr 4 years ago +12

    I took a grad-level class in machine learning and got an A, but only now do I realize how crappy my professor was and how little I actually understood. I am really glad I am able to view these lectures for free. Thank you, Dr. Weinberger!

  • @RENGcast
    @RENGcast 4 years ago +23

    You, sir, are the GOAT of ML.

  • @gurdeeepsinghs
    @gurdeeepsinghs 3 years ago +3

    At 39 minutes Prof. Weinberger said "raise your hand if that makes sense", and I actually did!! Super high-quality content here. That's the level of engagement being created across the world. Respect from India!!

  • @sekfook97
    @sekfook97 3 years ago +5

    This lecturer has tremendous charisma!

  • @abimaeldominguez4126
    @abimaeldominguez4126 4 years ago +16

    These videos help people from other countries who for some reason can't get access to a degree in machine learning. In my case, I now know exactly why I should not split the data randomly for the datasets I use at work. Thanks so much.

    • @juliocardenas4485
      @juliocardenas4485 2 years ago

      Absolutely!!
      I have shared summaries of these lectures translated to Spanish. I live in the US but grew up in México

  • @meenakshisarkar7529
    @meenakshisarkar7529 4 years ago +15

    Welcome to 2020 where your entire college semester is done from your bedroom. :D

  • @randajer
    @randajer 7 months ago

    I want to get started studying machine learning and this is great! It's easy to watch while performing menial tasks at work, and I can review anything I have questions about at home. Having the notes available to read ahead of time and to look at during and after the video is tremendous for understanding. Thank you very much for providing everyone with such a great fount of knowledge.

  • @jorgebetancourt2610
    @jorgebetancourt2610 A year ago

    Professor Weinberger,
    I have taken two graduate-level courses in ML, and I believed I had an understanding until I started your course at eCornell. Man, build your own university! I’m speechless at the quality of your lectures! Thank you!

  • @77styl
    @77styl 3 years ago +1

    I just finished my first semester studying Data Science, and today was supposed to be my first day of holidays, yet I have already watched three of the lectures and am still going. I knew how to apply some of the algorithms in R, but knowing the intuition behind them makes them much clearer. Thank you, Professor Weinberger, for the amazing content.

  • @TrentTube
    @TrentTube 5 years ago +2

    Thank you for speaking to the assumptions associated with different models and the chaos of data in the real world.

  • @nolancao2878
    @nolancao2878 4 years ago +4

    Thanks for the lessons, and especially for providing coursework, notes, and exams.

  • @Oscar-ip3ys
    @Oscar-ip3ys 2 years ago +1

    Thanks for the lecture! The party game example is really insightful and one that you will for sure remember in the future. I also appreciate the jokes a lot; they make the lectures highly engaging!

  • @vieacademy6235
    @vieacademy6235 4 years ago +1

    Many thanks for the systematic presentation of ML. You make it so easy to follow the subject.

  • @Karim-nq1be
    @Karim-nq1be 6 months ago

    I was looking for an answer that was quite technical in another video but I got hooked. Thank you so much for providing such great knowledge.

  • @pranavhegde98
    @pranavhegde98 A year ago

    This series of lectures brought back my love of learning

  • @luq2337
    @luq2337 3 years ago +1

    My uncle recommended this channel to me. Very, very great class!!!

  • @anmolagarwal999
    @anmolagarwal999 A year ago

    Excellent intuition on why validation sets are needed: 13:20

  • @habeebijaz5907
    @habeebijaz5907 3 months ago

    He is Hermann Minkowski, who was Einstein's teacher. The Minkowski metric is the metric of flat spacetime and forms the backbone of special relativity. The ideas developed by Minkowski were later extended by Einstein to develop the theory of general relativity.

  • @satviktripathi6601
    @satviktripathi6601 3 years ago +2

    one of my favorite lectures on ML

  • @rachel2046
    @rachel2046 3 years ago

    I honestly have more respect for Cornell because of Professor Weinberger's lectures.

  • @vaibhavgupta550
    @vaibhavgupta550 5 years ago +3

    Amazing lectures, sir. Loved them. This was just the thing I was looking for and couldn't find earlier.

  • @michaelmellinger2324
    @michaelmellinger2324 2 years ago

    2:00 Lecture begins - Recap of last lecture
    7:50 Can’t split train/test any way we want
    11:35 Very often people split train/validation/test. Take the best on the validation set
    24:20 Question for the class: as n goes to infinity…
    26:15 Weak Law of Large Numbers: the average of a random variable converges to its expected value in the limit (see the note after this list)
    27:30 How to find the hypothesis class H. The Party Game.
    36:00 k-Nearest Neighbors
    41:45 Only as good as its distance metric
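
    For reference, the Weak Law of Large Numbers mentioned at 26:15, stated in symbols (a standard textbook formulation, not copied from the course notes): for i.i.d. random variables X_1, ..., X_n with mean mu = E[X_i],

        \[
        \frac{1}{n}\sum_{i=1}^{n} X_i \;\xrightarrow{\;P\;}\; \mu
        \qquad \text{as } n \to \infty,
        \]

    i.e. the sample average converges in probability to the expected value.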

  • @alexenax1109
    @alexenax1109 5 years ago +6

    Thanks from Italy!!!

  • @maliozers
    @maliozers 4 years ago +11

    Who actually has access to attend this class in person and prefers to watch it online instead, really???

  • @janbolmer4965
    @janbolmer4965 5 years ago +1

    Thanks for uploading these Lectures!

  • @SumitSharma-pu6yi
    @SumitSharma-pu6yi 2 years ago +1

    Hello Dr. Kilian,
    Greetings from India!
    I loved your videos. Could you please record some modules/lectures specializing in deep learning? I will binge-watch those too. 😃
    Best,
    Sumit

  • @homeroni
    @homeroni 4 years ago +2

    "Choosing between your mama and papa or something, what are you gonna do? I like them both."

  • @minhtamnguyen4842
    @minhtamnguyen4842 4 years ago +2

    just so brilliant

  • @marcogelsomini7655
    @marcogelsomini7655 2 years ago

    He is the best

  • @waihan6772
    @waihan6772 7 months ago

    great lecture!

  • @ChandraveshChaudhari
    @ChandraveshChaudhari 3 years ago +1

    Guys, where is the video lecture for the 1-NN convergence proof?
    Cover and Hart 1967[1]: As n→∞, the 1-NN error is no more than twice the error of the Bayes optimal classifier.
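
    In symbols (a sketch of the same statement; \epsilon_{\mathrm{BayesOpt}} denotes the error of the Bayes optimal classifier):

        \[
        \lim_{n \to \infty} \epsilon_{1\mathrm{NN}} \;\le\; 2\,\epsilon_{\mathrm{BayesOpt}}.
        \]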

  • @KulvinderSingh-pm7cr
    @KulvinderSingh-pm7cr 5 years ago +2

    Thanks professor !!!

  • @manjulbalayar9704
    @manjulbalayar9704 A year ago

    Hello Prof. Weinberger, I am really enjoying your lectures a lot. I wish I were there in person this Fall or next Spring. I was wondering if we online viewers could have access to some older homework or assignments for practice. That would be the best! Thanks!

  • @muratcan__22
    @muratcan__22 5 years ago +2

    perfect courses sir, thanks.

  • @doyourealise
    @doyourealise 2 years ago

    Hello sir :) How are you? Hope you are doing well. It is 2022 and nothing can beat your ML lectures. Watching them again :)

  • @abhinav9561
    @abhinav9561 4 years ago +1

    KNN starts at 35:57

  • @WellItsNotTough
    @WellItsNotTough 3 months ago

    There is a quiz question in the lecture notes: "How does k affect the classifier? What happens if k = n? What happens if k = 1?"
    I do not think it is discussed in the lectures. In my opinion, k is the only hyperparameter in this algorithm. For k = n, we take the mode of all the labels in the dataset as the output for the test point, whereas for k = 1 the test point is assigned the label of its single nearest neighbor.
    I have a doubt here: since we are using a distance metric, what if we have two points (for simplicity) that are at equal distance from the test point but have different labels? What happens in that case for k = 1? Similarly, for k = n, if the binary class labels occur in equal proportion, how does the mode work in that case?

    • @kilianweinberger698
      @kilianweinberger698 A month ago

      Yes, for k=n it is the mode and k=1 is the nearest neighbor. If the label assignment is a draw (e.g. two points are equidistant), a common option is to break ties randomly.
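
      A minimal sketch of that behavior (not from the course materials; it assumes NumPy arrays of training points with integer labels and Euclidean distance, and all names are just illustrative):

          import numpy as np

          def knn_predict(X_train, y_train, x_test, k, rng=np.random.default_rng()):
              """Majority vote among the k nearest neighbors of x_test.
              Ties in distance and in the vote are broken randomly."""
              dists = np.linalg.norm(X_train - x_test, axis=1)    # Euclidean distance to every training point
              shuffle = rng.permutation(len(dists))                # random order so equidistant points tie-break randomly
              nearest = shuffle[np.argsort(dists[shuffle], kind="stable")][:k]
              labels, counts = np.unique(y_train[nearest], return_counts=True)
              winners = labels[counts == counts.max()]             # all labels tied for the most votes
              return rng.choice(winners)                           # break voting ties randomly

      With k = len(X_train) this reduces to the mode of all labels (chosen at random in an exact draw), and with k = 1 it returns the label of a random closest point when several are equidistant.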

    • @WellItsNotTough
      @WellItsNotTough A month ago

      @@kilianweinberger698 Thank you for the answer, Prof. Weinberger, and for this amazing series as well!

  • @soulwreckedyouth877
    @soulwreckedyouth877 4 years ago +2

    Thanks from Germany

  • @maddai1764
    @maddai1764 5 years ago +4

    Thanks a lot for this nice course. I think at 48:06 it's just 32 and not 32 to the power of 32. Am I missing something, dear @kilian?

  • @VIVEKKUMAR-kv8si
    @VIVEKKUMAR-kv8si A year ago

    Was there a paper on the medical problem with only 11 samples? I was doing a study on small-sample-size problems and was curious what sort of algorithms were used on such a small dataset.

  • @geethasaikrishna8286
    @geethasaikrishna8286 4 years ago

    Once again, thanks Prof. Kilian Weinberger for the amazing lecture. One question about the lecture notes:
    In the 1-NN convergence proof section it says "Bad news: We are cursed!!", and the convergence proof is for n tending to infinity, but after watching the lecture the curse seems to arise when the number of dimensions d becomes large. So did I misinterpret the "cursed" statement as applying when n tends to infinity?

  • @bharatbajoria
    @bharatbajoria 3 years ago +1

    k-NN starts at 36:02

  • @ting-yuhsu4229
    @ting-yuhsu4229 3 years ago +1

    YOU ARE AWESOME!

  • @adityabhardwaj408
    @adityabhardwaj408 4 years ago +1

    This is a great way of letting self-learners study. However, is there a way to add the questions raised by students, e.g. in a link? The recording has some noise that makes it hard to hear the questions. Adding the questions would add more value: we could relate our own questions to theirs and would have fewer doubts!

  • @harshavardhanasrinivasan3125

    What is the programming language of choice for writing the assignments and projects?

  • @VIVEKKUMAR-kv8si
    @VIVEKKUMAR-kv8si A year ago

    You said that if the data is i.i.d., we can split it uniformly at random. What would the correct approach have been for the spam filter case then? Is it i.i.d.? I think not, since some emails might be similar to others. Thank you.

    • @kilianweinberger698
      @kilianweinberger698 A year ago

      You have to split by time. Let's say you have 4 weeks' worth of data: put the first 3 weeks into training and the last week into validation. This way you simulate the real application case, namely that you train on past data to predict the labels of future data.
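
      A minimal sketch of such a temporal split (not from the course materials; it assumes each email is a dict with a datetime under "timestamp", and all names are just illustrative):

          from datetime import datetime, timedelta

          def temporal_split(emails, cutoff):
              """Everything before `cutoff` goes to training, everything at or after it to validation."""
              train = [e for e in emails if e["timestamp"] < cutoff]
              val = [e for e in emails if e["timestamp"] >= cutoff]
              return train, val

          # Example: train on the first three weeks, validate on the last week.
          # start = datetime(2017, 1, 1)
          # train, val = temporal_split(emails, cutoff=start + timedelta(weeks=3))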

  • @yuniyunhaf5767
    @yuniyunhaf5767 5 years ago +1

    this is amazing, thank u sir

  • @subhanali4535
    @subhanali4535 5 years ago +6

    Sir, I want to learn deep learning. Can I skip the rest of the classes? I have watched the first 3 classes. Please guide me.

    • @kilianweinberger698
      @kilianweinberger698 5 years ago +12

      Hmm, you may need to be patient. I would recommend you understand logistic regression and gradient descent. If you cannot wait, skip ahead after that, but you will be missing out on some important concepts.

    • @subhanali4535
      @subhanali4535 5 years ago

      @@kilianweinberger698 Thank you so much Sir

    • @abhinavmishra9401
      @abhinavmishra9401 3 years ago

      @Kilian Weinberger I am in the same situation, but I can go further than gradient descent. How far do you recommend going before jumping to deep learning, so that the loss in understanding DL is minimized?

  • @gregmakov2680
    @gregmakov2680 2 years ago

    The reason that "most people actually don't do it right" is not the people themselves, but that the gap between the theoretical model and practical situations is not described clearly in almost all lectures in all classes in the world!!!! This gap confuses students heavily, including me :D:D:D:D

  • @KOSem-ke9jn
    @KOSem-ke9jn 5 years ago +1

    Hi Professor, thanks for making these videos publicly available. In your formalisation of the algorithm you define a test point as x (presumably a vector), but in your specification of the conditions for points excluded from the k-NN you introduce y’ and y’’ which, to me, either seem redundant if x' and x'' are vectors or have not been consistently applied if a point is now a tuple (x,y) in which case the distance function should be applied to 2 tuples. Am I missing something?

    • @bluejimmy168
      @bluejimmy168 5 years ago

      I also don't understand that part. At 39:45 he uses (x',y'); I'm not sure if he meant an ordered pair or two vectors named x' and y'. Is there a difference between a vector and a tuple?

    • @KOSem-ke9jn
      @KOSem-ke9jn 5 years ago

      @@bluejimmy168 Hi, yes, the notation is a bit confusing in my opinion. I think there is a technical difference between a vector and a tuple; what I meant above was whether x represents the entire vector object or a value in one coordinate of a two-coordinate representation, which I would call a tuple. An ordered pair is a tuple, I think.

    • @kilianweinberger698
      @kilianweinberger698 5 years ago +1

      sorry, yes, I was a little sloppy there. :-/ I hope you can figure it out from the context.
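
      For anyone else puzzled by this, one consistent reading of the notation (a sketch based on the discussion above, not a verbatim quote of the notes): each training point is a labeled pair (x', y'), the distance is applied only to the feature vectors, and the set S_x of the k nearest neighbors of a test point x satisfies

          \[
          S_x \subseteq D, \qquad |S_x| = k, \qquad
          \forall\, (x', y') \in D \setminus S_x:\;
          \mathrm{dist}(x, x') \;\ge\; \max_{(x'', y'') \in S_x} \mathrm{dist}(x, x'').
          \]

      The labels y' and y'' only tag which point each feature vector belongs to; the distance never touches them.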

    • @KOSem-ke9jn
      @KOSem-ke9jn 5 years ago

      @@kilianweinberger698 Yes it's clear - just wanted to confirm that I hadn't missed anything. Your lectures are lucid on the whole. Many thanks for sharing.

  • @mohamedanwar3867
    @mohamedanwar3867 A year ago

    Thank you sir

  • @Dendus90
    @Dendus90 5 years ago +2

    Dear Professor,
    This is the best ML lecture I've ever seen. Are you going to provide more materials of this kind?
    PS. Are you looking for any postdocs? ;)

  • @whyitdoesmatter2814
    @whyitdoesmatter2814 4 years ago +1

    Thanks a lot for your enthusiasm! Coming back to the discussion you had early on concerning splitting the dataset into training, validation, and test sets... My understanding is that for a given dataset D with m values, the first step is to train the algorithm on the training set for each hyperparameter setting, evaluate each setting on the validation set, pick the one with the smallest error, retrain it on the combined training and validation data, and finally test it on the test set. Is that correct? Also, concerning the k-NN algorithm, do you obtain the parameter k on the training set or the validation set? I am a bit confused. Best regards, Axel from Norway.

    • @kilianweinberger698
      @kilianweinberger698 4 years ago

      Yes, if by “smallest one” you mean the one that leads to the smallest error. For kNN you can even compute the leave-one-out error, i.e. you go through each training sample, pretend it was a test sample, and check if you would classify it correctly with k=1,3,5,7,..,K.
      After you have done this for the whole set, you pick the k that leads to the fewest misclassifications (and in case of a tie, the smallest k). Hope this helps.
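
      A minimal sketch of that leave-one-out selection (not the course's reference implementation; it assumes NumPy arrays X of shape (n, d) and y of shape (n,), Euclidean distance, and illustrative names):

          import numpy as np

          def select_k_loo(X, y, ks=(1, 3, 5, 7)):
              """Pick k by leave-one-out error: hold out each training point in turn,
              classify it by majority vote among its k nearest remaining points,
              and return the k with the fewest mistakes (ties go to the smaller k)."""
              errors = {}
              for k in ks:
                  mistakes = 0
                  for i in range(len(X)):
                      dists = np.linalg.norm(X - X[i], axis=1)
                      dists[i] = np.inf                        # exclude the held-out point itself
                      nearest = np.argsort(dists)[:k]
                      labels, counts = np.unique(y[nearest], return_counts=True)
                      if labels[np.argmax(counts)] != y[i]:
                          mistakes += 1
                  errors[k] = mistakes
              return min(ks, key=lambda k: (errors[k], k))     # fewest errors; smaller k breaks ties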

    • @anmolmonga1933
      @anmolmonga1933 4 years ago

      @@kilianweinberger698 Can you do hyperparameter tuning on the training/validation split for multiple algorithms like SVM and Random Forest and then compare the results on the test set, or should the comparison between multiple models also be done on the training/validation data, if you are reproducing it for a paper?

  • @kevinchittilapilly8221
    @kevinchittilapilly8221 4 years ago +1

    Hi Sir. Can you please guide me as to where I should study the maths required for ML? I did a few courses, but they only covered basic calculus and the like. I had no clue about the weak law of large numbers you talked about at 26:50. Please help.

  • @alexenax1109
    @alexenax1109 5 years ago +10

    You are not too fast. In fact, I am watching the playlist at a minimum speed of 1.75x (due to my schedule) :D

  • @bharatbajoria
    @bharatbajoria 3 years ago

    Sir, you mentioned a case with only 11 data points at 23:00. How about we try bootstrapping on it and then find the best hypothesis class and function subsequently?

  • @prathikshaav9461
    @prathikshaav9461 4 years ago +2

    Is there a link to the homework, exams, and solutions? It would be helpful.

    • @kilianweinberger698
      @kilianweinberger698 4 years ago +3

      Past 4780 exams are here: www.dropbox.com/s/zfr5w5bxxvizmnq/Kilian past Exams.zip?dl=0
      Past 4780 Homeworks are here: www.dropbox.com/s/tbxnjzk5w67u0sp/Homeworks.zip?dl=0

    • @saikumartadi8494
      @saikumartadi8494 4 years ago

      @@kilianweinberger698 Sir, it would be very helpful if you shared the assignments too, because from your demonstrations I see they are quite different from the general ones we get in other colleges, and we can learn a lot from them. I learn a lot from your lectures; every video I have seen is the best I have watched on that topic.

  • @patrikpersson6059
    @patrikpersson6059 4 years ago

    Love your lectures! You briefly mentioned metric learning in regard to finding a good distance function; do you know of any good primers or general reading advice on this topic?

    • @kilianweinberger698
      @kilianweinberger698 4 years ago +2

      Maybe read one of my first papers on Large Margin Nearest Neighbors ( papers.nips.cc/paper/2795-distance-metric-learning-for-large-margin-nearest-neighbor-classification )

  • @sairajrege3340
    @sairajrege3340 4 years ago

    Is the algorithm only affected by the Euclidean distance, or does the number of classified points also matter?

  • @danielvillarraga2225
    @danielvillarraga2225 4 years ago +1

    Professor Kilian, I am coming to Cornell to enroll in a Ph.D. in civil engineering this fall. I have watched some of your lectures and find them really engaging. I have some understanding of most of the topics in this course, but I would like to take some classes on ML. Would you recommend that I enroll in this course or another one? Is this a grad course?

    • @kilianweinberger698
      @kilianweinberger698 4 years ago +1

      Welcome to Cornell! This is a graduate course, offered every fall. It’s probably a good choice if you want to learn the basics in ML. It also “unlocks” several more specialized courses.

    • @danielvillarraga2225
      @danielvillarraga2225 4 years ago

      @@kilianweinberger698 thank you, professor. I will try to enroll this fall.

    • @vamsikrishnaj4429
      @vamsikrishnaj4429 4 years ago

      @@kilianweinberger698 Is this lecture series, along with implementing the algorithms with Python libraries, enough so that I can dive into deep learning?
      Reply please!

  • @ivanehsan2683
    @ivanehsan2683 2 years ago

    Can D(validation) also be described as a beta test for h(x)?

  • @hello-pd7tc
    @hello-pd7tc 4 years ago +1

    Day 3 ✅

  • @vivekmittal2290
    @vivekmittal2290 5 years ago +1

    Sir, where can I find the project files?

  • @hrushikeshvaidya9466
    @hrushikeshvaidya9466 4 years ago

    Just to make sure, the x and z in the distance function (at 42:50) are the rth dimensions of the position vectors of the two points being considered, right?

    • @kilianweinberger698
      @kilianweinberger698 4 years ago +1

      x and z are the two vectors and [x]_r is the r-th dimension of vector x. Hope this helps.
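
      For reference, the distance under discussion here appears to be the Minkowski distance; a minimal sketch in NumPy (not from the course code; it assumes x and z are 1-D arrays and a finite p >= 1, and the function name is just illustrative):

          import numpy as np

          def minkowski_distance(x, z, p=2):
              """Minkowski distance (sum_r |[x]_r - [z]_r|^p)^(1/p).
              p=2 gives the Euclidean distance, p=1 the Manhattan distance."""
              return np.sum(np.abs(x - z) ** p) ** (1.0 / p)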

    • @hrushikeshvaidya9466
      @hrushikeshvaidya9466 4 years ago

      @@kilianweinberger698 Oh, I get it now. Thanks for the clarification, professor!
      I look forward to coming to Cornell this fall

  • @adiflorense1477
    @adiflorense1477 3 years ago

    14:46 What is the difference between a validation dataset and a testing dataset? I think they are the same

    • @kishorekhaturia7066
      @kishorekhaturia7066 3 years ago

      No, the validation set is carved out of the training data and is used to select the model, while the test set is used to analyze how well your model generalizes.

  • @jasongomez6783
    @jasongomez6783 4 years ago

    Now in 2020 all classes are online :(. I am an undergrad and I want to learn about machine learning

  • @semrana1986
    @semrana1986 4 years ago

    After a certain point the students were trying to buy some more time by stalling so the Professor wouldn't move on... been there, done that.

  • @xJoeKing
    @xJoeKing 5 years ago

    I'm watching this lecture at home...

  • @vikramm4967
    @vikramm4967 3 years ago

    Is it possible to get the test questions?

  • @antokay5530
    @antokay5530 4 years ago

    Professor, in regard to your spam classifier example: instead of splitting the train and test data by time, what if you eliminated all duplicate emails prior to splitting and training? Would that work in this case? Thank you, and thanks for posting these!

    • @kilianweinberger698
      @kilianweinberger698 4 years ago +1

      The problem is that there may be new spam types that appear. E.g. imagine on Saturday spammers suddenly start sending out "lottery spam". Even if the emails are not identical, your spam filter would pick up on the word "lottery" as very spammy - but this is unrealistic, as in the real world you wouldn't have seen any such spam before. Hope this makes sense.

  • @subhasdh2446
    @subhasdh2446 2 years ago

    My normal speed for most YouTube lectures is 1.5x and sometimes 1.75x. I think you're speaking a bit fast, because 1.5x sounds way too fast and I had to switch to 1.25x.

  • @haroldkumarnaik9971
    @haroldkumarnaik9971 2 years ago

    Are these undergrad or grad lectures?

  • @lidor5938
    @lidor5938 3 years ago +1

    Someone give this man some water

  • @gregmakov2680
    @gregmakov2680 2 years ago

    Hahaha, 60 years of ML :D:D

  • @juniormax2587
    @juniormax2587 4 years ago

    ahaaan.😅

  • @Gg-kw9ql
    @Gg-kw9ql 6 months ago

    hi nice

  • @gregmakov2680
    @gregmakov2680 2 years ago

    Hahaha, who says studying at home is boring :D:D:D Go to class just to have stuff crammed into your head? :D:D Not that dumb :D:D:D:D

  • @prattzencodes7221
    @prattzencodes7221 4 years ago +1

    Minkowski? Is it really Minkowski? I wonder where he's from.. Russia? :P :P :P :P

  • @jianliang6124
    @jianliang6124 3 years ago +1

    who said Germans ain't funny lmao

  • @ZombieLincoln666
    @ZombieLincoln666 5 years ago

    awful lectures, very unclear.