Kolmogorov-Arnold Networks: MLP vs KAN, Math, B-Splines, Universal Approximation Theorem
- Published June 4, 2024
- In this video, I will be explaining Kolmogorov-Arnold Networks, a new type of network that was presented in the paper "KAN: Kolmogorov-Arnold Networks" by Liu et al.
I will start the video by reviewing Multilayer Perceptrons, to show how the typical Linear layer works in a neural network. I will then introduce the concept of data fitting, which is necessary to understand Bézier Curves and then B-Splines.
Before introducing Kolmogorov-Arnold Networks, I will also explain what the Universal Approximation Theorem for Neural Networks is, along with its counterpart for Kolmogorov-Arnold Networks, the Kolmogorov-Arnold Representation Theorem.
In the final part of the video, I will explain the structure of this new type of network, deriving it step by step from the formula of the Kolmogorov-Arnold Representation Theorem while comparing it with Multilayer Perceptrons along the way.
We will also explore some properties of this type of network, such as its easy interpretability and its suitability for continual learning.
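For reference, the formula at the heart of the video: the Kolmogorov-Arnold Representation Theorem states that any continuous multivariate function on a bounded domain can be written using only univariate functions and addition,

f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)

where the \phi_{q,p} and \Phi_q are continuous univariate functions; KANs make these functions learnable B-splines and stack such layers.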
Paper: arxiv.org/abs/2404.19756
Slides PDF: github.com/hkproj/kan-notes
Chapters
00:00:00 - Introduction
00:01:10 - Multilayer Perceptron
00:11:08 - Introduction to data fitting
00:15:36 - Bézier Curves
00:28:12 - B-Splines
00:40:42 - Universal Approximation Theorem
00:45:10 - Kolmogorov-Arnold Representation Theorem
00:46:17 - Kolmogorov-Arnold Networks
00:51:55 - MLP vs KAN
00:55:20 - Learnable functions
00:58:06 - Parameters count
01:00:44 - Grid extension
01:03:37 - Interpretability
01:10:42 - Continual learning
The fact this video is free is incredible
You're welcome 🤗
You're on a mission to make the best and friendliest content for learning deep learning algorithms, and I am all in for it.
Your videos are literally the only 1hr+ ones I would ever watch on YouTube. Keep going mate, extremely high quality content 👏🏽👏🏽
Amazing content, thanks! I'm very excited about the continual learning properties of these networks.
Clearly explained and very valuable content as always Umar. Thank you!
Thanks a lot for making this accessible for people outside the field, for whom reading and understanding these papers is quite tough. Thanks to you I'm able to stay slightly more up to date with the crazy quick developments in ML!
Wow, this was a super clear and on-point explanation. Thank you, Umar.
The intro on basic linked-up linear layers was so well done and really makes this introduction friendly!
Incredibly clear explanations, the flow of the video is also really smooth. It’s almost like you’re telling a story. Please keep making content!!
Thanks Umar for such a wonderful tutorial! I've been eyeing this paper for a while!
Very clear, well explained, top notch!
Your videos help me (a grad student) really understand difficult, often abstract concepts. Thank you so much... I'll always support your stuff!
Fantastic explanation!
Thanks for including prerequisites
This is life changing, in my opinion. Thank you for the efforts on the videos!
awesome, easy to follow even for a person who doesn't know anything :)
Thanks for the crystal clear explanation!!
One of the best math videos I've watched on YouTube
This is really great! Power to you!!🚀
I just read the little bio on your channel; I hope it's not offensive to say that now I understand why your excellent English still sounded so familiar to me.
In any case, thank you enormously for your contribution: you explained all the theory in a way that is, in my opinion, extremely clear and above all engaging.
Please keep it up; once again a huge thank you, and congratulations on your contribution to science.
Thank you for visiting my channel! I hope to publish more often, even though quality content takes weeks of study and preparation. In any case, I hope to see you again soon! Have a good weekend.
@@umarjamilai You had already earned a subscriber; now you've earned a fan.
Ahahahahah
Thank you for such a great and detailed explanation.
I think KAN will be the catalyst of a significant tipping point in science.
I want to apply this to power system grids and replace existing dynamic models with ones made from PMU data using KAN
That is very useful, informative and interesting! Thanks a lot!
Hello Umar, this video is my best birthday gift I have ever received, thanks a lot :)
Extremely clear explanation and content here! Very helpful. I am happy that you came from PoliMI as well :) keep it up!
Hats off, what an awesome video!!!
Thank you Jamil, what a cool video
High quality explanations.. Thanks.
Sir, you are great..💙💙
I saw this paper on papers with code, and thought to myself I wonder if Umar Jamil will cover this.
Thanks for your effort and videos!
Thank you for making this video!
Crazy that it took me an hour-long video to understand that it's the control points being trained on the spline graph, versus the weights in MLPs and CNNs. Thank you!
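(If you want to see this concretely, here is a minimal sketch using scipy, with made-up knots and control points: nudging a single control point reshapes the curve only locally, and it is these control points that a KAN learns on each edge.)

```python
import numpy as np
from scipy.interpolate import BSpline

# Quadratic (degree-2) B-spline defined by a clamped knot vector t
# and control points c; these specific values are just an illustration.
k = 2
t = np.array([0, 0, 0, 1, 2, 3, 3, 3], dtype=float)  # len(t) == len(c) + k + 1
c = np.array([0.0, 1.5, -0.5, 2.0, 0.5])

x = np.linspace(0, 3, 7)
print(BSpline(t, c, k)(x))   # original curve

c2 = c.copy()
c2[2] += 1.0                 # move a single control point
print(BSpline(t, c2, k)(x))  # the curve changes only near that point
```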
Thank you so so much for this amazing content.
What's funny is that I predicted your next video would be on KAN, after I saw you on GitHub.
I WILL WATCH THIS VIDEO, AS I FEEL THIS WILL BE THE FUTURE OF NEURAL NETWORKS. THANK YOU FOR YOUR WORK AND CONTENT ❤
Your explanations are the best, thank you so much😘🤗
Amazing video! Thanks a lot !
Thank you so much for explaining the paper, it is so easy to understand now. Btw, can you also make a hands-on video with the KAN package developed by MIT, which is based on PyTorch?
Good video, quality content.
Phenomenal! Thank you :)
You’re fantastic, mate.
You are a savior; without you, mortals like me would be lost in the darkness!!!
awesome explanation
This is awesome!
Great Content !!
Amazingg explanation !
brilliant video!
i loved it sir .
Can't wait to watch this, saved! Will comment again when I actually watch it... 😅
Excellent video, thanks! At the end, I _really_ wanted to see an illustration of the relatively "non-local" adaptation of MLP weights. Can that be found somewhere?
bruh so good. Keep it up!
Having such a good teacher is wonderful; I wish I could be your student.
You're too kind, thanks for the kind words!
@@umarjamilai Amazing, you speak Chinese too 👍
@@seelowst I just came from China; I lived there for 4 years and have now returned to Europe.
@@umarjamilai I've never left my city; I hope to be like you someday 👍
An implementation video will be awesome
Thanks Umar. Very nice explanation. Just 2 questions:
1 - Does it mean we can specify different knots per edge?
2 - I don't understand how the backpropagation will work. Let's say we calculate the gradient from h1: it will update phi 1,1 and phi 1,2, but how does the learning process move the knots toward the desired values?
awesome bro.
amazing
awesome👍
Umar bhai, you're the greatest
Hey @Umar, great content as always. Looking forward to a KAN implementation video from scratch. Also, I think at 31:01 there is a minor language mistake: it should be a quadratic B-spline curve rather than a quadratic Bézier curve.
Please do post more! Please do more videos!
THANK YOU
I was hoping for exactly this!
Looking forward to your review 😇
❤ Great content. Have you considered covering inverse RL? ❤
Great explanation. What app do you use to create slides ?
PowerPoint + a lot a lot a lot a lot a lot of patience.
this video is so amazing!!!!!!!
thank you
Hi, can you please make a video on multimodal LLMs and fine-tuning them on a custom dataset...
Thanks!
Could you please next explain multimodal LLMs and techniques like LLaVA, LLaVA-Plus, LLaVA-NeXT?
I'm waiting for that day too
Thanks
There are points in the spline that are continuous but not differentiable, right? How do you handle those?
Thanks man. Next xLSTM please.
Thanks!
One thing I didn't catch: how are the functions tuned? If each function consists of points in space and we move the points around to move the B-spline, how do we decide where to move the points? It doesn't seem like backprop would work in the same way.
The same way we move weights for MLPs: we calculate the gradient of the loss function w.r.t the parameters of these learnable functions and change them in the opposite direction of the gradient. This is how you reduce the loss.
We are still doing backpropagation, so nothing changed on that front compared to MLPs.
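(A minimal sketch of that idea, assuming Gaussian bumps in place of the B-spline basis to keep the code short; this is not the paper's actual implementation. The "control point" coefficients are ordinary parameters that receive gradients.)

```python
import torch

# Learnable 1D function: f(x) = sum_i coeffs[i] * basis_i(x).
grid = torch.linspace(-1, 1, 8)               # fixed basis centers ("knots")
coeffs = torch.zeros(8, requires_grad=True)   # learnable "control points"

def f(x):
    basis = torch.exp(-((x[:, None] - grid) ** 2) / 0.1)  # (batch, 8)
    return basis @ coeffs

opt = torch.optim.SGD([coeffs], lr=0.1)
x = torch.linspace(-1, 1, 64)
target = torch.sin(3 * x)                     # toy target to fit

for _ in range(500):
    loss = torch.mean((f(x) - target) ** 2)
    opt.zero_grad()
    loss.backward()    # gradients flow to the coefficients, as with MLP weights
    opt.step()
print(loss.item())     # small after training
```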
Thank you for the great video! Can you (or anyone) help understand why you need to introduce the basis functions b(x) in the residual activation functions?
Wow! 🙏
Time to implement it
Sir, I have been a huge fan of your videos and have watched all of them. I am currently in my second year of BTech and really passionate about learning ML. Sir, if possible, could I work under you? I don't want any certificate or anything; I just want to observe and learn.
FWIW, I took an MLP solution for MNIST, substituted KAN layers for the MLP layers, and no matter what I did (adding dimensions, etc.) it couldn't solve it. My intuition is that KANs only work well for approximating linear-ish functions, not irregular, highly discontinuous ones like image classification would need. But perhaps I just screwed it up :D
Can you make a tutorial video on models like Perplexity that use live web search?
Please explain DSPy
Please add a payment option
Your love and support is enough! Have a great weekend!
@@umarjamilai Just wow
In search of gold, I found a diamond
Your explanations are great. I think, though, you should maybe take breaks to blow your nose, because you were sniffing a lot. It would make your videos more enjoyable.
Interesting
Thanks!
Thanks