What do filters of Convolution Neural Network learn?
- Added on 9 Jul 2024
- What do Convolution Neural Network filters really learn? Are they human interpretable?
Please subscribe to keep me alive: czcams.com/users/CodeEmporiu...
SPONSOR
Kite is a free AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite. Love it! www.kite.com/get-kite/?...
TIMESTAMPS
0:00 - Personal Note
0:29 - Introduction
2:50 - Pass 1: How do Humans classify Images?
3:57 - Pass 2: How do networks classify Images?
6:55 - Bilinear Interpolation
9:00 - Activation Function (the mask)
10:25 - Intersection over Union (IoU)
11:20 - Interesting findings from main paper
REFERENCES
[1] The main paper that talks about how object detectors come about from image classifiers: arxiv.org/abs/2009.05041
[2] More on upsampling techniques like Bilinear interpolation and how it compares to other techniques: www.quora.com/What-is-the-dif...
[3] Resizing images with bilinear interpolation @Computerphile’s video: • Resizing Images - Comp...
[4] Places365 dataset: places2.csail.mit.edu/explore....
That jump from the upscaled mask to the IoU and concluding that that last filter is a snow detector is a huge leap of faith right there.
This channel is such a blessing for students to learn and ML engineers to revise their concepts.
I always watch your videos before interviews.
Keep up the good work man, you're awesome.
Thanks so much! Real kind words here :) I try
Hey!
I just wanted to say that I love your videos. I just recently discovered your channel and I am SO glad you are starting to post again! I won't be missing any videos for sure, keep up the good work!
P.s. thank you for inspiring me to create more videos as well, watching your channel will definitely help me improve as an AI youtuber as well!
I don’t know how this comment slipped past me. Apologies. But thanks so much! I think you’re doing a wonderful job too (subscribed :) )
Your videos are so good man! Just so good!!
Please keep uploading more and more such videos
Your videos make me want to learn more and more about Machine Learning. Thank you :)
Great explanation! Loved your method of multi-pass teaching, breaking down the concept one abstraction layer at a time. Please make similar explanations for some research papers too. Thank you!
Thanks so much for commenting this!
Thank you! This helped a lot! Be safe too! :)
Thank you so much for such a good explanation please make more such videos.
Man ur explanation is awesome 👏
Can we apply a sigmoid activation function to the upscaled feature map to get the mask?
great explanations and visuals as always
Thank you for watching
What app did you use to make the slides and the video? Thanks
You talked about a single feature map. For all the feature maps, can we add them up depth-wise and then check where the filters have focused overall?
Great video...nice explanation👍
Masterpiece!
So we can use these filters in segmentation tasks?
Great video thanks. Just one typo spotted at 8:17, should be *200 instead of 300. But not a big typo. Great video again, thanks
Thank god I came across ur YouTube video 🌻.
Thanks so much for coming across the video haha
The inherent problem is that in computer image processing everything is pixel based. Convolution routines that operate on a fixed-size grid of pixels first appeared in classic image processing algorithms like Gaussian blur. Images on computers are simply files containing a set of color values, or color and intensity values, represented as numbers. Nothing intrinsic in that file format groups pixels together in any meaningful way, which is why convolutions are used in image processing to examine each pixel or group of pixels and try to make sense of what is in the image. Neural networks that use convolutions as part of the algorithm have the same problem.
This is in contrast to the brain, where signals from the eye are grouped inherently and explicitly based on how the retina functions and on the way the signal is passed from the retina through the optic nerve into the visual cortex. Because of that, each aspect of the visual data is represented as an explicit feature in the neurons, and there are no hidden layers or features. Each feature is its own first-order entity. So snow is snow, a mountain is a mountain, trees are trees, and each entity has "features" like color, texture and shape patterns which are themselves entities.
In your brain these neural data sets are basically "layers", similar to the way computers composite data to generate a 3D image in video gaming: color layer, texture layer, light/shadow layer, geometry layer = output image. The difference is that the 'mental image' you see is the result of the visual cortex putting the raw neural layers together into a coherent image. Because the data is encoded into neural format as soon as light hits the retina, neurons in the brain don't have to deal with data conversion, as in taking a file of pixels and determining what is in it.
And this is the problem inherent in most image processing using computers: you don't know where the image came from, what kind of camera took it, whether it was modified using filters or manually, etc. So a lot of the work is in "encoding" the raw pixels into something that can be used meaningfully during training, plus you have to have labeled data representing "ground truth". Brains don't have that issue. Everything received by the eye and encoded into neural signals is ground truth. There is no cyberpunk eyeball hacking or Wile E. Coyote fake scenery going on in that process.
Interesting
Wonderful Explanation!!
Thanks a ton for watching!
Nice explanation, thanks.
Thanks so much for watching !
bro you are a legend!!!!!!!!!!!!!!!
Undisputed facts. Thank you! Mind sharing this around?
Nice explanation. I need some help bro: can you please tell me how I can draw diagrams like you did? Which software should I use to design such diagrams for my thesis and publication? It would be really helpful if you could suggest the name of the software used to draw such beautiful diagrams for research work (CNN architecture and others). Thanks brother
Hold up. The visual you used for superimposing that 14x14 output. You are saying that we can take that output, upscale it, run it through some activation function, and then superimpose it onto our original input and that will reveal segmented areas of the image that "neuron" or filter has learnt?
I just watched your video and dude, this is huge, you have absolutely blown my mind. I feel like you've just revealed the magician's trick! I knew about convolutions and how they work, I knew about pooling and everything else ... but even knowing all that I still felt mystified by what the heck the network is learning when convolving over images. This is incredible and I definitely have to try this out.
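For anyone who wants to try the pipeline described in this comment (upscale the feature map, threshold it into a mask, compare it to a labeled region with IoU), here is a minimal sketch in plain Python. It is not the code from the video or the paper: the function names, the tiny 2x2 feature map, the 0.5 threshold and the "snow" label mask are all made-up values for illustration, and the corner-aligned sampling is just one common way to do bilinear interpolation.

```python
def bilinear_upsample(fmap, out_h, out_w):
    """Resize a 2D list of floats to out_h x out_w with bilinear interpolation."""
    in_h, in_w = len(fmap), len(fmap[0])
    out = []
    for i in range(out_h):
        # Map the output row back to a fractional input row (corner-aligned).
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Blend the four neighbouring input values.
            top = fmap[y0][x0] * (1 - wx) + fmap[y0][x1] * wx
            bottom = fmap[y1][x0] * (1 - wx) + fmap[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def threshold_mask(fmap, threshold):
    """Binary mask: 1 where the activation exceeds the threshold, else 0."""
    return [[1 if v > threshold else 0 for v in row] for row in fmap]


def iou(mask_a, mask_b):
    """Intersection over Union of two binary masks of the same shape."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return inter / union if union else 0.0


# Hypothetical 2x2 feature map that fires on the lower half of the image.
feature_map = [[0.0, 0.0],
               [1.0, 1.0]]
upscaled = bilinear_upsample(feature_map, 4, 4)
mask = threshold_mask(upscaled, 0.5)

# Hypothetical ground-truth mask for a concept such as "snow" (bottom row only).
label = [[0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0],
         [1, 1, 1, 1]]
score = iou(mask, label)
print(mask)
print(score)
```

In practice the threshold would be chosen from the distribution of that filter's activations over many images rather than hard-coded, and a library routine would do the resizing. A high IoU against one concept's labels is evidence that the filter tracks that concept, not proof.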
You’ve just explained how it is and why it is that we all are connected by way of Adam and Eve. 👏💯❤️😇
Haha thanks a ton for commenting
Thank you, one question, how do you make use of all the 512 filters tho?
The result of applying a filter is a "feature map". A feature map being some output the filter(s) produced. Using 512 filters produces 512 feature maps (the accumulation of these feature maps is usually referred to as the volume of a convolutional layer, volume being the dimensionality of the data at that particular layer). These are pretty much the meat and potatoes of the network as feature maps are direct representations of what the network has learnt from the input, consider them "gates" (I use that term loosely) that only allow certain parts of the input to pass through the network.
I should further explain (if you don't know). The filters are the learnable parameters of a CNN. What you are effectively doing when training a CNN is letting the filters in each layer of your network adapt to their inputs. With enough training you'll have filters that are very good at detecting particular features (maybe cats or dogs, people, etc). This is where some of the best CNNs currently shine, as they have complex architectures that have gone through a lot of training over time and have gotten very good at detecting and classifying many things. Hope that helps bring things full circle. I'd recommend delving deeper into CNNs, as they seem a very sophisticated and useful tool for a variety of AI/ML tasks (even predicting the future through time series data)
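To make the "one filter, one feature map" point concrete, here is a small sketch in plain Python. It is a simplification under stated assumptions: a single-channel input, no bias, no activation function, and three hand-written 2x2 filters standing in for the 512 learned ones. The names conv2d_valid and conv_layer are made up for this example.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and return
    the resulting feature map. This is cross-correlation, which is what
    CNN libraries usually compute under the name 'convolution'."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def conv_layer(image, filters):
    """Apply every filter to the same input: N filters give N feature maps."""
    return [conv2d_valid(image, k) for k in filters]


# Hypothetical 4x4 single-channel input.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]

# Three hypothetical 2x2 filters (in a real CNN these weights are learned).
filters = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[1, 1], [1, 1]],
]

feature_maps = conv_layer(image, filters)
print(len(feature_maps))   # one feature map per filter
print(feature_maps[2])     # the third filter sums each 2x2 patch
```

Stacking the resulting maps along a depth axis gives the "volume" mentioned above. With 512 filters the list would simply hold 512 maps.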
Also 9:51 explanation... 80% of your pixels are in 0/1 superposition since they are both in the greater and lower 90% at the same time ;)
Nice video....you should have more views. My question is, HOW do the filters learn which features to look for? Example, how does one filter learn vertical lines and another filter learn horizontal lines? And eventually, the higher order filters learning angles and textures? Thank you.
Hi, it depends on the filter you apply. Remember, a filter is just a mathematical function. When that filter is overlaid on top of your image, it's going to compute a product at every pixel and then sum them up. For edge detection, for instance, be it vertical or horizontal, you can imagine the image having different pixel values in the background and on the object of interest. If in some way I can do a product that completely masks away the background (makes those pixels 0) and leaves some real value at the edge as my filter slides through the image, I would get these lines.
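The multiply-and-sum idea in this reply can be sketched with a hand-written vertical edge filter. Two caveats: the 3x3 kernel below is a classic Prewitt-style kernel chosen for illustration, not something taken from the video, and in a trained CNN nobody writes these weights by hand; they start out random and are adjusted by gradient descent until some filters end up behaving like this. The image and the helper name conv2d_valid are made up for the example.

```python
def conv2d_valid(image, kernel):
    """Multiply the kernel with each image patch and sum (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical image: dark background (0) on the left, bright object (1) on the right.
image = [[0, 0, 0, 1, 1, 1],
         [0, 0, 0, 1, 1, 1],
         [0, 0, 0, 1, 1, 1],
         [0, 0, 0, 1, 1, 1]]

# Prewitt-style vertical edge kernel: right column minus left column.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

edges = conv2d_valid(image, vertical_edge)
print(edges)
```

The output is zero wherever the patch is uniform (pure background or pure object) and nonzero only where the patch straddles the boundary, which is the "mask away the background, keep a value at the edge" behaviour described above.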
Amazing
Thanks a lot
Mismatch between the narration and the equation at 8:18 czcams.com/video/eL80Im8Hq0k/video.html it should be 25% of 200 (equation says 300)
Greattt
Thanks!
SMILE on your thumbnails my man, show us your beautiful teeth!
I use colgate. I got nothing to fear
Nice one 👍
Poorly explained!