What do filters of Convolution Neural Network learn?

  • Published 9 Jul 2024
  • What do Convolution Neural Network filters really learn? Are they human interpretable?
    Please subscribe to keep me alive: czcams.com/users/CodeEmporiu...
    SPONSOR
    Kite is a free AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite. Love it! www.kite.com/get-kite/?...
    TIMESTAMPS
    0:00 - Personal Note
    0:29 - Introduction
    2:50 - Pass 1: How do Humans classify Images?
    3:57 - Pass 2: How do networks classify Images?
    6:55 - Bilinear Interpolation
    9:00 - Activation Function (the mask)
    10:25 - Intersection over Union (IoU)
    11:20 - Interesting findings from main paper
    REFERENCES
    [1] The main paper that talks about how object detectors come about from image classifiers: arxiv.org/abs/2009.05041
    [2] More on upsampling techniques like Bilinear interpolation and how it compares to other techniques: www.quora.com/What-is-the-dif...
    [3] Resizing images with bilinear interpolation @Computerphile’s video: • Resizing Images - Comp...
    [4] Places365 dataset: places2.csail.mit.edu/explore....
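The three steps named in the timestamps (bilinear interpolation, an activation threshold that makes a mask, and IoU scoring) can be sketched in plain Python. This is a minimal illustration under assumed details (list-of-lists arrays, a top-20% activation cutoff), not the actual implementation from [1]:

```python
def bilinear_upsample(fm, out_h, out_w):
    """Upsample a 2-D feature map (list of lists) with bilinear interpolation."""
    in_h, in_w = len(fm), len(fm[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Map each output pixel back into input coordinates.
            y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, in_h - 1), min(x0 + 1, in_w - 1)
            dy, dx = y - y0, x - x0
            # Weighted average of the four surrounding input values.
            out[i][j] = (fm[y0][x0] * (1 - dy) * (1 - dx)
                         + fm[y0][x1] * (1 - dy) * dx
                         + fm[y1][x0] * dy * (1 - dx)
                         + fm[y1][x1] * dy * dx)
    return out

def threshold_mask(fm, quantile=0.8):
    """Binary mask keeping roughly the top (1 - quantile) fraction of activations."""
    flat = sorted(v for row in fm for v in row)
    cut = flat[int(quantile * (len(flat) - 1))]
    return [[1 if v >= cut else 0 for v in row] for row in fm]

def iou(a, b):
    """Intersection over Union of two equal-sized binary masks."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 0.0

# Toy run: a tiny "feature map" upscaled to a pretend image size, then masked.
fmap = [[0.0, 0.2], [0.1, 1.0]]
mask = threshold_mask(bilinear_upsample(fmap, 4, 4), quantile=0.8)
```

Upsampling a filter's 14x14 map to image size, thresholding it into a binary mask, and scoring that mask against a labelled concept segmentation with `iou` is how the video argues that a particular filter acts as, say, a snow detector.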

Comments • 52

  • @fakhermokadem11
    @fakhermokadem11 3 years ago +8

    That jump from the upscaled mask to the IoU and concluding that that last filter is a snow detector is a huge leap of faith right there.

  • @HaiderAli-hp6tl
    @HaiderAli-hp6tl 1 year ago +6

    This channel is such a blessing for students to learn and ML engineers to revise their concepts.
    I always watch your videos before interviews.
    Keep up the good work man, you're awesome.

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks so much! Real kind words here :) I try

  • @WhatsAI
    @WhatsAI 3 years ago +3

    Hey!
    I just wanted to say that I love your videos. I just recently discovered your channel and I am SO glad you are starting to post again! I won't be missing any videos for sure, keep up the good work!
    P.s. thank you for inspiring me to create more videos as well, watching your channel will definitely help me improve as an AI youtuber as well!

    • @CodeEmporium
      @CodeEmporium  1 year ago +1

      I don’t know how this comment slipped past me. Apologies. But thanks so much! I think you’re doing a wonderful job too (subscribed :) )

  • @namanjaswani8925
    @namanjaswani8925 3 years ago +1

    Your videos are so good man! Just so good!!
    Please keep uploading more and more such videos

  • @arihantbaid2936
    @arihantbaid2936 3 years ago +3

    Your videos make me want to learn more and more about Machine Learning. Thank you :)

  • @riddhimanmoulick3407
    @riddhimanmoulick3407 1 year ago +1

    Great explanation! Loved your method of multi-pass teaching, breaking down the concept one abstraction layer at a time. Please make similar explanations for some research papers too. Thank you!

  • @alexchebanny6896
    @alexchebanny6896 3 years ago

    Thank you! This helped a lot! Be safe too! :)

  • @AmitChaudhary-qx5mc
    @AmitChaudhary-qx5mc 3 years ago

    Thank you so much for such a good explanation please make more such videos.

  • @datascience9425
    @datascience9425 2 years ago

    Man ur explanation is awesome 👏

  • @muhammadfaizan9909
    @muhammadfaizan9909 11 months ago

    Can we apply a sigmoid activation function on the upscaled feature map to get the mask?

  • @ziaurrahmanutube
    @ziaurrahmanutube 3 years ago

    great explanations and visuals as always

  • @CTT36544
    @CTT36544 3 years ago

    What app did you use to make the slides and the video? Thanks

  • @muhammadfaizan9909
    @muhammadfaizan9909 11 months ago

    You talked about a single feature map; for all feature maps, can we add them depth-wise and then check where the filters have focused overall?

  • @rajatkulkarni7670
    @rajatkulkarni7670 3 years ago

    Great video...nice explanation👍

  • @snehotoshbanerjee1938
    @snehotoshbanerjee1938 2 years ago

    Masterpiece!

  • @leo-phiponacci
    @leo-phiponacci 11 months ago

    So we can use these filters in segmentation tasks?

  • @areejalokaili8471
    @areejalokaili8471 3 years ago +6

    Great video, thanks. Just one typo spotted at 8:17: it should be *200 instead of 300. Not a big typo, though. Great video again, thanks.

  • @hanikhan6921
    @hanikhan6921 1 year ago

    Thank god I came across your YouTube video 🌻.

    • @CodeEmporium
      @CodeEmporium  1 year ago

      Thanks so much for coming across the video haha

  • @willd1mindmind639
    @willd1mindmind639 3 years ago +1

    The inherent problem is that computer image processing is pixel based. Convolution routines over a grid of pixels of a certain size first appeared in classic image-processing algorithms like Gaussian blur. An image on a computer is simply a file containing color values, or color-and-intensity values, represented as numbers; nothing intrinsic in that file format groups pixels together in any meaningful way. That is why convolutions are used in image processing: to examine each pixel or group of pixels and try to make sense of what is in the image. Neural networks that use convolutions as part of their algorithm inherit the same problem.
    This is in contrast to the brain, where signals from the eye are grouped inherently and explicitly, based on how the retina functions and on the way the signal is passed from the retina through the optic nerve into the visual cortex. Because of that, each aspect of the visual data is represented as an explicit feature in the neurons; there are no hidden layers or features. Each feature is its own first-order entity: snow is snow, a mountain is a mountain, trees are trees, and each entity has "features" like color, texture, and shape patterns, which are themselves entities. In your brain these neural data sets are basically "layers", similar to the way computers composite data to generate a 3D image in video gaming: color layer + texture layer + light/shadow layer + geometry layer = output image. The difference is that the "mental image" you see is the result of the visual cortex putting the raw neural layers together into a coherent image. Because the data is encoded into neural format as soon as light hits the retina, neurons in the brain don't have to deal with the data conversion involved in taking a file of pixels and determining what is in it.
    And this is the problem inherent in most image processing on computers: you don't know where the image came from, what kind of camera took it, whether it was modified with filters or manually, etc. So a lot of the work is in "encoding" the raw pixels into something that can be used meaningfully during training, plus you have to have labeled data representing "ground truth". Brains don't have that issue: everything received by the eye and encoded into neural signals is ground truth. There is no cyberpunk eyeball hacking or Wile E. Coyote fake scenery going on in that process.

  • @nishanttailor4786
    @nishanttailor4786 1 year ago

    Wonderful Explanation!!

  • @josedavidvillanueva443

    Nice explanation, thanks.

  • @veereshg6600
    @veereshg6600 3 years ago

    bro you are a legend!!!!!!!!!!!!!!!

    • @CodeEmporium
      @CodeEmporium  3 years ago

      Undisputed facts. Thank you! Mind sharing this around?

  • @1UniverseGames
    @1UniverseGames 3 years ago

    Nice explanation. I need some help, bro: can you please tell me how you draw diagrams like yours? Which software should I use to design such diagrams if I want to use them in my thesis and publications? It would be really helpful if you could suggest the software you use to draw such beautiful diagrams for research work (CNN architectures and others). Thanks, brother.

  • @SlipperyBrick89
    @SlipperyBrick89 3 years ago

    Hold up. The visual you used for superimposing that 14x14 output. You are saying that we can take that output, upscale it, run it through some activation function, and then superimpose it onto our original input, and that will reveal segmented areas of the image that "neuron" or filter has learnt?

    • @SlipperyBrick89
      @SlipperyBrick89 3 years ago

      I just watched your video and dude, this is huge, you have absolutely blown my mind. I feel like you've just revealed the magician's trick! I knew about convolutions and how they work, I knew about pooling and everything else... but even knowing all that I still felt mystified by what the heck the network is learning when convolving over images. This is incredible and I definitely have to try this out.

  • @melissiamillan2405
    @melissiamillan2405 1 year ago

    You’ve just explained how it is and why it is that we all are connected byway of Adam and Eve. 👏💯❤️😇

  • @tyow95
    @tyow95 3 years ago

    Thank you, one question, how do you make use of all the 512 filters tho?

    • @SlipperyBrick89
      @SlipperyBrick89 3 years ago

      The result of applying a filter is a "feature map": some output the filter produced. Using 512 filters produces 512 feature maps (the accumulation of these feature maps is usually referred to as the volume of a convolutional layer, the volume being the dimensionality of the data at that layer). These are pretty much the meat and potatoes of the network, as feature maps are direct representations of what the network has learnt from the input; consider them "gates" (I use that term loosely) that only allow certain parts of the input to pass through the network.

    • @SlipperyBrick89
      @SlipperyBrick89 3 years ago

      I should explain further (if you don't know). The filters are the learnable parameters of a CNN. What you are effectively doing when training a CNN is letting the filters in each layer learn from their inputs. With enough training you'll have filters that are very good at detecting particular features (maybe cats or dogs, people, etc.). This is where some of the best CNNs currently shine: they have complex architectures that have gone through a lot of training over time and have gotten very good at detecting and classifying many things. Hope that helps bring things full circle. I'd recommend delving deeper into CNNs, as they are a very sophisticated and useful tool for a variety of AI/ML tasks (even predicting the future through time-series data).
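A tiny pure-Python sketch makes the "N filters → N feature maps" point concrete. The 6x6 image and the three hand-picked 3x3 filters below are hypothetical stand-ins; in a real CNN these weights are learned during training:

```python
def conv2d(image, kernel):
    """Valid-mode 2-D cross-correlation of a single-channel image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + m][j + n] * kernel[m][n]
                 for m in range(kh) for n in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]

image = [[(i + j) % 2 for j in range(6)] for i in range(6)]  # toy 6x6 input

filters = [
    [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],    # responds to vertical edges
    [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],    # responds to horizontal edges
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],       # identity: passes the center pixel
]

# One feature map per filter; a layer with 512 filters yields a 512-deep
# volume of such maps stacked together.
feature_maps = [conv2d(image, k) for k in filters]
```

Each map here is 4x4 (a 3x3 kernel over a 6x6 input in valid mode), and the number of maps always equals the number of filters in the layer.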

  • @WistrelChianti
    @WistrelChianti 2 years ago

    Also 9:51 explanation... 80% of your pixels are in 0/1 superposition since they are both in the greater and lower 90% at the same time ;)

  • @sgrimm7346
    @sgrimm7346 2 years ago

    Nice video....you should have more views. My question is, HOW do the filters learn which features to look for? Example, how does one filter learn vertical lines and another filter learn horizontal lines? And eventually, the higher order filters learning angles and textures? Thank you.

    • @hardikjoshi9765
      @hardikjoshi9765 9 months ago

      Hi, it depends on the filter you apply. Remember, a filter is just a mathematical function. When you have that filter overlaid on top of your image, it's going to compute a product at every pixel and then sum them up. For edge detection, for instance, be it vertical or horizontal, you can imagine the image having different pixel values in the background and at the object of interest. If, in some way, I can compute a product that completely masks away the background (makes those pixels 0) and gives some real value at the edge while my filter slides across the image, I get those lines. The weights themselves start out random and are adjusted by backpropagation during training, so filters that happen to reduce the loss (like edge detectors) emerge on their own.
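A concrete toy example of that masking idea, with hypothetical numbers in pure Python: sliding a vertical-edge kernel over an image whose left half is background (0) and right half is the object (1) gives 0 on both flat regions and a strong response only at the edge.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image; at each position multiply and sum."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + m][j + n] * kernel[m][n]
                 for m in range(kh) for n in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]

# 5x7 image: columns 0-2 are background (0), columns 3-6 are the object (1).
image = [[0, 0, 0, 1, 1, 1, 1] for _ in range(5)]

# Vertical-edge kernel: negative on the left column, positive on the right.
vertical = [[-1, 0, 1],
            [-1, 0, 1],
            [-1, 0, 1]]

fmap = conv2d(image, vertical)
# Every output row is [0, 3, 3, 0, 0]: zero over the flat background and the
# flat object, with a response only where the vertical edge sits.
```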

  • @Joel-vk3cf
    @Joel-vk3cf 2 years ago

    Amazing

  • @WistrelChianti
    @WistrelChianti 2 years ago

    Mismatch between the narration and the equation at 8:18 czcams.com/video/eL80Im8Hq0k/video.html it should be 25% of 200 (equation says 300)

  • @varunreddy695
    @varunreddy695 3 years ago +1

    Greattt

  • @PedramNG
    @PedramNG 3 years ago +1

    SMILE on your thumbnails my man, show us your beautiful teeth!

  • @Bert-lv3jt
    @Bert-lv3jt 5 months ago

    Poorly explained!