The U-Net (actually) explained in 10 minutes

  • Added on May 4, 2023
  • Want to understand the AI model actually behind Harry Potter by Balenciaga or the infamous image of the Pope in the puffer jacket? Diffusion frameworks such as DALL-E 2, Midjourney, Imagen or Stable Diffusion get a lot of the credit, whereas the true unsung hero of the story is the underlying U-Net architecture that they all actually use under the hood. Don't get me wrong, diffusion models are awesome, but the U-Net is an absolute STAPLE when it comes to computer vision, and this video aims to break it down in an easy way. Originally used for image segmentation, the U-Net has developed into so much more. Happy watching!
    U-Net paper: arxiv.org/abs/1505.04597
    Many thanks to numerous online resources that helped me create this video.
  • Science & Technology

Comments • 95

  • @salmanzafarsatti1346 8 months ago +30

    Man, this video is such a great explainer. I had been confused about the use of skip connections for a long time, but he explained the intuition behind them very nicely.

  • @mayankukani9600 11 months ago +21

    Why didn't I find your channel before? Please upload more content; this is the best content on deep learning I have seen.

  • @Anton_Sh. 8 months ago +10

    This architecture is one of the truly brilliant ones in the world of deep learning in terms of its simplicity and efficiency.

  • @rippingmyheartwassoeasy 3 months ago +1

    Thank you for creating this video! It's the best explanation of how a U-Net works, and it was easy to understand. The visual animation is superbly done!!

  • @pushkar9021 9 months ago +3

    Continue this series, very helpful

  • @Natstranaut 9 months ago +2

    Oh my god man. Awesome videos. Keep it up, I'm really enjoying them!

  • @thebakareview8009 2 months ago

    This channel deserves more subs!! Great content and delivery :)

  • @jsparger 9 months ago +3

    This was extremely helpful. Thank you

  • @jacobidoko3924 4 months ago

    Yooo...this is quality content right here. Thank you so much for putting this out

  • @user-kv2pi9mf8r 6 months ago +1

    Extremely useful for beginners like me. This is very good

  • @jayhu2296 2 months ago

    Your "explained in under 10 minutes" videos are goated

  • @mridulsehgal7773 20 days ago

    The best video you can find explaining the U-Net

  • @JohnZakaria 3 months ago

    This was the best unet explanation I have ever seen

  • @transcendingvictor 3 months ago

    Thank you very much for the time you put into making this video. Interesting and helpful :)

  • @ubanaga 4 months ago

    Very nice my friend, this has been most helpful

  • @puekai 2 months ago +8

    Still don't know how it works

  • @hexeldev 6 months ago

    This video has been extremely useful. I subbed.

  • @user-ux4st6hh2d 6 months ago

    Woooooow! Finally I understood it. Really great explanation, thank you

  • @niralpatel5889 1 month ago

    This was great, would love a video on diffusion transformers! It looks like they are taking off and replacing U-Nets as the backbone of new diffusion models.

  • @shubhamarle96 1 month ago

    thanks for the video, I am trying to use U-net for anomaly detection in time series and your video gave me the idea.

  • @aligreen786 4 months ago

    Very nice explanation. Thanks a lot.

  • @sakethsreeram6981 2 months ago

    Great presentation! Easy to understand

  • @coffeestudi0s 7 months ago

    Yooo the effort haha. Amazing Video!!!

  • @willlowtree 8 months ago

    i love your presentation style

  • @xarisalkiviadis2162 2 months ago

    Amazing video, cleared everything!

  • @pratyushsahoo4948 2 months ago

    Absolutely amazing work 🎉

  • @TheHopeOfTruth 2 months ago

    Thank you for the great explanation. At a basic level, it helps me understand the U-Net better

  • @gokulsaisrinivas5312 5 months ago

    very good explanation of U-NET

  • @amolkumar1538 9 months ago +1

    This is just awesome, great video

  • @ozzafar1982 11 days ago

    great explanation thanks!

  • @mincasurong 7 days ago

    Great summary, Great thanks

  • @LucaBovelli 15 days ago

    dude thankssssss i thought this was another one of these things thatll take me 2 hours of youtube to *not* understand, but u saved me

  • @nagham96 7 months ago +1

    Thank you that was so helpful and cute! 🤩

  • @vijaykumarb9622 4 months ago

    Great Explanation.

  • @gregorioosorio16687 8 months ago

    Thanks for sharing!

  • @LautaHillkirk 1 month ago

    nice video, very helpful

  • @user-ef7je7yw7r 1 month ago

    wow awesome video and explanation

  • @TechHuntBD 14 days ago

    Nice explanation

  • @r.walid2323 1 month ago

    thanks, good explanation

  • @nikhilchouhan1802 1 month ago

    You might not find my comment since the video is too old, but man, I just want to thank you for this video. I am a student who has always been interested in computer graphics and related fields like game engines, physical rendering, ray tracing, etc., and just didn't get the ML/AI hype everyone was on for the past 2 years. I only ever managed to study ML basics for 2 weeks before I left it for good. But recently I joined a team where my friends were working on CNN-based projects, and that made me learn many of the basics of NNs and DL. This explanation of the U-Net seals the deal for me, and I will strive to integrate my two interests into one and hopefully create something I love.

  • @kiraqueenyt5161 5 months ago

    such a well made video

  • @usaid3569 23 days ago

    Great video champ

  • @PAHADIBABAJI 4 months ago

    Very helpful

  • @Topninja6 1 month ago

    Thank you so much. Now I just need to figure out how to implement this for my project lol

  • @s4lome792 20 days ago

    Clearly explained. What caused my confusion in the first place is the graphic in the original paper: why does the segmentation mask not have the same dimensionality as the input image?

  • @Grapemaid 9 months ago

    Thanks a lot. I understand it!

  • @ny8828 7 months ago

    Hi, it's very helpful. How can I get the PowerPoint for it?

  • @sisami2109 6 months ago

    very nice dude thank you so much

  • @JohnVinchi-bk2dw 9 months ago +1

    this is extreeeemely helpful, and funny

  • @_the_one_who_asked_ 6 months ago

    Hi, thank u for this video. can u pls do a video to explain YOLO?

  • @BooleanDisorder 3 months ago

    What's the background music called in this video?

  • @dfparker2002 5 months ago

    This explains inference (I think) by decomposition (dividing) and recomposition (adding) images. Is that accurate?

  • @atifadib 10 days ago

    If you want to just use the Decoder how would you do it?

  • @ingenuity8886 1 month ago

    Thank you very much bro...

  • @poggiesgw 8 months ago

    good stuff

  • @alirezasaberi6383 11 months ago

    Awesome! Can you also make a similar "(actually) explained" video for UNet++ and UNet 3+, please? Thank you so much.

    • @rupert_ai 10 months ago +2

      Glad you liked it! It's not currently on my list of to-do videos as I like to cover the most popular fundamentals at the moment, but I'll let you know if I get around to it! :)

  • @miguelxplayer9641 2 months ago

    Dude, you're great. I'm from Portugal 🇵🇹 🟩🟨🟥🟥 and I'm learning Machine Learning and Neural Networks. Thank you very much! I loved how you teach. You are intuitive and dynamic. A person is learning a difficult subject and still manages to laugh when watching the videos. I loved it. I already subscribed and liked. I'm going to watch more of your videos now. Hugs from Portugal😉

  • @Ngochi-ff7hk 1 month ago

    I still don't understand why the output is x2 or x3 or x4. Why is that the case?

  • @Nerthexx 8 months ago

    If downsampling works by max-pooling, how does upsampling work? In traditional image processing, we would just interpolate image colors, but how does the network apply its "convolution" in this process? I would understand "deconvolution", but in my mind it wouldn't work here.

  • @abhishekkanojia2816 10 months ago +1

    cool videos

  • @ajipboy 2 months ago

    bro, immediate subscribe!

  • @yyww4267 9 months ago +2

    Really impressive video! And fun work at the end!!!!! LOVE LOVE LOVE!!!

  • @user-xm1zy3pj5k 3 months ago

    Hi. I find the video very interesting. As I'm at the beginning, I'm a little confused. Please, can you also provide a PDF file? Thank you.

  • @Englishwithshima1993 4 months ago

    Perfect

  • @MacProUser99876 3 months ago +1

    nice explanation. but why distracting background music?

    • @endlesshybrids 27 days ago

      Agreed. Good explanation but I wish people would stop using background music.

  • @1.4142 8 months ago

    Dalle 3 is coming to gpt 4 and it can write text!

  • @notrito 29 days ago

    If anyone wonders how to concatenate the features if they don't match the size... they crop it.
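
    A minimal sketch of that cropping step, in plain Python with feature maps stored as nested lists shaped [channels][height][width]. The helper names (`center_crop`, `concat_channels`, `make_map`) and the toy sizes are assumptions made up for illustration, not code from the video or the paper. The original U-Net uses unpadded convolutions, so an encoder map is larger than the decoder map it joins (for example, 64x64 cropped to 56x56), and the crop is taken around the center.

```python
def center_crop(feature_map, target_h, target_w):
    """Crop a [channels][height][width] nested list around its center."""
    h = len(feature_map[0])
    w = len(feature_map[0][0])
    top = (h - target_h) // 2    # rows to skip at the top
    left = (w - target_w) // 2   # columns to skip on the left
    return [
        [row[left:left + target_w] for row in channel[top:top + target_h]]
        for channel in feature_map
    ]


def concat_channels(a, b):
    """Stack two feature maps of equal height and width along the channel axis."""
    return a + b


def make_map(channels, h, w, fill):
    """Hypothetical helper: build a constant feature map for the demo."""
    return [[[fill] * w for _ in range(h)] for _ in range(channels)]


# Toy sizes (assumed): the encoder map is 8x8, the upsampled decoder map is 4x4.
encoder_features = make_map(2, 8, 8, 1.0)
decoder_features = make_map(2, 4, 4, 0.0)

cropped = center_crop(encoder_features, 4, 4)
merged = concat_channels(cropped, decoder_features)

# Channels add up (2 + 2), spatial size matches the decoder map.
merged_shape = (len(merged), len(merged[0]), len(merged[0][0]))
print(merged_shape)
```

    Many later implementations pad the convolutions instead so that the sizes already match and no crop is needed; that is a design choice, not something the comment above states.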

  • @MrMadmaggot 1 year ago

    Now, how did they code it?

    • @rupert_ai 10 months ago

      Hahaha well there are actually plenty of online code implementations available but I will see if I can get round to a code tutorial on the u-net sooner rather than later!

    • @rishabhbhardwajiitb178 4 months ago

      @rupert_ai can u provide one

  • @007bindass007 6 months ago

    Nice Comment: Useful 👍👍😎😎

  • @timanb2491 7 months ago

    goodgood

  • @linamallek6900 2 months ago

    nice video, but I hate the music in the background (so disturbing)

  • @LucaBovelli 15 days ago

    bro why did u stop making videos i need you lmao (its a painful lmao.)

  • @luisluiscunha 1 day ago

    You are very funny!

  • @user-mn2bj1hw1vdtfhgh 1 month ago +1

    Me seeing the video at 1.5x 😂😅

  • @jaybrodnax 14 days ago

    I feel like this is more a description to experts than an actual explanation of how and why it works.
    Questions I'm left with:
    What is the purpose of downsampling/upsampling (I'm guessing performance?)
    How is segmentation actually done by the u-net?
    How is feature extraction actually done?
    What are max pooling layers?
    What does "channel doubling" mean, and what does it achieve?
    How does the encoder know "these are the pixels where the bike is"?
    Why is it beneficial to connect the encoder features to the decoder features at each step, versus in the last step?
    How does unet achieve anything other than downscaling/upscaling performance efficiency? Where are the actual operations to derive features?
    How is u-net specifically applied for various use cases like diffusion? What does diffusion add or change, for example.

    • @abansalah4677 13 days ago +1

      (Disclaimer: I am a beginner, and this is not intended to be a complete answer.)
      You should read about convolutional layers and pooling layers to better understand this video. At any rate:
      A colored image has three channels: R, G, and B. A convolutional layer is specified by some spatial parameters (stride, kernel size, padding) and by how many filters it has; the number of filters is the number of channels of the output. You can think of each filter as trying to capture different information. Doubling the channels, therefore, means using double the number of filters when using a stride of 2.
      The segmentation is done just like any ML task: the training data consists of pairs of images and their annotated versions. I think it's often hard to decipher the inner workings of a particular neural network, and your question can/should be asked in a more general way: how do neural networks learn?
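
      To make the shape bookkeeping concrete, here is a rough sketch in plain Python. The function names (`conv_output_size`, `max_pool_2x2`) are made up for illustration; the numbers follow the first encoder level of the original U-Net paper, where two unpadded 3x3 convolutions with 64 filters are followed by a 2x2 max pool with stride 2, and the next level uses 128 filters. In that paper the spatial halving comes from the pooling step, and the filter count doubles in the convolutions that follow.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def max_pool_2x2(grid):
    """2x2 max pooling with stride 2 on a single-channel 2D list."""
    return [
        [
            max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
            for c in range(0, len(grid[0]) - 1, 2)
        ]
        for r in range(0, len(grid) - 1, 2)
    ]


# First encoder level of the original U-Net: 572x572 input, 1 channel.
size = 572
after_conv1 = conv_output_size(size, kernel=3)         # 3x3 conv, no padding
after_conv2 = conv_output_size(after_conv1, kernel=3)  # second 3x3 conv
channels = 64                                          # output channels = number of filters
pooled = conv_output_size(after_conv2, kernel=2, stride=2)  # 2x2 max pool
next_channels = channels * 2                           # next level doubles the filters

print(after_conv1, after_conv2, pooled, next_channels)

# Max pooling keeps the largest value in each 2x2 block.
toy = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled_toy = max_pool_2x2(toy)
print(pooled_toy)
```

      Because each unpadded 3x3 convolution trims one pixel from every border, the output of the network ends up smaller than its input, which is also why the segmentation map in the paper's figure is smaller than the input tile.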

  • @leoyu6400 6 months ago +2

    hope you can come back to life

    • @c.e1187 6 months ago +1

      Is he dead?

    • @BooleanDisorder 4 months ago

      @c.e1187 nah, just busy I imagine. He was active on GitHub in December, so

    • @truck.-kun. 4 months ago +1

      @c.e1187 maybe yes. Only on CZcams

  • @jonathangallagher3116 1 month ago

    TIGHT TIGHT TIGHT

  • @jcpouce 3 months ago

    music is too distracting... :(

  • @SarraAissaoui-sp3sm 1 month ago

    I clicked on thumb down for wasting one minute of my precious time in the intro. Get to the F point !!
