Jitendra Malik: Computer Vision | Lex Fridman Podcast

  • Added June 13, 2024
  • Jitendra Malik is a professor at Berkeley and one of the seminal figures in the field of computer vision, the kind before the deep learning revolution, and the kind after. He has been cited over 180,000 times and has mentored many world-class researchers in computer science.
    Support this podcast by supporting our sponsors:
    - BetterHelp: betterhelp.com/lex
    - ExpressVPN at www.expressvpn.com/lexpod
    EPISODE LINKS:
    Jitendra's website: people.eecs.berkeley.edu/~malik/
    Jitendra's wiki: en.wikipedia.org/wiki/Jitendr...
    PODCAST INFO:
    Podcast website:
    lexfridman.com/podcast
    Apple Podcasts:
    apple.co/2lwqZIr
    Spotify:
    spoti.fi/2nEwCF8
    RSS:
    lexfridman.com/feed/podcast/
    Full episodes playlist:
    • Lex Fridman Podcast
    Clips playlist:
    • Lex Fridman Podcast Clips
    OUTLINE:
    0:00 - Introduction
    3:17 - Computer vision is hard
    10:05 - Tesla Autopilot
    21:20 - Human brain vs computers
    23:14 - The general problem of computer vision
    29:09 - Images vs video in computer vision
    37:47 - Benchmarks in computer vision
    40:06 - Active learning
    45:34 - From pixels to semantics
    52:47 - Semantic segmentation
    57:05 - The three R's of computer vision
    1:02:52 - End-to-end learning in computer vision
    1:04:24 - 6 lessons we can learn from children
    1:08:36 - Vision and language
    1:12:30 - Turing test
    1:16:17 - Open problems in computer vision
    1:24:49 - AGI
    1:35:47 - Pick the right problem
    CONNECT:
    - Subscribe to this YouTube channel
    - Twitter: / lexfridman
    - LinkedIn: / lexfridman
    - Facebook: / lexfridmanpage
    - Instagram: / lexfridman
    - Medium: / lexfridman
    - Support on Patreon: / lexfridman
  • Science & Technology

Comments • 100

  • @lexfridman
    @lexfridman  3 years ago +63

    I really enjoyed this conversation with Jitendra. Here's the outline:
    0:00 - Introduction
    3:17 - Computer vision is hard
    10:05 - Tesla Autopilot
    21:20 - Human brain vs computers
    23:14 - The general problem of computer vision
    29:09 - Images vs video in computer vision
    37:47 - Benchmarks in computer vision
    40:06 - Active learning
    45:34 - From pixels to semantics
    52:47 - Semantic segmentation
    57:05 - The three R's of computer vision
    1:02:52 - End-to-end learning in computer vision
    1:04:24 - 6 lessons we can learn from children
    1:08:36 - Vision and language
    1:12:30 - Turing test
    1:16:17 - Open problems in computer vision
    1:24:49 - AGI
    1:35:47 - Pick the right problem

    • @a.s.8113
      @a.s.8113 3 years ago +4

      It's a very informative conversation.

    • @Nick_Tag
      @Nick_Tag 3 years ago

      I just have to come out and say that I’ve been performing some wacky experiments on myself this past year (e.g. wearing those goggles that invert vision, coming up with a way to (scarily) turn my vision off by concentrating in low light, and influencing dreaming/waking states, although mostly ad hoc). I used to go to school with one contact lens to save money but also to switch up eye dominance... Anyway, from one of my crazy insights into the subconscious vision system I discovered something that may be useful for researchers to advance. Remember when Wolfram was categorised as a plunger?(!) Basically this doesn’t happen with our system, because when objects are overlaid on top of one another, aka partially occluded, there is an automatic “highlighting” around the selection of object you are focusing on. Well, I experienced that and somehow saw it for only a few seconds during a waking state, and I GUESS we are subconsciously utilising light polarisation / photon phase in order to distinguish ‘object depth’ ready for higher cortical processing. Another time my brain went crazy when I couldn’t ‘automatically’ guesstimate the distance to a shiny roof chimney / smoke extractor (e.g. like Wittgenstein’s rabbit-duck). So I put 1+1 together and say the learning step is related to the phase / polarisation at the object’s edge (and predict you could design an unethical experiment on babies involving highly polished chrome ‘distance illusions’ :-p). Obviously I sound like an alien abductor sensationalist here but I really think the insight from my “highlighting” experience might be useful. There were also shooting ‘pixels’ of a particular shape (not square) but we won’t go there :-p

    • @avnishkumar7315
      @avnishkumar7315 3 years ago

      Can you please interview Sergei Belongie, Cornell Tech? He did his PHD under him! Thanks

    • @154jdt
      @154jdt 3 years ago +1

      Hey Lex, you should have someone on to discuss computational imaging for machine vision within the industrial automation sector.

    • @mackenzieclarkson8322
      @mackenzieclarkson8322 2 months ago

      "entitification" wouldn't be that complex considering we have stereo vision. Using depth data or stereo images, the problem would be fairly simple, wouldn't you agree?

  • @AldoKoskettimet
    @AldoKoskettimet 3 years ago +64

    Jitendra is a clear and organized mind in a world of confusion, at least in the academic world

  • @tyfoodsforthought
    @tyfoodsforthought 3 years ago +26

    One of the best things that can happen to you is wondering what to watch/listen to, and then a Lexcast comes out at that very moment. I'm excited! Thank you, Lex!!!

  • @Kaget0ra
    @Kaget0ra 3 years ago +50

    I literally just finished reading one of his papers, Deep Isometric Learning for Visual Recognition, before seeing what's new on my subs hehe.

    • @Kaget0ra
      @Kaget0ra 3 years ago +6

      @@unalome8538 while I'm still going with coincidence, that comment managed to pique my paranoia.

  • @lucaswood7602
    @lucaswood7602 3 years ago +12

    I always learn something with every podcast. Thanks, Lex!

  • @JS-zh8dd
    @JS-zh8dd 3 years ago +6

    You get that smile out of them at the end (and often throughout, of course) almost without fail. Thanks for creating so many enjoyable rides.

  • @nikhilvarmakeetha3917
    @nikhilvarmakeetha3917 3 years ago +6

    Awesome podcast! Loved Prof. Jitendra's take on CV, perception & cognition. Absolutely amazing how he transitions from CV to psychology and other domains. Really gave me a great perspective on child perception & long-form video understanding. Thanks Lex!

  • @punitmehta6864
    @punitmehta6864 3 years ago +4

    This was amazing, always love to hear Prof. Jitendra Malik. Could you please schedule a discussion with Jeff Dean as well?

  • @AldoKoskettimet
    @AldoKoskettimet 3 years ago +7

    thank you very much Lex for sharing such interesting and informative talks with us!

  • @amrendrasingh7140
    @amrendrasingh7140 3 years ago +4

    Enjoyed it. Keep up the good work, Lex.

  • @trax9987
    @trax9987 3 years ago +2

    Excellent conversation Lex. I feel the podcast has started exploring a lot more territories outside of AI but I am glad you brought it back to computer vision and deep tech. The demarcation between images and video is excellent. This is what I come here for!

  • @danielveinberg7185
    @danielveinberg7185 3 years ago +3

    This was a really good one.

  • @sebastianpizarro5407
    @sebastianpizarro5407 3 years ago +3

    Awesome thanks Lex!

  • @daqo98
    @daqo98 3 years ago +1

    Pretty insightful video. Lex, you are doing a great job spreading the knowledge for free. That's one of the ways you can make a change in the world: bringing us these foundations. Thank you very much! We all owe you a lot.

  • @chandrasekharvadnala6469

    Thank you Lex Fridman, this helps me understand computer vision.

  • @armchair8258
    @armchair8258 3 years ago +7

    Can u do a podcast with Hofstadter, that would be great

  • @JaskoonerSingh
    @JaskoonerSingh 3 years ago

    great interview, thanks

  • @carvalhoribeiro
    @carvalhoribeiro 2 months ago

    Another great interview Lex. Thanks for sharing this

  • @adwaitkulkarni3567
    @adwaitkulkarni3567 3 years ago +5

    Thanks for the wonderful videos, Lex, lots of love! Can we get Naval Ravikant, Jeffrey Sachs on the Lexcast? That'd break the intellectual part of the net!

  • @jordanjennnings9864
    @jordanjennnings9864 3 years ago

    You two are honest, hardworking, intelligent men. What a great conversation, very refreshing. I'm glad to see people still get good things from dedication and study. God bless you both, Jitendra and Lex.

  • @_avr314
    @_avr314 2 months ago

    Took ML class with him last fall, great professor!

  • @rohscx
    @rohscx 3 years ago

    Thanks Lex

  • @ZenJenZ
    @ZenJenZ 3 years ago

    Thanks Lex ❤

  • @danielveinberg7185
    @danielveinberg7185 3 years ago

    I really like talks about AGI. Specifically, how to build it.

  • @GustavoTakachiToyota
    @GustavoTakachiToyota 3 years ago +6

    Please interview Noam Brown! The guy from Libratus and Pluribus

  • @amohar
    @amohar 3 years ago

    Loved it... !!!

  • @rufuscasey2989
    @rufuscasey2989 3 years ago +1

    Love the intros too :)

  • @bradymoritz
    @bradymoritz 3 years ago +6

    "humans are not fully explainable". Interesting (counter)point.

  • @saikrishnagottipati4573
    @saikrishnagottipati4573 3 years ago +36

    "however the heck you spell that" haha

    • @skoto8219
      @skoto8219 3 years ago +5

      I guess the unique spelling is the result of the German name Friedman(n) being rendered in Russian as a more phonemic Фридман and then transliterated back into the Latin alphabet?

  • @enio17
    @enio17 3 years ago

    Great! Eventually you'll have to interview Linus Torvalds and Richard Stallman.

  • @nirvana4ol
    @nirvana4ol 3 years ago +1

    Could you invite Prof. David Mumford? An amazing researcher with a deep math background who later moved into vision. He has done very interesting work, everything from the Mumford–Shah functional to his paper on the statistics of images and the theory of shapes and patterns.

  • @Oliver.Edward
    @Oliver.Edward 3 years ago

    Any information on the Google Quantum Computer??? :):):)

  • @Cederic201
    @Cederic201 3 years ago +9

    Lex, could you invite the Russian scientist Sergei Saveliev for a conversation about the human brain?

    • @RahulKumar-ng2gh
      @RahulKumar-ng2gh 3 years ago

      and Grigori Perelman, though he is too reclusive to give an interview.

    • @ConstantineSad
      @ConstantineSad 3 years ago

      Saveliev is a charlatan spouting anti-scientific nonsense. Absolutely not.

  • @TheManonCanon
    @TheManonCanon 1 year ago

    The way Dr. Malik gives space for the social issues around AI when it’s his field is so reassuring. I agree, we should be aware of the risks of AI as proactively as possible. We need to have more diversity in tech and user testing pools to develop safer and less biased algorithms and AI!

  • @Brazen5tudios
    @Brazen5tudios 3 years ago

    Way to schedule the greats!
    There are probably a few interesting, as yet undiscovered, minds out of the UC San Diego Cognitive Science dept that could be worth interviewing.

  • @arpitasahoo128
    @arpitasahoo128 3 years ago +1

    Off topic, but if you can, please bring Terence Tao to your podcast!

  • @jamalparra5879
    @jamalparra5879 3 years ago

    I would love to be able to listen to these podcasts in the background while working, but it's impossible; these conversations with world-class experts in their fields demand full attention and concentration.

  • @bhrvindergill9281
    @bhrvindergill9281 3 years ago

    Did Jitendra say "tabular ... learning in a supervised world"? Can someone repeat the term/phrase? I could not understand the full term he used, and I can't find it on Google either. Thank you.

    • @agdsam
      @agdsam 3 years ago

      Tabula rasa - blank slate

  • @jamescurrie01
    @jamescurrie01 3 years ago +2

    Hi Lex, you should get Sam Harris on the show. He's extremely well-spoken on the topic of AI, and even Elon says he's great and understands the future of AI really well. He would be a really popular guest.

  • @patf6957
    @patf6957 1 year ago

    Our son's taking a class from Malik now at Cal and I wanted to hear Malik speaking for himself. On 3D, I wondered: how has he intersected with research done on the brain's grid cells? And googled. Interesting. He has consulted on, but not necessarily done research directly on, the subject. Jeff Hawkins (Palm, Numenta) considers that grid cells may be fundamental to human intellect. I wonder how Hawkins and Numenta are regarded by academia? There was an explosion in the capabilities of the Homo sapiens brain around 70,000 years ago, and given how quickly the transformation occurred, could that have been based on, literally, one physical change?

  • @nishantravinuthala511
    @nishantravinuthala511 Před 3 lety +1

    This conversation is probably your best.

  • @almightyblackdollar3669

    I think we should use TikTok, YouTube, Snapchat... data for visualizing A.I. before getting it on the road 17:50

  • @swarajshinde3950
    @swarajshinde3950 3 years ago

    Next Podcast with Andrej Karpathy please.

  • @gregor-samsa
    @gregor-samsa 3 years ago +2

    Lex, go high! Try to interview Noam Chomsky on AI (again). Sad that Joseph Weizenbaum (ELIZA) is not around any more.

    • @sangramjitchakraborty7845
      @sangramjitchakraborty7845 3 years ago

      He already has a podcast episode with Noam Chomsky.

    • @gregor-samsa
      @gregor-samsa 3 years ago

      @@sangramjitchakraborty7845 thx here it is czcams.com/video/cMscNuSUy0I/video.html

  • @PullingEnterprises
    @PullingEnterprises 3 years ago +1

    Is it possible to create a network that can understand general motion much like how a human understands, on a coarse level, inertia? Consider a Tesla that had a computer-aided model of the world outside the car that simulated inertial moments for objects; this would be another step in the right direction towards a spatial awareness that can accurately map objects and their likely trajectory. Humans can learn what skateboarders do with relatively little data because we understand general motion.
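
    A toy illustration of that "coarse inertia" idea (purely a sketch; the track coordinates and the constant-velocity assumption are mine, not the commenter's or the podcast's): given a few past positions of a tracked object, fit a constant-velocity model and extrapolate where the object is likely to be a second from now.

    ```python
    import numpy as np

    # Hypothetical track: (t, x, y) positions of a detected object in meters,
    # e.g. a skateboarder observed over the last few tenths of a second.
    track = np.array([
        [0.0, 10.0, 2.0],
        [0.1, 10.6, 2.1],
        [0.2, 11.2, 2.2],
        [0.3, 11.8, 2.3],
    ])

    # Fit a constant-velocity model: a least-squares line through the positions.
    t = track[:, 0]
    vx, x0 = np.polyfit(t, track[:, 1], 1)   # slope = velocity, intercept = position at t = 0
    vy, y0 = np.polyfit(t, track[:, 2], 1)

    def predict(t_future: float) -> tuple[float, float]:
        """Extrapolate the track under the constant-velocity (inertia) assumption."""
        return x0 + vx * t_future, y0 + vy * t_future

    print("predicted position 1.0 s ahead:", predict(1.0))
    ```

    Real systems layer learned behavior models on top of this kind of physics prior, but even the prior alone encodes the "objects keep moving the way they were moving" intuition the comment describes.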

  • @mayaoya3562
    @mayaoya3562 3 years ago +1

    children do need labels. I spend ages with my children, pointing and naming. It is a normal and vital part of life, especially in the first 3 years. It is about 90% of conversation, labeling stuff.

  • @ka9dgx
    @ka9dgx 3 years ago

    Parents and child development specialists could help if you really want to go the child like learning route...
    Start with eyes that can move, throw away most of the pixels to emulate the fovea and peripheral vision... set up a learning network that just tries to predict the next few frames... then work backwards from that, and add layers... until you get to where you can emulate neck muscles and the ability to look around... then you can add a throat, detecting "food" (nerf balls with rfid chips in them) that reward eating... but not too much....
    No simulation, all reality... it would take time, but when you're done... you've got a robust model that works in the real world.
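
    In case anyone wants to play with the "predict the next few frames" part of this suggestion, here is a minimal PyTorch sketch (the tiny convolutional architecture, the 64x64 random stand-in frames, and all hyperparameters are assumptions for illustration, not anything proposed in the episode): a small network trained with a pixel-wise L2 loss to map the current frame to the next one.

    ```python
    import torch
    import torch.nn as nn

    # Toy next-frame predictor: given frame t, predict frame t+1.
    class NextFramePredictor(nn.Module):
        def __init__(self, channels: int = 3):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(channels, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(32, channels, kernel_size=3, padding=1),
            )

        def forward(self, frame: torch.Tensor) -> torch.Tensor:
            return self.net(frame)

    model = NextFramePredictor()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    # Stand-in "video": random tensors where real (frame t, frame t+1) pairs would go.
    for step in range(100):
        current = torch.rand(8, 3, 64, 64)   # batch of 8 frames, 3 channels, 64x64 pixels
        nxt = torch.rand(8, 3, 64, 64)       # would be the true following frames
        pred = model(current)
        loss = loss_fn(pred, nxt)            # penalize pixel-wise prediction error
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    ```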

  • @willd1mindmind639
    @willd1mindmind639 3 years ago

    The problem in computer vision is that the concept of neural networks as we know them represents an abstraction in software for representing how to capture and propagate signals that represent a logical state or "feature". Most of the work in AI is how to build these networks based on data inputs and "machine learning" models. But in biology, the creation of these signals and propagation of information is hard-coded into the retina and optical nerve. It is hard-wired based on evolution and biological function and not really "learned". The visual cortex in the brain is not "building" these networks from scratch. It is making sense of the networks of signals provided to it from the "real world" by the optical nerve. That is where the learning takes place in terms of how to make associations between sets of signals propagated via networks of neurons firing in a certain pattern based on physical input. And then the other parts of the brain can capture and retain certain properties of those signals and networks of signals in memory and recombine and manipulate those as part of 'imagining', 'recall' and basic brain function.
    The issue for computer vision today is that most of the work is in re-encoding buckets of RGB values that are stored on a computer disk somewhere based on statistical math. The RGB values represent the initial encoding layer from the camera sensor and that form of encoding is 100% lossy in terms of providing networks of meaningful signal data to process. So using statistical mathematical operations will then reproduce the network of signals that propagate "bottom up" through a network to produce an overall statistical result signal at the top level. And that "neural network" represents a statistical encoding function with error correction to filter out "fake" data during training. But that is not how the human brain understands or encodes visual signals. There is no monolithic network representing things in a brain, as opposed to collections of groups of smaller networks representing signals captured from the real world with no need to error correct or filter out "fakes". For example, a tree would be a collection of signals representing shapes of leaves, shapes of branches, shapes of the trunk, color of the branches, color of the leaves, plus textures and light and shadow information. From that, the brain can learn what a tree is based on the encoding layers of the optical nerve that provided those inputs to the brain. Each of those various layers of encoding is a first-class entity and a distinct network in itself. And the brain can learn to associate them together in a loose coupling of individual lower-level entities (networks of signals). Because of that, the brain can still see a tree, even if it only sees the leaves and branches and not a trunk, because logically those features (shape/color/texture of a leaf or branches) are memorized as part of a larger collection of features called a "tree".
    TLDR; It is a signal encoding problem where neural networks are simply the first pass of encoding data, the "intelligence" happens on another layer where meaning and understanding is produced based on input. The job of the encoding layer is to consistently produce the same range of signal values (groups of shallow network signals) based on the provided input, much like a computer logic circuit. You cannot put all of that into one model and expect the same kind of results.

  • @fintech1378
    @fintech1378 10 months ago

    If he gave way more technical answers it would actually be better. More people would not understand, but that's why we are here.

  • @toufisaliba2806
    @toufisaliba2806 3 years ago +1

    @1:29:29 golden words! The problems of safety, biases, and risks are today. The bigger problem NOBODY is paying enough attention to is the governance of the AI that will be in your brain, Lex and Jitendra. And your children's. I am still surprised that no one is giving it enough attention. In the video with Ben he mentioned some of those problems and how we are building resolutions for them, but we need an army. Point is, I am NOT being negative here, quite the opposite; AI can be fabulous for humanity, but until then the governance is not being taken seriously. Attack from within... Pick a single entity you are comfortable with taking over your "free will"... Or what we are proposing, "Autonomous Decentralized Governance", so no one can repurpose your extended brain/AI without your control. Today your phone is your extended brain. Tomorrow it will be a lot closer to your brain.

  • @rahuldeora1120
    @rahuldeora1120 3 years ago

    Where is Demis Hassabis?

  • @henryvanderspuy3632
    @henryvanderspuy3632 3 years ago

    wow

  • @chamsabressitoure521
    @chamsabressitoure521 3 years ago +3

    Please bring Peter Thiel as a guest Lex.

  • @kirstengreed3973
    @kirstengreed3973 3 years ago

    Humans are black boxes too. Good point.

  • @ironassbrown
    @ironassbrown 3 years ago

    Wasn't it Donald Rumsfeld who said "there are known knowns, and unknown unknowns"?

    • @kennyg1358
      @kennyg1358 3 years ago +1

      And known unknowns and unknown knowns.

  • @abhirishi6200
    @abhirishi6200 3 years ago

    Nice yo

  • @RPHelpingHand
    @RPHelpingHand 3 years ago

    How important is prediction if the computer can observe and react at superhuman speeds? Sure, prediction might save you from a .0001 situation, but I think FSD will be good enough and save more lives than our current unsafe driving conditions. Perfection should not kill progress.

  • @TheManonCanon
    @TheManonCanon 1 year ago

    Do I think we’ll be able to understand it ever? Sure. In the next 20 years or so? No 🤣🤣 I died at this message and it’s so true it’s like we don’t even know how much we don’t know lmaoo

    • @TheManonCanon
      @TheManonCanon 1 year ago

      Excellent podcast btw I will be referencing it often!

  • @TheUmangyadav
    @TheUmangyadav 3 years ago

    Fix the lighting. Jitendra shows up in a well-lit frame and Lex in differently colored frames. Otherwise the content is good.

  • @999nines
    @999nines 3 years ago

    Lex, I am surprised to hear you expressing so much doubt regarding autonomous driving, especially after watching your interview with Jim Keller. Also, you must have seen the videos out there of Teslas driving themselves on and off the 280 freeway, passing cars and being passed, changing lanes, and negotiating intersections on Sand Hill Road. There's also this thing about autonomous driving having to be perfect. I get this impression from people: that they believe autonomous driving can never make a mistake, that it always has to be perfect, and that if someone gets injured or killed you have to throw it out. I think the more realistic way to look at it, and the way that Elon looks at it, is like this: when autonomous driving is 100 times safer than having humans behind the wheel, that will be a great achievement. Why would you ever have a human behind the wheel after that?

  • @paulgregson88
    @paulgregson88 3 years ago +1

    Always ready for a funeral

  • @Loom-works
    @Loom-works 3 years ago

    Without a soul a machine can't see reality. Not because it has no camera, but because it only does what it's programmed to do.

  • @krinniv7898
    @krinniv7898 3 years ago

    Was that am jokes?! hahahaha

  • @PhilippeLarcher
    @PhilippeLarcher 3 years ago

    1:16:00 the blind assistant could be like a virtual Amelie Poulain :D Could be exhausting over time ^^
    czcams.com/video/MOD11gnTKyA/video.html

  • @scottbutcher9093
    @scottbutcher9093 3 years ago

    Lex, if our consciousness stops when we die, how would we know we ever existed? I mean us personally, not through the people around us. If we are just gone and there is nothing, how are we experiencing this now? When we die we can't remember anything because there is nothing. I am not smart and I don't think I can properly describe what I am asking. Sorry if this makes no sense.

    • @carnap355
      @carnap355 3 years ago

      If X dies, X does not know that he ever existed because he lacks consciousness and therefore any possibility to know. What do you mean "how are we experiencing this now"? When is "now"? It would seem that we are not dead now and therefore have our experience.

  • @mikepict9011
    @mikepict9011 3 years ago

    I guess the only place left in the universe for IT innovation is radiation-hardened hardware.

  • @nikhilsinghh
    @nikhilsinghh 3 years ago

    I have a dream to talk to you.

  • @WackieChai
    @WackieChai 3 years ago

    I posted this comment for Joe that I am certain no one will ever read.... maybe you will Brother/doctor/strong like Russian LEX FRIDMAN -LOVE U TOO LEX (am I crazy or is this the answer AI will come to eventually?): "Hey Joe, I love you my brother! Think of "DARK ENERGY/MATTER" as simply particles like photons (force carriers) that only exist at a speed faster than photons, and that is why us humans cannot "see" it yet. Think of gravity as the only force that can interact with particles at "BOTH SPEEDS" (i.e. photons only exist at about 3 × 10^8 meters per second or less depending on the medium, but "dark" faster-than-light particles exist at some much faster quantized speed (tachyon)). Think of BLACK HOLES as tornadoes acting as particle accelerators, taking known force carriers like photons and swirling them up to some, as yet undiscovered, faster-than-light quantized speed where particles become invisible to us yet still interact with gravity. I have the mathematics to prove it but just like this comment, nobody will ever read it. See Joe, all this "dark matter/energy" bullshit is really very simple. Love what you do bro and THANK YOU JOE ROGAN for the free education you provide my friend!"

  • @ultraderek
    @ultraderek 3 years ago

    We need to raise robot children.

  • @evanroy4143
    @evanroy4143 3 years ago

    My brain needs a neuralink bc it’s so shit and fucked from drugs lul

  • @999nines
    @999nines 3 years ago

    Autopilot uses radar and sonar as well as its cameras. I would love to hear Jim Keller or Elon debate this guy about whether or not autonomous driving can be accomplished.

  • @federico-bayarea
    @federico-bayarea 3 years ago +1

    Awesome interview, Lex, as always!
    Connected to the evolutionary discussion, Jordan Peterson's lectures give great insights on the current theories of human and animal minds. It was particularly surprising to me how much action is in the hypothalamus, to the point that an animal can be quite functional with the cortex removed, just acting based on the hypothalamus. I believe a discussion between you and Jordan would be super entertaining and insightful for the audience.
    Reference: YouTube video "2017 Maps of Meaning 05: Story and Metastory (Part 1)", timestamp 1h 40m 23s.
    m.czcams.com/video/RudKmwzDpNY/video.html&t=1h40m23s

  • @gizellesmith8763
    @gizellesmith8763 3 years ago

    I presume that Malik shared some high-octane ganja with Fridman, because during the introduction Fridman appeared uncharacteristically less Russian, almost to the point of seeming happy.
