How computers learn to recognize objects instantly | Joseph Redmon
- Added 17 Aug 2017
- Ten years ago, researchers thought that getting a computer to tell the difference between a cat and a dog would be almost impossible. Today, computer vision systems do it with greater than 99 percent accuracy. How? Joseph Redmon works on the YOLO (You Only Look Once) system, an open-source method of object detection that can identify objects in images and video -- from zebras to stop signs -- with lightning-quick speed. In a remarkable live demo, Redmon shows off this important step forward for applications like self-driving cars, robotics and even cancer detection.
Check out more TED talks: www.ted.com
The TED Talks channel features the best talks and performances from the TED Conference, where the world's leading thinkers and doers give the talk of their lives in 18 minutes (or less). Look for talks on Technology, Entertainment and Design -- plus science, business, global issues, the arts and more.
Follow TED on Twitter: / tedtalks
Like TED on Facebook: / ted
Subscribe to our channel: / ted - Science & Technology
when you tell a yolo joke around an audience that mostly doesn't know what yolo is
"I work on darknet" a.k.a. Satan's company.
That's not exactly a joke anymore. YOLO is a real name for this thing now
Generation 40+ don't get the yolo joke. No offense, it's the same with my parents
It's not even funny, that's why they didn't laugh
and at the same time a stop sign as a frisbee
This was an awesome TED talk. I wish it was longer. Very impressive that this is being run on a mobile device.
5:32 Detected a parrot as pizza.
This is how the flesh-eating robots begin.
hahahah :d
You goin beyond real-time
and a second later, detected frisbee for stop sign lmao
@@subazsarma *robot throws stop sign like a frisbee thinking it's playing a fun game* - - decapitates human. *detects human head as basketball* - - dunks basketball.
@@PowBamZing hey stop, that's brutal man
This is not a TED talk, this is a deep learning implementation demo ^.^
Amazing! Thank you for open sourcing this. I will be using it as a part of my smart dorm room project I am building !
Taking a computer vision class right now which is taught by Redmon! It's really fun and I've learnt a lot.
This is the future. Thanks, Joseph.
Awesome! Thank you for making it open source.
Wow, this is truly outstanding work, Buddy! Your algorithm is clearly top-notch, and what's even more impressive is your decision to make it open-source. Your vision is truly inspiring. Keep up the fantastic work!
Perfect! I was looking to implement this technology and this is open source!! I’m so excited 😋
I liked the audience's ecstatic reaction on the YOLO reference ^^
they're too old...
Awesome technology! Very impressive. I imagine this will be used in drones and other kinds of robots. They'll be able not only to know what they're seeing, but to know what they must look for on their own.
The most amazing thing is not just to build things but also to share them.
Be great by sharing what you think will make our lives better.
Thank you for the video.
A seven minute clip that's actually an ad and says nothing about how the code actually knows what it's looking at. Thanks for coming to my ted talk.
this
Nobody knows how exactly neural nets work once trained. That is the whole point, they train themselves.
I discovered this video 4 years ago; I knew I would need it one day. I'm finally starting to learn YOLO, thanks a lot ;)
Highly appreciable work by this dude..
Bro, your Darknet is so sophisticated and you open sourced it... you are a hero
I know right? This is worth so much and yet he made it available for everyone and free lol
this sounded more like a keynote for YOLO rather than a TED talk
every ted talk is a product presentation
@@RomanLeBg Not much of a product when it's free and open source though
@@The_Xeos Yeah but still
@@RomanLeBg Yes, I guess the point is still valid ahah
Pretty much every ted talk.
I'm going back to do my masters in biochemistry, and now when someone asks me what I want to do with that degree, I'll send them this video. This is exactly what I envision doing with a bachelor's in computer science and a master's in biochemistry/nanotech. Well done, YOLO guys and the University of Washington
3:22 It detected a skateboard, apparently
It made a guess based on his pose in a single frame, a common pose for a skateboarder. If the detection threshold is raised enough, it won't show that guess anymore.
Yeah, that's what I thought. Thanks
Looking at that single image, a human would "detect" a skateboard as well. Only when you look at the video do you understand that there's no skateboard there.
because yolo sees contextual information.
Watching the still image of when it detected skateboard, it does look like a skateboard.
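The threshold idea discussed in this thread can be sketched in a few lines. This is an illustrative snippet, not YOLO's actual code; the tuple layout and the 0.5 cutoff are assumptions for the example.

```python
# Minimal sketch of confidence thresholding on detector output.
# Each detection is a (label, confidence, box) tuple; the structure
# and the 0.5 default are illustrative, not YOLO's real defaults.

def filter_detections(detections, threshold=0.5):
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in detections if d[1] >= threshold]

frame = [
    ("person", 0.92, (10, 20, 200, 400)),
    ("skateboard", 0.31, (40, 350, 120, 30)),  # low-confidence guess
]

# Raising the threshold drops the spurious "skateboard" guess
# while the confident "person" detection survives.
print(filter_detections(frame, threshold=0.5))
```

Lowering the threshold shows more tentative guesses (like the skateboard); raising it trades recall for precision.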
For anyone looking for the app to download:
it's named "Objects Detection Machine Learning TensorFlow Demo" and is available for free for Android on Google Play (org.tensorflow.detect)
Searching for "TensorFlow" in Google, you can find their website with more stuff and links to the source code if you need it
thumbs up so everybody can see :)
Well that is "Tensorflow Demo". This is "Darknet YOLO". They do the same thing, but different algorithms and code base.
Wow, thanks mate!
Amazing! Keep up the great work!
This guy deserves an award 🥇!
I'm really excited because self-driving cars will be able to use this kind of pretty cool technology, so it will be safer for all of us
Wonderful explanation. Computer vision is one of the great challenges in robotics and autonomous vehicles, and this algorithm behaves much like the biological model of vision. It is going to make strides in the pathology domain as well...
Image recognition, importance queue/hierarchy, area navigation...all big questions being increasingly resolved by various groups. Soon we probably CAN have something autonomous, at least like a simple bipedal robot.
On video, the state of an object can change from one frame to another; a second algorithm could analyze what was detected over several frames to correct errors in the next frame, and this could be a way to train the neural network.
Finally a Ted talk actually works for something. Thank you local Thor. We appreciate technology.
This is really impressive!
great application for CCTV systems in general
i can't believe this appears in my feed after 2 years, mid 2019
shame on you, yt
Exactly!!!
maybe you are learning computer vision in 2019?
that's why probably
watches a software vid and obv doesn't know sh.. how algorithms work
Awesome stuff ;-) thanks for sharing :-)
3:15 a remote in his hands!
3:22 doing some skating.
so impressive , I am interested in this field
Wonderful result, pretty impressive tbh!
this is amazing!
Great work brother.. 👍🏻
He never explained how it works so this is a misleading title.
I'm sure 99.9% would not understand it xD
It did; he said they changed the algorithm from brute force to the YOLO approach
It works via a neural net trained on example pictures, like a jigsaw puzzle: the network does not need to see a complete object, just something close enough. The network is made of nodes called perceptrons, each of which can only solve the separation between the red balls and the blue balls; where the diagonal line goes is where the data separates. TensorFlow is a newer neural-net framework based on multiple layers, and it uses something called a sigmoid function that can curve the separation of the data instead of drawing a straight line. YOLO is based on Google's neural net; exactly how YOLO is done, I don't know, but they did say something about segmentation. What the program does is look for something similar to a person, and if a picture contains many such similar bodies, it detects them all as individual humans; it does the same with other things. A neural network can hold a pattern of something that looks close enough to a cat and then use that same pattern to recognize loads of cats in images. They first train the network by giving it a picture of a cat against an empty background; after that, they can show it a whole picture of many things, including cats, and trigger the same pattern every time something looks cat-like. Training works by adjusting all those little lines inside each perceptron until some part of the network as a whole resembles something cat-like. You can store many objects in the same network, but each pattern is triggered independently every time something similar is read. The network works the way, say, an L is similar to a U with half of it removed, or similar to an E with two small streaks added.
The network can hold the pattern for the whole alphabet in such a way that each letter is similar to the others, so there is a fractal property: part of each letter is reused, combined with a little extra information, to create a different letter. It means a cat is similar to a dog, so some of the pattern that makes up a cat is reused by the same network to recognize a dog, with some additional information; some of the numbers that make up the cat are close enough to make up the pattern of a dog. Its dumbness is that it is only as smart as the information in its network: it can use the cat pattern to recognize other objects, but it will think those objects are also cats. If the closest pattern the network has to a wheel is the head of a cat, it will call it a cat. By having many examples of many animals and other objects, the network has more variations to guess from. If the network has been trained on the face of a cat, the shape of a toilet, and the shape of a door, and you show it a dog or a microwave, it will tell you the object is either a cat or a door. The network doesn't know anything beyond the patterns trained into it; it picks the closest match to what it sees, and that is not necessarily what we think it is. In YOLO the network can only detect one object at a time, and there is still a delay between detected objects. I think they let the network recognize a person, then take the part of the screen that makes up the person and overlay it with a frame where the classified name is written.
The neural network knows which part of the screen contains the person, so they use that to overlay the frame. Neural networks like YOLO and others are all good memory programs, but they can't learn anything on their own: they can only recall what they have learned, not acquire knowledge by themselves. They need to be spoon-fed information, one pattern at a time, before they can read a pattern of many patterns. The real problem with artificial intelligence today is that it can't learn on its own; that is what makes these networks useful tools, but nothing like a real thinking process and far from a human-like mind. When networks can learn on their own, they will reach a level that is more human-like. The premise of Google's neural nets still shows what thinking is: evaluation, reinforcement, reward, and trial and error going on at the level of the perceptron. The problem is that there is currently no way to make that trial-and-error process do anything other than improve the accuracy of what is trained. You really want that trial-and-error process to try telling apart different objects in a picture without pre-training, by letting it train itself to separate the content. You could do this if you took Google's AlphaGo and combined it with image recognition, but that again is beyond even Google at the moment.
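The perceptron-with-sigmoid idea described in the comment above can be shown in a toy example. The weights here are hand-picked for illustration, not learned; a real network would have many layers of such nodes.

```python
import math

# Toy sketch of a single perceptron with a sigmoid activation, as the
# comment describes: it separates two classes of 2-D points with one
# (curved) decision boundary. Weights and bias are hand-picked, not
# trained -- training would adjust them from labeled examples.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def perceptron(point, weights=(1.0, 1.0), bias=-1.0):
    """Score a 2-D point; > 0.5 falls on one side of the boundary."""
    x, y = point
    return sigmoid(weights[0] * x + weights[1] * y + bias)

print(perceptron((2.0, 1.0)))    # well above 0.5: one class
print(perceptron((-1.0, -1.0)))  # well below 0.5: the other class
```

The sigmoid is what lets stacked layers bend the separating line into a curve, which is the "curve the separation of data" point made above.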
He teaches a fantastic intro class at UW on computer vision: czcams.com/video/8jXIAWg_yHU/video.html
If you wanna learn more about this stuff!
Very impressive, a very useful technique.
Amazing speed. Definitely would like to use this. But the title is wrong, "how" this is done isn't touched.
That's great. Thanks for what you did
Could be a great fix for security issues where high traffic and low budget are concerns. Schools, airports and malls spring to mind.
Thank you for YOLO!
Fantastic lecture, thanks for sharing.
Very helpful video!
Awesome work!
Great technology, Thank you all for making better future
Fantastic! Congrats.
wonderful work👏👏👏👏
I already knew all this stuff thanks to a project but wooow seeing that again, that way, looked amazing! Good job! And thanks for making it open-source
Having Darknet as a name and a satanic looking logo is maybe not the best way to show people that they should trust computers...
And you shouldn't.
satan is fake. red is just a color. darknet tho is kinda iffy.
F. S. Yolo
Any intelligent, informed person knows satanic/satanism isn't a negative thing. Likewise, Darknet requires intelligence to use. If you need to be convinced to trust computers, you're irrelevant to civilization. Go ahead and destroy your phone, tv, and any other technology with computing. Fear and stupidity go hand in hand.
They clearly use daemons to do the processing :P
nice work
Can you tell me the type of camera?
Good news for me because I'm a visually impaired person.
And it sounds good that it's open source, thanks. Can you check this program with NVDA? It's software for the blind called Non Visual Desktop Access. If it works with this software, it would be good for blind people like me who lost their vision in accidents. Sorry, I'm not a graduate, so forgive me for my grammar mistakes.
Gopi Deva333 your comment gave me an idea: glasses that say what is in front of you, using a small camera and an earpiece
@@harryfox4389 already done
@@laitila87 can we get more details about what you're referring to? We are trying to achieve the same using a raspberry pi and need a bit of help.
Two years back it was fun watching it; now I am back here in 2020 because I am a computer science student.
If you want self-driving cars and robots, you'll need to go a step further and make it predict the movement of any object in its way. Even if we manage to get a car like that, it would not be enough to control a drone within a swarm, which even tiny birds can do. Pretty impressive for a bird brain...
How do I count the number of detected objects using the TensorFlow API?
Thank you in advance
Thank you
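For the counting question above, here is a minimal, detector-agnostic sketch. It assumes you have already extracted one label string per detection above your confidence threshold; this is not the actual TensorFlow Object Detection API, so adapt the label extraction to whatever structure your detector returns.

```python
from collections import Counter

# Generic sketch: count detected objects, total and per class,
# given a list of label strings (one per detection above threshold).
# The label list below is made-up example data.

def count_objects(labels):
    """Return per-class counts of the detected labels."""
    return Counter(labels)

detections = ["person", "person", "car", "person", "dog"]

per_class = count_objects(detections)
print(per_class)                  # per-class counts, e.g. person: 3
print(sum(per_class.values()))    # total number of detections
```

With the TensorFlow demo apps, the equivalent step is to take the class IDs of detections whose score exceeds your threshold and tally them the same way.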
Cool stuff, plus it's open source / free to use!
That's really amazing !
Joseph Redmon - can we use this tech along with autonomous car tech to be able to identify out of control cars in cities and raise barriers ? Just an idea :)
Great work
Inspired me for today, thank you
Very informational video
This is really cool for blind people and more.
Someone typed 38 words in a minute with their mind.. hopefully we can somehow input data into our mind. Soon.
vocaleyes.ai are doing it
@@harshitaarora6319 I feel so old.
Great tutorial!
Awesome project!!
So you know how you'll be talking about a certain product near your phone and then see an ad for that same product a day later? I believe advertisers also use this technology to scan whatever your phone sees (everything, because everyone uses their phones for everything)
Joseph Redmon's YOLO algorithm might have just changed the world forever !
this is quite amazing..
- Doctor: Let's try this in the body
- AI: I found a suitcase
I'm joking, very good work ! Thank you
at one point in the video it said frisbee instead of stop sign
Aj Jeji It also saw a pizza somewhere in the audience lol.
The110014 maybe the bot saw into his soul.. :-P
Krishna Mohan and saw a pizza or a frisbee? I think my soul might be a frisbee.
And when he shows off changes in size in the beginning it shows "skateboard" in a green frame too.
it also thinks everyone is wearing ties, but i can see one guy has a lanyard. who wears ties these days?
Great work!
This is amazing
I need this implanted in my Eyes!
You can't recognise a cat?
Mind blowing!
Wow , brilliant 👏
I'll use it!!
I like it so much! It seems that only science will save our civilization, by opening new horizons for our curiosity. From Russia with love
Science has no objective. Science is a process, an idea. Humans have plans, and how humans execute those plans depends on how humans structure society and what we value. Love from California, Silicon Valley.
yeah but can it tell the difference between a hot dog and not a hot dog
Andy Skelton lol
JING JAAANG
Andy Skelton silicon valley ;)
Went to a talk where the speaker was the actual coder for the real app Not HotDog. He said he only used one 980 ti hooked to a macbook to train his AI with hotdog pics. Super interesting stuff
Great comment
Awesome! Incredible!
I like his desktop. What is his OS, and which theme?
this man is key to future
Hello,
how do I connect a Sony camera? Could you explain? Thank you very much
I took digital image signal processing in electrical/computer engineering college almost 10 years ago and we definitely knew how to do this then.
tricky part is that seeing a 2d projection of a 3d object, the information content is very very different depending on viewing angle.
@@elliott614 What if I make a statue of a duck with my poo, will the AI know what it is?
I want to do research in the BCI field.
Which course should I choose?
At 5:33 it detects the people in separate boxes, showing multiple boxes. Is it possible to detect all the people in this case in one box?
Amazing!
How long does it take to train it to recognize an object? How many objects can it sort through in its dictionary and are there contextual dictionaries that are constantly fluid with the environment (ie as you move around from the freeway to a mall parking lot the object dictionary changes to include shopping carts, parking islands, etc etc? I can't imagine an infinite dictionary always being in use as it would slow the process down.
Imagine that feeling when you conceived something revolutionary like this, that proud
Fantastic!
Can anyone please post the link to the YOLO phone app?
DARKNET
Foreshadowing.... Thus it begins
Hey, what's foreshadowing?
Lucas D OMG fishing net sounds like skynet too!!11eleven
It is an unfortunately ominous name... :(
No doubt...check out that logo too. That's some classic sigil there and looks a little bit like a pentagram even
I am in fear of the possibilities in future.
This is both awesome and scary
4:11 and 15s after that is the closest to "HOW computers learn..."
I hoped for more details when I clicked on this.
It already knows more than we do. The stop sign is not a stop sign, it is a frisbee
This is one of the guys I want working for me.
This, in conjunction with AR, is going to change the world.
What's the second detection algorithm that he executed? Is that Fast RCNN?
3:10 detects "remote" on his arm
This is so cool
Is there a video to guide us on how to install it?