How computers learn to recognize objects instantly | Joseph Redmon
- Added 17 Aug 2017
- Ten years ago, researchers thought that getting a computer to tell the difference between a cat and a dog would be almost impossible. Today, computer vision systems do it with greater than 99 percent accuracy. How? Joseph Redmon works on the YOLO (You Only Look Once) system, an open-source method of object detection that can identify objects in images and video -- from zebras to stop signs -- with lightning-quick speed. In a remarkable live demo, Redmon shows off this important step forward for applications like self-driving cars, robotics and even cancer detection.
Check out more TED talks: www.ted.com
The TED Talks channel features the best talks and performances from the TED Conference, where the world's leading thinkers and doers give the talk of their lives in 18 minutes (or less). Look for talks on Technology, Entertainment and Design -- plus science, business, global issues, the arts and more.
Follow TED on Twitter: / tedtalks
Like TED on Facebook: / ted
Subscribe to our channel: / ted - Science & Technology
when you tell a yolo joke around an audience that mostly doesn't know what yolo is
"I work on darknet" a.k.a. Satan's company.
That's not exactly a joke anymore. YOLO is a real name for this thing now
Generation 40+ don't get the yolo joke. No offense, it's the same with my parents
It's not even funny, that's why they didn't laugh
and at the same time a stop sign as a frisbee
This was an awesome TED talk. I wish it was longer. Very impressive that this is being run on a mobile device.
5:32 Detected a parrot as pizza.
This is how the flesh-eating robots begin.
hahahah :d
You goin beyond real-time
and a second later, detected frisbee for stop sign lmao
@@subazsarma *robot throws stop sign like a frisbee thinking it's playing a fun game* - - decapitates human. *detects human head as basketball* - - dunks basketball.
@@PowBamZing hey stop, that's brutal man
This is not a TED talk, this is a deep learning implementation demo ^.^
Amazing! Thank you for open sourcing this. I will be using it as a part of my smart dorm room project I am building !
Taking a computer vision class right now which is taught by Redmon! It's really fun and I've learnt a lot.
This is the future. Thanks, Joseph.
Awesome! Thank you for making it open source.
Wow, this is truly outstanding work, Buddy! Your algorithm is clearly top-notch, and what's even more impressive is your decision to make it open-source. Your vision is truly inspiring. Keep up the fantastic work!
Perfect! I was looking to implement this technology and this is open source!! I’m so excited 😋
I liked the audience's ecstatic reaction on the YOLO reference ^^
they're too old...
Awesome technology! Very impressive. I imagine this will be used in drones and other kinds of robots. They'll be able not only to know what they're seeing, but to know what they must look for on their own.
The most amazing thing is not just to build things but also to share them.
Be great by sharing what you think will make our lives better.
Thank you for the video.
A seven minute clip that's actually an ad and says nothing about how the code actually knows what it's looking at. Thanks for coming to my ted talk.
this
Nobody knows how exactly neural nets work once trained. That is the whole point, they train themselves.
I discovered this video 4 years ago; I knew I would need it one day. I'm finally starting to learn YOLO, thanks a lot ;)
Highly appreciable work by this dude..
Bro, your Darknet is so sophisticated and you open sourced it... you are a hero
I know right? This is worth so much and yet he made it available for everyone and free lol
this sounded more like a keynote for YOLO rather than a TED talk
every ted talk is a product presentation
@@RomanLeBg Not much of a product when it's free and open source though
@@The_Xeos Yeah but still
@@RomanLeBg Yes, I guess the point is still valid ahah
Pretty much every ted talk.
I'm going back to do my masters in biochemistry, and now when someone asks me what I want to do with that degree, I'll send them this video. This is exactly what I envision doing with a bachelor's in computer science and a master's in biochemistry/nanotech. Well done, YOLO guys and the University of Washington
3:22 It detected a skateboard, apparently
It made a guess based on his pose in a single frame, a common pose for a skateboarder. If the detection threshold is raised enough, it won't show that guess anymore.
Yeah, that's what I thought. Thanks
Looking at that single image, a human would "detect" a skateboard as well. Only when you look at the video do you understand that there's no skateboard there.
because yolo sees contextual information.
Watching the still image of when it detected skateboard, it does look like a skateboard.
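The threshold idea discussed in this thread can be sketched in a few lines. This is an illustrative snippet, not YOLO's actual code; the tuple layout and the 0.5 cutoff are assumptions for the example.

```python
# Minimal sketch of confidence thresholding on detector output.
# Each detection is a (label, confidence, box) tuple; the structure
# and the 0.5 default are illustrative, not YOLO's real defaults.

def filter_detections(detections, threshold=0.5):
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in detections if d[1] >= threshold]

frame = [
    ("person", 0.92, (10, 20, 200, 400)),
    ("skateboard", 0.31, (40, 350, 120, 30)),  # low-confidence guess
]

# Raising the threshold drops the spurious "skateboard" guess
# while the confident "person" detection survives.
print(filter_detections(frame, threshold=0.5))
```

Lowering the threshold shows more tentative guesses (like the skateboard); raising it trades recall for precision.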
For anyone looking for the app to download:
it's named "Objects Detection Machine Learning TensorFlow Demo" and is available for free for Android on Google Play (org.tensorflow.detect)
Searching for "TensorFlow" in Google, you can find their website with more stuff and links to the source code if you need it
thumbs up so everybody can see :)
Well that is "Tensorflow Demo". This is "Darknet YOLO". They do the same thing, but different algorithms and code base.
Wow, thanks mate!
Amazing! Keep up the great work!
This guy deserves an award 🥇!
I'm really excited because self-driving cars will be able to use this kind of pretty cool technology, so it will be safer for all of us
Wonderful explanation. Computer vision is one of the great challenges in robotics and autonomous vehicles, and this algorithm behaves much like the biological model of vision. It is going to make strides in the pathology domain as well...
Image recognition, importance queue/hierarchy, area navigation...all big questions being increasingly resolved by various groups. Soon we probably CAN have something autonomous, at least like a simple bipedal robot.
On video, the state of an object can change from one frame to another; a second algorithm could analyze what was detected over several frames to correct errors in the next frame, and this could be a way to train the neural network.
Finally a Ted talk actually works for something. Thank you local Thor. We appreciate technology.
This is really impressive!
great application for CCTV systems in general
i can't believe this appears in my feed after 2 years, mid 2019
shame on you, yt
Exactly!!!
maybe you are learning computer vision in 2019?
that's why probably
watches a software vid and obv doesn't know sh.. how algorithms work
Awesome stuff ;-) thanks for sharing :-)
3:15 a remote in his hands!
3:22 doing some skating.
so impressive , I am interested in this field
Wonderful result, pretty impressive tbh!
this is amazing!
Great work brother.. 👍🏻
He never explained how it works so this is a misleading title.
I'm sure 99.9% would not understand it xD
It did; he said they changed the algorithm from brute force to the YOLO approach
It works via a neural net trained on example pictures, like a jigsaw puzzle: the network does not need to see a complete object, just something close enough. The network is made of nodes called perceptrons, each of which can only solve the separation between the red balls and the blue balls; where the diagonal line goes is where the data separates. TensorFlow is a newer neural-net framework based on multiple layers, and it uses something called a sigmoid function that can curve the separation of the data instead of drawing a straight line. YOLO is based on Google's neural net; exactly how YOLO is done, I don't know, but they did say something about segmentation. What the program does is look for something similar to a person, and if a picture contains many such similar bodies, it detects them all as individual humans; it does the same with other things. A neural network can hold a pattern of something that looks close enough to a cat and then use that same pattern to recognize loads of cats in images. They first train the network by giving it a picture of a cat against an empty background; after that, they can show it a whole picture of many things, including cats, and trigger the same pattern every time something looks cat-like. Training works by adjusting all those little lines inside each perceptron until some part of the network as a whole resembles something cat-like. You can store many objects in the same network, but each pattern is triggered independently every time something similar is read. The network works the way, say, an L is similar to a U with half of it removed, or similar to an E with two small streaks added.
The network can hold the pattern for the whole alphabet in such a way that each letter is similar to the others, so there is a fractal property: part of each letter is reused, combined with a little extra information, to create a different letter. It means a cat is similar to a dog, so some of the pattern that makes up a cat is reused by the same network to recognize a dog, with some additional information; some of the numbers that make up the cat are close enough to make up the pattern of a dog. Its dumbness is that it is only as smart as the information in its network: it can use the cat pattern to recognize other objects, but it will think those objects are also cats. If the closest pattern the network has to a wheel is the head of a cat, it will call it a cat. By having many examples of many animals and other objects, the network has more variations to guess from. If the network has been trained on the face of a cat, the shape of a toilet, and the shape of a door, and you show it a dog or a microwave, it will tell you the object is either a cat or a door. The network doesn't know anything beyond the patterns trained into it; it picks the closest match to what it sees, and that is not necessarily what we think it is. In YOLO the network can only detect one object at a time, and there is still a delay between detected objects. I think they let the network recognize a person, then take the part of the screen that makes up the person and overlay it with a frame where the classified name is written.
The neural network knows which part of the screen contains the person, so they use that to overlay the frame. Neural networks like YOLO and others are all good memory programs, but they can't learn anything on their own: they can only recall what they have learned, not acquire knowledge by themselves. They need to be spoon-fed information, one pattern at a time, before they can read a pattern of many patterns. The real problem with artificial intelligence today is that it can't learn on its own; that is what makes these networks useful tools, but nothing like a real thinking process and far from a human-like mind. When networks can learn on their own, they will reach a level that is more human-like. The premise of Google's neural nets still shows what thinking is: evaluation, reinforcement, reward, and trial and error going on at the level of the perceptron. The problem is that there is currently no way to make that trial-and-error process do anything other than improve the accuracy of what is trained. You really want that trial-and-error process to try telling apart different objects in a picture without pre-training, by letting it train itself to separate the content. You could do this if you took Google's AlphaGo and combined it with image recognition, but that again is beyond even Google at the moment.
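The perceptron-with-sigmoid idea described in the comment above can be shown in a toy example. The weights here are hand-picked for illustration, not learned; a real network would have many layers of such nodes.

```python
import math

# Toy sketch of a single perceptron with a sigmoid activation, as the
# comment describes: it separates two classes of 2-D points with one
# (curved) decision boundary. Weights and bias are hand-picked, not
# trained -- training would adjust them from labeled examples.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def perceptron(point, weights=(1.0, 1.0), bias=-1.0):
    """Score a 2-D point; > 0.5 falls on one side of the boundary."""
    x, y = point
    return sigmoid(weights[0] * x + weights[1] * y + bias)

print(perceptron((2.0, 1.0)))    # well above 0.5: one class
print(perceptron((-1.0, -1.0)))  # well below 0.5: the other class
```

The sigmoid is what lets stacked layers bend the separating line into a curve, which is the "curve the separation of data" point made above.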
He teaches a fantastic intro class at UW on computer vision: czcams.com/video/8jXIAWg_yHU/video.html
If you wanna learn more about this stuff!
Very impressive, a very useful technique.
Amazing speed. Definitely would like to use this. But the title is wrong, "how" this is done isn't touched.
That's great. Thanks for what you did
Could be a great fix for security issues where high traffic and low budget are concerns. Schools, airports and malls spring to mind.
Thank you for YOLO!
Fantastic lecture, thanks for sharing.
Very helpful video!
Awesome work!
Great technology, Thank you all for making better future
Fantastic! Congrats.
wonderful work👏👏👏👏
I already knew all this stuff thanks to a project but wooow seeing that again, that way, looked amazing! Good job! And thanks for making it open-source
Having Darknet as a name and a satanic looking logo is maybe not the best way to show people that they should trust computers...
And you shouldn't.
satan is fake. red is just a color. darknet tho is kinda iffy.
F. S. Yolo
Any intelligent, informed person knows satanic/satanism isn't a negative thing. Likewise, Darknet requires intelligence to use. If you need to be convinced to trust computers, you're irrelevant to civilization. Go ahead and destroy your phone, tv, and any other technology with computing. Fear and stupidity go hand in hand.
They clearly use daemons to do the processing :P
nice work
Can you tell me the type of camera?
Good news for me because I'm a visually impaired person.
And it sounds good that it's open source, thanks. Can you check this program with NVDA? It's software for the blind called Non Visual Desktop Access. If it works with this software, it would be good for blind people like me who lost their vision in accidents. Sorry, I'm not a graduate, so forgive me for my grammar mistakes.
Gopi Deva333 your comment gave me an idea: glasses that say what is in front of you, using a small camera and an earpiece
@@harryfox4389 already done
@@laitila87 can we get more details about what you're referring to? We are trying to achieve the same using a raspberry pi and need a bit of help.
Two years back it was fun watching it; now I am back here in 2020 because I am a computer science student.
If you want self-driving cars and robots, you'll need to go a step further and make it predict the movement of any object in its way. Even if we manage to get a car like that, it would not be enough to control a drone within a swarm, which even tiny birds can do. Pretty impressive for a bird brain...
How do I count the number of detected objects using the TensorFlow API?
Thank you in advance
Thank you
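For the counting question above, here is a minimal, detector-agnostic sketch. It assumes you have already extracted one label string per detection above your confidence threshold; this is not the actual TensorFlow Object Detection API, so adapt the label extraction to whatever structure your detector returns.

```python
from collections import Counter

# Generic sketch: count detected objects, total and per class,
# given a list of label strings (one per detection above threshold).
# The label list below is made-up example data.

def count_objects(labels):
    """Return per-class counts of the detected labels."""
    return Counter(labels)

detections = ["person", "person", "car", "person", "dog"]

per_class = count_objects(detections)
print(per_class)                  # per-class counts, e.g. person: 3
print(sum(per_class.values()))    # total number of detections
```

With the TensorFlow demo apps, the equivalent step is to take the class IDs of detections whose score exceeds your threshold and tally them the same way.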
Cool stuff, plus it's open source / free to use!
That's really amazing !
Joseph Redmon - can we use this tech along with autonomous car tech to be able to identify out of control cars in cities and raise barriers ? Just an idea :)
Great work
Inspired me for today, thank you
Very informational video
This is really cool for blind people and more.
Someone typed 38 words in a minute with their mind.. hopefully we can somehow input data into our mind. Soon.
vocaleyes.ai are doing it
@@harshitaarora6319 I feel so old.
Great tutorial!
Awesome project!!
So you know how you'll be talking about a certain product near your phone and then see an ad for that same product a day later? I believe advertisers also use this technology to scan whatever your phone sees (everything, because everyone uses their phones for everything)
Joseph Redmon's YOLO algorithm might have just changed the world forever !
this is quite amazing..
- Doctor: Let's try this in the body
- AI: I found a suitcase
I'm joking, very good work ! Thank you
at one point in the video it said frisbee instead of stop sign
Aj Jeji It also saw a pizza somewhere in the audience lol.
The110014 maybe the bot saw into his soul.. :-P
Krishna Mohan and saw a pizza or a frisbee? I think my soul might be a frisbee.
And when he shows off changes in size in the beginning it shows "skateboard" in a green frame too.
it also thinks everyone is wearing ties, but i can see one guy has a lanyard. who wears ties these days?
Great work!
This is amazing
I need this implanted in my Eyes!
You can't recognise a cat?
Mind blowing!
Wow , brilliant 👏
I'll use it!!
I like it so much! It seems that only science will save our civilization, by opening new horizons for our curiosity. From Russia with love
Science has no objective. Science is a process, an idea. Humans have plans, and how humans execute those plans depends on how humans structure society and what we value. Love from California, Silicon Valley.
yeah but can it tell the difference between a hot dog and not a hot dog
Andy Skelton lol
JING JAAANG
Andy Skelton silicon valley ;)
Went to a talk where the speaker was the actual coder for the real app Not HotDog. He said he only used one 980 ti hooked to a macbook to train his AI with hotdog pics. Super interesting stuff
Great comment
Awesome! Incredible!
I like his desktop. What is his OS, and which theme?
this man is key to future
Hello,
how do I connect a Sony camera? Could you explain? Thank you very much
I took digital image signal processing in electrical/computer engineering college almost 10 years ago and we definitely knew how to do this then.
tricky part is that seeing a 2d projection of a 3d object, the information content is very very different depending on viewing angle.
@@elliott614 What if I make a statue of a duck with my poo, will the AI know what it is?
I want to do research in the BCI field.
Which course should I choose?
At 5:33 it detects the people in separate boxes, showing multiple boxes. Is it possible to detect all the people in this case in one box?
Amazing!
How long does it take to train it to recognize an object? How many objects can it sort through in its dictionary and are there contextual dictionaries that are constantly fluid with the environment (ie as you move around from the freeway to a mall parking lot the object dictionary changes to include shopping carts, parking islands, etc etc? I can't imagine an infinite dictionary always being in use as it would slow the process down.
Imagine that feeling when you conceived something revolutionary like this, that proud
Fantastic!
Can anyone please post the link to the YOLO phone app?
DARKNET
Foreshadowing.... Thus it begins
Hey, what's foreshadowing?
Lucas D OMG fishing net sounds like skynet too!!11eleven
It is an unfortunately ominous name... :(
No doubt...check out that logo too. That's some classic sigil there and looks a little bit like a pentagram even
I am in fear of the possibilities in future.
This is both awesome and scary
4:11 and 15s after that is the closest to "HOW computers learn..."
I hoped for more details when I clicked on this.
It already knows more than we do. The stop sign is not a stop sign, it is a frisbee
This is one of the guys I want working for me.
This, in conjunction with AR, is going to change the world.
What's the second detection algorithm that he executed? Is that Fast RCNN?
3:10 detects "remote" on his arm
This is so cool
Is there a video to guide us on how to install it?