Mask RCNN - COCO - instance segmentation
- added 5 Nov 2017
- source: github.com/karolmajek/Mask_RCNN
Input 4K video: [NEW LINK!!!]
archive.org/details/000220170...
If this video helped you somehow - you can buy me a coffee:
bit.ly/Coffee4Karol
Would really love a video tutorial :)
it's really impressive.
Question: for semantic segmentation models like BiSeNet & SegNet, is it necessary to prepare the mask for training? If yes, how and why do we prepare this mask? Thank you for answering me.
Hi, I want to know which annotation tool to use to create the dataset (like MS COCO). I used labelImg but it can only create the Pascal VOC format. How do I create the mask for an image in the MS COCO case? Thanks for your help!
Do you want to create contours of objects? There is a tool available; I don't remember the name/link, sorry.
Hi,
Can anybody help me with how to use Mask RCNN to train for 3 object classes on my own video dataset? What do we need to modify in train_shapes.ipynb to load images from our own dataset?
Thanks!
T-900 point of view
Hi, I am trying to set up my own dataset with create_pet_tf_record.py. I have masks, XML, and PNG files, but when I launch train and eval, masks are not displayed... Do you have any idea? No one has answered me.
At 0:48 it finds the reflection of a car in the mirror on the right side. Really good!
At 1:23 it also finds a bus right there ;)
Would be good if it understood it's a reflection.
Imagine advertisements, billboards with cars or people...
Wouldn't that be bad? Ideally it should detect that it's a reflection of a car, not a different car.
You need a lot of examples and a class for that. For autonomous machines it's a lot easier to use lidar.
15:19 "Clock 98% ... " - it's ventilation, man! VENTILATION!
There's no ventilation class in the COCO dataset cocodataset.org, that's the reason.
However this ventilation has a lot of the same features as a clock. If you look closer you can see how they relate :-)
What about the bus (van) and the three sinks (literally just road)?
That's why it never labels anything 100%. Machine vision can "never" be as accurate as the human eye.
impressive
Great job
I’m interested in using this with Unity. I already have Tensorflow and I have been playing around with an object detection example project from the internet and it works pretty well but I like the idea of Masking the objects instead of just bounding boxes. Do you think it’s possible?
you can try it in colab
Colab != Unity.
Try newer networks. YOLACT, DETR
Would it be possible to track rats going around a track and get their order? Is YOLO a better choice for this?
YOLO/SSD are faster and should do this, but you will also need tracking, I guess.
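A minimal sketch of the tracking step mentioned here: match each detection in the new frame to the previous frame's boxes by IoU and carry over the ID of the best overlap. This is a toy matcher under simplified assumptions, not a full tracker (real trackers add motion models and track lifecycle handling):

```python
# Toy IoU-based ID matcher. Boxes are (x1, y1, x2, y2) tuples.
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def match_ids(prev, boxes, threshold=0.3):
    """prev maps track_id -> box from the last frame. Returns the same
    mapping for the new frame, minting fresh IDs for unmatched boxes."""
    next_id = max(prev, default=-1) + 1
    out = {}
    for box in boxes:
        best = max(prev, key=lambda i: iou(prev[i], box), default=None)
        if best is not None and best not in out and iou(prev[best], box) >= threshold:
            out[best] = box  # same object as in the last frame
        else:
            out[next_id] = box  # new track
            next_id += 1
    return out
```

With per-rat IDs persisting across frames, their order on the track falls out of comparing positions for each ID.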
Thanks.
May I use this footage in a project? I'm looking for this kind of footage, but nothing I find licensed in Creative Commons compares to this.
Ok, but please share the result with me. Just wanted to know. Thanks for asking!
Holy shit. Good job.
this is more like semantic segmentation? how can i reproduce the result?
Look how precisely accurate computers have become; we're in real trouble now.
Very impressive work Karol. I'm working on a home surveillance tool (personal project) that utilizes IP cameras and an object detection neural network. Have you done much work in low-light environments? Any tips for working with images that have IR illumination (most security cameras use IR for night mode so the images are in greyscale).
It's hard if you have bad illumination. I was working with a thermal camera, but not with RGB + IR illumination. Have you already tried to run something on such videos? If you want, I can run some algorithms and publish the results here on my channel.
Good morning,
I must admit that the segmentation algorithm combined with YOLO(?) TensorFlow(?) works really well; I'm impressed. I have a question about the hardware used: what did CPU and GPU usage look like, and did the algorithm use the image at its original 4K resolution, or was it downscaled for computation? What FPS was achieved? I suspect that while object detection alone can run in real time, combining it with segmentation might not; how did it look in practice?
Good morning,
Input is 1024x1024. Mask RCNN arxiv.org/abs/1703.06870 in TensorFlow. On GPU: GTX1080 ~1FPS (I still need to verify this, because the host also has a K40 and I'm not sure about the FPS either).
So it runs much slower than YOLO or SSD, but the output is more attractive.
Thank you for the reply. I think in my free time I'll try running YOLO or TensorFlow with the COCO dataset.
Have you thought about testing this type of algorithm on footage from a flying platform (drone)? Personally I'm curious whether the COCO dataset allows detecting objects seen from above and from a large distance.
YOLO may have problems; it performs poorly on small objects. A very interesting problem, but unfortunately I don't have such video material. As for running on a drone, check this video (running on a phone): czcams.com/video/cQJa9AVEAII/video.html
As for top-down data: it's a matter of preparing a dataset and training the neural network.
Great video, Karol. Thanks for sharing. I think yours is the only implementation where you see only one color for each label (people, car, etc) which makes it much more understandable vs the original. How are you doing that? Thanks!
Bro, did you find an answer? I am trying to do the same thing.
It's not a secret; I plan to release the code and create a tutorial. Sorry that you have been waiting so long.
It will be great I am waiting for that tutorial thank you :)
Is this implementation better than YOLO? Also, how do you count uniquely the objects detected, without counting the same object twice?
This is single-frame detection, without any tracking, so you wouldn't know. But in each frame you are able to count objects; as output you get a list of instances + contours.
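A sketch of per-frame counting from that kind of output, assuming a Matterport-style result dict with a "class_ids" array (the key name is an assumption; check the repo's demo for the exact format):

```python
from collections import Counter

def count_instances(result, class_names):
    """Count detected instances per class label in one frame's result."""
    return Counter(class_names[i] for i in result["class_ids"])

# Example with a fabricated detection result (indices into class_names):
frame_result = {"class_ids": [2, 2, 1, 3, 2]}
names = ["BG", "person", "car", "bus"]
print(count_instances(frame_result, names))
# Counter({'car': 3, 'person': 1, 'bus': 1})
```

Counting the same object only once across frames would additionally require the tracking step mentioned elsewhere in the thread.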
Why can't it recognize the big van right in front during the 16-18 min time frame? It starts to recognize the van only when it's in the distance or looks smaller. This reminds me of the fatal Tesla accident of running into a trailer. Any explanation?
Good catch. I noticed that it recognises it as a bus before it enters the tunnel. Once inside, it seems to reacquire the shape every time the brightness level changes. Also I think it's because it's directly in front and does not have a strong enough contour to recognise the object at once. It will be interesting to hear Mr. Karol's perspective on this.
I think the biggest issue is dataset bias (trucks are far away, more pickups, or other types, ...). MS COCO is used for training; look here at what a truck is: cocodataset.org/#explore If we lower the detection threshold it should appear, but we will get far more bad results...
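As an illustration of that trade-off: detection-time filtering typically comes down to a score threshold over the predicted instances. Lowering it keeps borderline detections like the distant truck, but also admits more false positives. A toy sketch (the field names here are made up for the example):

```python
def filter_detections(detections, min_confidence=0.7):
    """Keep only detections whose score meets the confidence threshold."""
    return [d for d in detections if d["score"] >= min_confidence]

dets = [{"label": "truck", "score": 0.65}, {"label": "car", "score": 0.90}]
print(filter_detections(dets))                      # the 0.65 truck is dropped at 0.7
print(filter_detections(dets, min_confidence=0.5))  # the truck appears, plus any noise above 0.5
```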
Thanks for sharing this video. How can I get this code to learn from?
Is there a C++ implementation with Caffe? I have not found one on GitHub.
+Xuling Chang I don't know of any. I googled but haven't found any results.
I want to learn this stuff. Just installed CVAT on my MacBook Air.
Can you please suggest where I should start? I have pretty vague knowledge about these things.
I don't think CVAT is a good start, but it depends on your goal, what you want to achieve.
Really good implementation, but for safety-critical use cases I would limit the net to vehicle and pedestrian detection only, with a possible extension to traffic light detection (spot on). There are plenty of mislabeled traffic signs; almost all signs were detected as stop signs.
Karol, Super cool video! I wonder where you got those input videos. Can I use them in my own demo?
Best,
My own, recorded with a phone. Yes, go ahead! Can you share results publicly? Would be great to see!
Will do if I have a proper video produced. Thanks!
Is it possible to measure movement speed?
you've been watched
-person of interest
Bro, I think I've seen you before in the comments of another AI video.
It was "You are being watched!", I think!
Nice result
Thanks!
Great job, but I wonder if it is possible to hide the square outline and the shading of each detected object. Instead, I want TensorFlow to tell us the name of an object when we touch it on the screen.
Nice idea for an app. It is possible
Could you show me how to do it?
Hi, what are the hardware requirements for it to run in real-time? Thanks.
It's not meant for real time; it's too heavy.
True. And it's quite old. Check YOLACT and DETR
is the underlying data available (video + instance segmentations for each frame)?
Only the input video is shared
Now we just need to make it detect more details of objects: translate these speed signs into numbers for speed regulation, get enough detail to tell whether a car is a Ferrari or a Ford, detect red, green, and blue on traffic lights, and make it stop on a red light or when a person is in the way. It would also be cool if the program could detect whether a person is male or female, long hair or short hair, sunglasses or no sunglasses, and other details. I hope such detail will be included in future versions.
What hardware did you use for the recognition, and at how many FPS did you run it?
Desktop 1080. It was slow; I don't remember now. In the next Mask RCNN video I will try to add FPS.
I am trying to segment and detect 2 different images; however, the results I get are identical, so every image has the masks and detections of only the first image. Does anyone else have this problem?
You mean to put 2 images as input (batch size=2), not to run prediction 2 times (run 2x with batch size=1)? Can you share your code?
The first part. Currently I have the prediction running 2 times (batch size=1). But I want to be able to run 2 images simultaneously with the mask updating for each image. I hope I explained that clearly; let me know.
Quick follow up question, how did you get the mask to stay a constant color for each frame?
Impressive! This looks like V.A.T.S.
Funny that it sees the fans in the tunnels as clocks.
Hey dude, I also made a video demo and a webcam test, but the speed is very slow, only about 1-2 fps. Do you know how to improve the speed? I want to make a real-time application.
Same for me, I also get 1-2 fps. This video is not real time; I sped it up to 30fps after detection.
Yeah, same for me. I tested on a GTX TITAN X, what about 1080 ti? If you have progress about speed, could you please update your github? Appreciate it a lot!
GTX 980M; I measured only the inference time. There are computationally heavy steps required to blend masks with the input video.
*Top!*
How much memory is required to train the network model? In my case, for 400x400 images (batch size=14) I need 11 GB of memory (basic U-Net model).
For Mask RCNN I don't know. If you reduce the batch size you can use a smaller GPU, but a bigger batch is better. Faster RCNN with NASNet requires 8 GB for 1 image per batch...
@@KarolMajek 8 GB is really huge, wow. Thank you for the information.
Go to the TensorFlow object detection model zoo and check prediction speeds. Slower nets need more memory to train.
Good morning,
Is it possible to run this kind of project in Google Colab?
It can be run, but Colab is very thankless. The Pro version has stronger machines, but it works out better on a laptop: the session won't suddenly end and you don't have to adapt the code. Feel free to join my mailing list, it's easier to talk there, or use a contact form. Is this specifically about running inference to get instance segmentation? Maybe bounding boxes would be enough, since they're faster.
The most impressive part of this is the detection of the car via a reflection on the glass
Hi,
when I run the code, in the Create Model and Load Trained Weights step
I get this error:
AttributeError: 'TensorShape' object has no attribute 'rank'
Any help?
I think you're using newer version of TensorFlow
@@KarolMajek I am using TensorFlow 1.9.0
@@abdullahzaqebah7919 Try Matterport original repo github.com/matterport/Mask_RCNN and their updated demo github.com/matterport/Mask_RCNN/blob/master/samples/demo.ipynb
TF version seems ok (>1.3.0)
In order to get a better result, do you train with 4K image pairs?
This is original result from a model trained on COCO dataset. 4k video is for test purpose only
@@KarolMajek got it, Thank you!
Why is it hard to breathe while working on this??
Is it? Maybe it's the air pollution
Hello sir, I have downloaded the source from the link but I am not able to run the project. Please give me the doc if you have it.
You will need python, tensorflow and jupyter notebook installed. Then try to run this: github.com/karolmajek/Mask_RCNN/blob/master/demo.ipynb
First, thanks for your reply. I have already installed Python, TensorFlow, and Jupyter Notebook. I have some CUDA errors; can you give me the specific TensorFlow and CUDA toolkit versions?
The transparent colors look great. The question is, is it possible to integrate this into Google Glass, so that you have a real-life color hack? I think it is, because it is easy to process screenshots in real time at 640x360 resolution, no?
Thanks, mask rcnn is too slow. Check YOLACT or DETR
Crazy shit!
Hi!! I am new to Mask RCNN, but I was really impressed by your work. I have some doubts about running the GitHub code. Can I mail you?
what is the mask?
how did you set same color for all cars?
I am going to share this code finally, but have too many other things. Thank you for patience
Hi, it is a very good job, and I can run matterport/Mask_RCNN but it is so slow (GTX 1060, about 450ms per frame at 640x480 frame size). How fast are you?
Thanks! I confirm this is slow. It took a day or two to process this video :-) I need to add an FPS display next time.
I'm sorry, I didn't see the previous comments. Maybe most of us are interested in using Mask RCNN in real time.
Can you make it recognize different types of trees from a long way?
You will need a manually annotated dataset. But from a large distance the results will not be perfect.
What is the average fps it could run on a GPU like titan x?
Hard to say
Ignore Mask RCNN, check YOLACT; it's much faster!
Good! How do I deal with video? I can only use the source code to detect a single picture.
Maybe I forgot to push. Basically you need to predict for every frame and then make a video from it; I am using ffmpeg, which I can recommend.
Is it real-time? Or do you analyze each frame and write masks onto video?
Not realtime
Emmmm.. So you process image by image and combine into a video?
Correct. Mask RCNN is slow. I plan to run 4 other Mask RCNNs with different detectors and will put FPS there.
Yeah, it's indeed slow. I applied it to my own dataset and the best I got is 5 FPS. Do you have any recommendation for accelerating the prediction? Like a nice paper? Thank you!
Not yet; I think the solution is waiting somewhere, but I haven't found it. I would give any PyTorch implementation a try. It should be faster because of the channel-first image representation.
I have successfully run the Mask-RCNN demo, but the problem is that it does not display the output image.
matiqul islam You can use matplotlib or cv2.imshow. I can send you demo code via email if you want.
Thanks brother please send me the code. My email address is : matiqul06@gmail.com
Dear Karol Majek, I have another question: can you tell me where I can get, or how I can generate, the "mask_rcnn_shapes.h5" shapes trained weights? Also, please send the demo code to my email: matiqul06@gmail.com
How is it possible that with such a high resolution the detection is so unstable? E.g. the utility vehicle at 10:00 right in front of the camera goes undetected for several seconds...
It's not a matter of resolution. This net was trained on COCO, and here you have images from different distribution.
I think Mask RCNN trained on Open Images v5 would work much better
check also YOLACT which is much faster
Is this semantic segmentation, because all the cars have the same colour?
No, it's instance segmentation. I set consistent colors for each class
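One simple way to get consistent per-class colors like that (an illustrative approach, not necessarily what the video used): derive a stable color from a hash of the class name, so "car" maps to the same RGB triple in every frame and in every run.

```python
import colorsys
import hashlib

def class_color(name, buckets=81):
    """Map a class name to a fixed RGB color (0-255 per channel).
    hashlib is used instead of hash() so the color survives restarts."""
    h = int(hashlib.md5(name.encode()).hexdigest(), 16) % buckets
    r, g, b = colorsys.hsv_to_rgb(h / buckets, 0.9, 1.0)
    return int(r * 255), int(g * 255), int(b * 255)

# The same name always yields the same color across frames:
print(class_color("car"), class_color("person"))
```

Each instance keeps its own mask, so it is still instance segmentation; only the coloring is per class.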
Inference speed? Can we use it in real-time?
It's super slow.
For realtime check: YOLACT
Can it calculate trajectory and velocity?
I guess this is still image processing applied to every frame of the video, so the results are not being correlated between frames. Compared to this, trajectory and velocity estimation once the objects have been detected and classified seems an easier task to me.
Can you tell me the hardware requirements for this?
NVIDIA GPU. With a 980M it works pretty slowly. Now you can find faster methods. This was below 1fps, as I remember.
Good evening,
If you could add English subtitles, I could configure automatic translation to Portuguese and follow along more easily.
I have been improving my English, but I still struggle with some terms and get the feeling I'm missing something important.
You have an extensive range of excellent materials, so adding subtitles to them would be an enormous help.
Once again, congratulations on the excellent work.
Thank you,
Hello, what are your GPU and CPU?
980m
Do you think there's a chance of implementing this on a Raspberry Pi? What hardware does it require?
On an RPi even plain bounding boxes run slowly. Check out Tencent NCNN. Check out NanoDet (a video will be here tomorrow).
By the way, do you know DeepDrive.pl/30?
A few videos about networks + a few months of emails about Deep Learning
Hello, this video is so nice! Could I use this video for my youtube channel? If it is okay, then I will put the link of the video on description box.
Yes, ok. Can you put the link to your video here so I can share it in a post?
@@KarolMajek Thank you so much :)) ok, I will.
@@KarolMajek czcams.com/video/E-52wXmG-1Q/video.html
3:17 - 3:25 I used the clip from this video for my youtube channel! Hope you enjoy this video!!
Can I see the code for this project? Your repository only has GIFs and the base code.
Are you looking for code for inference on video?
I was modifying this demo notebook: github.com/karolmajek/Mask_RCNN/blob/master/demo.ipynb
@@KarolMajek But how did you read the video: frame by frame, or did you use another method? And any idea how to use ImageNet weights and dataset instead of COCO?
Around 10:00 there are moments when the Mask RCNN doesn't recognize the truck right in front. What may cause it?
It's not trained to recognize it.
@@jerelvelarde2829 But if it's not trained to recognize it, why does it recognize it in other frames?
It happens when the confidence of a detection falls below the threshold set as the criterion to display. And in continuous video we find such glitches, where a frame might recognise it as 2 objects with less confidence in each.
Could be because the tires are occluded and because of the lack of reflective surface/depth
Hi, there. Good job for this! I'm really surprised!
As of today, I'm working with Mask RCNN, but I could not figure out how to implement live video. Would you give me a tip or something? I would really appreciate it! Thanks anyway!
Julio Rivera Basically I am doing a prediction for every frame and writing the result to a file. Then I am using ffmpeg to create a video. Of course you can use a camera instead of an mp4 file, thanks to OpenCV.
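The loop described here can be sketched like this. predict() and save() are stand-ins (for model inference plus mask blending, and for writing numbered images); the ffmpeg command at the end is one common way to stitch the saved frames together:

```python
def process_frames(frames, predict, save):
    """Run predict() on every frame and save each result; return the count.
    In practice frames would come from cv2.VideoCapture over the input video."""
    count = 0
    for frame in frames:
        save(count, predict(frame))
        count += 1
    return count

# Example with stand-in functions instead of a real model and disk writes:
results = {}
n = process_frames([1, 2, 3],
                   predict=lambda f: f * 10,
                   save=lambda i, out: results.update({i: out}))
print(n, results)  # 3 {0: 10, 1: 20, 2: 30}

# Afterwards, combine saved frames into a video, e.g.:
#   ffmpeg -framerate 30 -i frame_%06d.png -c:v libx264 -pix_fmt yuv420p out.mp4
```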
Thank you!
Julio Rivera If you need some help, feel free to ask.
Karol Majek Cool! Do you have an email? Mine is juliorivera.rivas2013@gmail.com
You are the man, haha
Julio Rivera karolmajek@gmail.com
Which FPS did you achieve?
Super slow: seconds per frame. For faster inference check Detectron2 and YOLACT.
interesting the system picked up cars in the reflection of windows
source: github.com/karolmajek/Mask_RCNN
Please tell me the proper steps to install and run it, because I am facing some errors. Please help me.
It can't seem to decide if motorcycles are cars or people. On a highway, I guess it doesn't matter in an accident... very little protection. Motorcycles are sometimes more unpredictable than cars and therefore a higher danger... It's a Bird... It's a Plane... It's a Boat... It's Superman!
Very nice job. Could you please add a code example for IP camera live RTSP streaming on GitHub? For static car-counting usage.
You will not get live detections since Mask RCNN is pretty slow. You can use OpenCV to receive such a stream. Then just pass each image through the net and you will see your results.
@@KarolMajek I use ROCm with a Sapphire NITRO+ RX 580 8 GB and have no issues with live detection from a webcam; I just need to figure out how to do it with an IP cam.
Ok. Just use OpenCV camera capture with the URL as input.
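A sketch of that suggestion with a reconnect loop, since (as the thread shows) RTSP streams can drop mid-session. The capture factory is injected so the retry logic stands on its own; with OpenCV it would be something like `lambda: cv2.VideoCapture("rtsp://<camera-url>")`, where the URL is a placeholder:

```python
def read_stream(open_capture, handle_frame, max_reconnects=3):
    """Read frames until the capture fails, reopening it on each drop.
    Returns the total number of frames handled."""
    frames = 0
    for _ in range(max_reconnects + 1):
        cap = open_capture()  # real use: reopen the RTSP URL here
        while True:
            ok, frame = cap.read()
            if not ok:  # stream dropped or ended: reconnect
                break
            handle_frame(frame)
            frames += 1
    return frames

# Stand-in capture that yields two frames per connection, then fails:
class FakeCap:
    def __init__(self):
        self.left = 2
    def read(self):
        if self.left:
            self.left -= 1
            return True, "frame"
        return False, None

seen = []
total = read_stream(FakeCap, seen.append, max_reconnects=1)
print(total)  # two connections, two frames each: 4
```

Reconnecting won't cure an underlying FFmpeg/RTSP bug, but it keeps a long-running counter alive across transient drops.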
@@KarolMajek I did that, but after a few seconds it crashes with this: [rtsp @ 0x7f762b264700] RTP: PT=60: bad cseq 4e86 expected=30c3. I think it's an FFmpeg bug.
Could be. I don't have a solution for that problem.
This is not live, right?
Yes, it's super slow. Check my newest video, there is something much more online
Source video not available
It's THE DAY - I finally update the link in desc:
Input 4K video: [NEW LINK!!!]
archive.org/details/0002201705192
I always ask my students what we can do with this. It's beautiful, nice, etc., but why?
Of many possibilities, it can be used as a vision system for self driving cars!
Terminators will use this to identify humans and machines.
inspired by bigpackets
Reported for wallhacks, enjoy your VAC.
Where is it? Turkey?
Warsaw, Poland
8:29 “The flag is also a kite”
Hi Karol, can you send me the code?
Please send it to sarwo.jowo@gmail.com
look in description - github.com/karolmajek/Mask_RCNN
What is the FPS?
Really low. Seconds per frame is a better metric
Check Detectron2 or YOLACT
@@KarolMajek Thanks for the reply. I did train YOLACT; its accuracy is nothing close to Mask RCNN's.
@@KarolMajek I actually just need a real-time detector. In my scenario there are a lot of overlapping objects; do you have any recommendations for this task?
@@fetullahatas3927 Then check Mask RCNN in Detectron2
code
Awesome!
skynet
esp irl
in real life wall hacks?
Nice aimbot
This video was sponsored by the rich-people-with-awesome-GPUs gang.
A 3-year-old laptop with a GTX 980, nothing special.
I can send a gift, but PayPal is blocked here (Turkey). I can send it over Patreon.
Thank you, what about Revolut?
I have created a Patreon, let me check.
Thanks!
www.patreon.com/karolmajek
nice try