JETSON AI LAB | Realtime Video Vision/Language Model with VILA1.5-3b and Jetson Orin
- Added 14 May 2024
- Samples from running multimodal Efficient-Large-Model/VILA1.5-3b on video sequences using Jetson AGX Orin, captured at the live rate.
Tutorial & Benchmarks: www.jetson-ai-lab.com/tutoria...
JetPack Containers: github.com/dusty-nv/jetson-co... - Science & Technology
This would be pretty cool for a security camera feed. If it could tell the difference between someone walking and someone breaking into a vehicle, it would be a killer feature.
Yes, also for a government to spy on people they don't like, blackmail them later, or put them in prison outright for all the "evil deeds" the AI helped you to hallucinate.
*Continues cramming everything into my Orin Nano*
Very fast indeed, but tbh, I'm not impressed. I would be if the AI had temporal awareness. Getting descriptions like "the bears keep walking", "a new bear enters the scene" or "the same fire keeps going", that would be awesome.
Of course, these days I know it's only a matter of time, and compute.
Impressive.
I can see this being used in helicopters during high-speed police chases, linked to GPS and the police cruisers to know locations and trajectories.
Did we create this by solving all of those CAPTCHA puzzles?
If a CAPTCHA is solved and the solution is accepted, that means the website already knew the correct answer.
Is it fast because of cloud services and GPUs? And how do they improve latency? Many questions.
@cleisonarmandomanriqueagui9176 This is running locally on a Jetson embedded board that you can deploy in the field onboard a robot, smart camera, IoT device, etc., without a network or cloud connection. We have optimized the open-source multimodal stack and models for realtime streaming performance.
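The "live rate" behavior mentioned in the description comes down to a standard realtime pattern: while the model is busy captioning one frame, newer frames are dropped, so each inference starts on the most recent frame instead of falling behind. Here is a minimal, self-contained simulation of that pattern in plain Python; it is a generic illustration, not the actual jetson-containers/NanoLLM code, and the frame timings and inference time below are made-up numbers.

```python
def run_realtime(frame_times, infer_time):
    """Simulate a realtime captioning loop.

    frame_times: arrival time (seconds) of each camera frame.
    infer_time:  how long one VLM inference takes (seconds).

    While the model is busy, incoming frames are dropped; each
    inference starts on the newest available frame. Returns the
    indices of the frames the model actually captioned.
    """
    captioned = []
    busy_until = 0.0
    for i, t in enumerate(frame_times):
        if t >= busy_until:       # model is idle: take this frame
            captioned.append(i)
            busy_until = t + infer_time  # model busy during inference
        # else: frame arrives mid-inference and is dropped
    return captioned

# Camera at ~30 fps (a frame every 33 ms), model needing 100 ms/caption:
frames = [i * 0.033 for i in range(10)]
print(run_realtime(frames, 0.100))  # → [0, 4, 8]
```

The output shows the model captions roughly every third or fourth frame; the captions stay synchronized with live video at the cost of skipping frames, which is why faster inference on the Orin directly translates into denser, more temporally aware descriptions.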
layman's terms?
Ohh, I see. I thought for a sec it was text-to-video and my mind started to melt.
Police states all around the world, rejoice!