JETSON AI LAB | Realtime Video Vision/Language Model with VILA1.5-3b and Jetson Orin

  • Added on 14 May 2024
  • Samples from running multimodal Efficient-Large-Model/VILA1.5-3b on video sequences using Jetson AGX Orin, captured at the live rate.
    Tutorial & Benchmarks: www.jetson-ai-lab.com/tutoria...
    JetPack Containers: github.com/dusty-nv/jetson-co...
  • Science & Technology

Comments • 13

  • @mattmcpherson353 1 month ago +6

    This would be pretty cool for a security camera feed. If it could tell the difference between someone walking and someone breaking into a vehicle, it would be a killer feature.

    • @clray123 1 month ago

      Yes, also for a government to spy on people they don't like, blackmail them later, or put them in prison outright for all their "evil deeds" that the AI helped you to hallucinate.

  • @DeathTempler 1 month ago +1

    *Continues cramming everything into my Orin Nano*

  • @ronilevarez901 20 days ago +1

    Very fast indeed, but tbh, I'm not impressed. I would be if the AI had temporal awareness. Getting descriptions like "the bears keep walking", "a new bear enters the scene" or "the same fire keeps going", that would be awesome.
    Of course, these days I know it's only a matter of time, and compute.

  • @Hunger53 1 month ago +2

    Impressive.

  • @jryde421 27 days ago

    I can see this being used in helicopters during high-speed police chases.
    Linked to GPS and the police cruisers to know locations and trajectories.

  • @djp1234 1 month ago +1

    Did we create this by solving all of those CAPTCHA puzzles?

    • @ONDANOTA 1 month ago

      If a CAPTCHA is solved and the solution is accepted, that means the website already knew the correct answer.

  • @cleisonarmandomanriqueagui9176

    Is it fast because of cloud services and a GPU? And how do they improve latency? Many questions.

    • @dusty-nv 29 days ago

      @cleisonarmandomanriqueagui9176 This is running locally on a Jetson embedded board that you can deploy in the field onboard a robot, smart camera, IoT device, etc., without a network or cloud connection. We have optimized the open-source multimodal stack and models for realtime streaming performance.

  • @lucface 1 month ago

    Layman's terms?

    • @lucface 1 month ago +1

      Ohh, I see. I thought for a sec it was text-to-video and my mind started to melt.

  • @clray123 1 month ago +1

    Police states all around the world, rejoice!