How to Build App with OpenAI's New GPT-4 TURBO VISION API (gpt vision)

Sdílet
Vložit
  • čas přidán 14. 11. 2023
  • ‪@OpenAI‬ has recently launched its latest API, GPT-4 Turbo, now with vision capabilities. This video presents a demonstration of the API's functionality within a demo app built using ‪@databutton‬
    Release Notes - openai.com/blog/new-models-an...
    OpenAI docs - platform.openai.com/docs/over...
    Databutton - databutton.com/login?...
    App to test - databutton.com/v/now2nem0
    Code - github.com/avrabyt/GPT4-turbo...
    Publications shown in the video
    - doi.org/10.1093/plphys/kiac175
    - www.nature.com/articles/s4301...
    Videos to watch
    Stream Chat responses - • How to Stream LangChai...
    RAG Chatbot - • Build your own RAG (re...
    Learn more about Databutton - • Build a PERSONAL CHATB...
    Playlist - • I asked ChatGPT to bu...
    Blogs to read ( / avra42 )
    Streaming response - / stream-langchain-ai-ab...
    Building RAG for next AI - / why-your-next-ai-produ...
    #openai #gpt4 #gpt4vision #gpt4v
  • Věda a technologie

Komentáře • 45

  • @SimonCarpio09
    @SimonCarpio09 Před 7 měsíci +1

    Very helpful. Well done!

    • @Avra_b
      @Avra_b  Před 7 měsíci

      Glad to hear 🫶🏽

  • @shaikmohammedjafarhussain8400
    @shaikmohammedjafarhussain8400 Před 4 měsíci +1

    bro i am getting error that "gpt-4-vision-preview" model is not available give me a solution for this

  • @nclub976
    @nclub976 Před 6 měsíci

    Hello. I want to use Chatgbt4 Turbo vision for my application however I am not sure about the charges I am paying the way of calculation is very confusing to me. Does anyone know for sure what is paid on Azure open ai for using the Chatgbt 4 Turbo vision model, is it just spent tokens or something extra? Thank you

  • @Ro1andDesign
    @Ro1andDesign Před 8 měsíci +3

    Very interesting! It would be nice if there was less zooming in and out though. Anyways, the content itself was great! Nice demo.

    • @Avra_b
      @Avra_b  Před 8 měsíci +2

      Hey! Thanks for the feedback. I'm glad that you liked the content 🫶🏽
      Yes, I completely agree with you. I'm trying to improve over time my production quality - from mic to overall video editing.
      All this time, CZcams and sharing was more of a mere side hobby, but since I can see that it's growing and people do care such work - I would definitely concentrate not only the content but the overall production quality :)
      Do keep in touch and share such feedbacks , cheers !

  • @gauravgarg-wc4zl
    @gauravgarg-wc4zl Před 8 měsíci +2

    Works great , tried with handwritten notes and images 😄

    • @Avra_b
      @Avra_b  Před 8 měsíci

      Hahaha ! That’s pretty cool 😅

  • @marcociofalo
    @marcociofalo Před 8 měsíci +3

    Thank you!

  • @designthinkingwithgian
    @designthinkingwithgian Před 8 měsíci +1

    Quick question! Do you add motion blur when you are panning through the broswer? Curious if that’s part of your editing technique😊

    • @Avra_b
      @Avra_b  Před 8 měsíci

      Hahah nope ! All I do is using an app called screenstudio . Found it super cool . But seems like I’m over using it at times ( zoom in and out ) 😅

  • @pratikmanusmare7837
    @pratikmanusmare7837 Před 2 měsíci

    Can I do same with JS or Php

  • @rupeshparmar4916
    @rupeshparmar4916 Před 3 měsíci +1

    Hi Avra,
    I'm currently working on a project that involves using the Vision Open API. However, I've encountered an issue with counter questions where OpenAI doesn't recognize older messages. I discovered a solution, which involves sending all messages in context repeatedly, but this increases the token size every time I ask a question. Could you please suggest or help me with this issue?

    • @Avra_b
      @Avra_b  Před 3 měsíci

      Correct . That’s not shown here .
      What you can do is , pass a summary of the older context. More like a moving average . At every conversation you retain a certain portion of the old summary and pass it . I remember LangChain implemented something similar . And that’s a common practice to save cost as well. However, with larger context size now , you can literally pass a lot of tokens. But yeah that increase the cost

  • @digitalmarketingwithpiyush
    @digitalmarketingwithpiyush Před 8 měsíci +1

    WOW, Pure Value - Quick Question: How do you edit your videos? any Tutorials / Course?

    • @Avra_b
      @Avra_b  Před 8 měsíci

      Thanks ! Glad to hear . I use a screen studio to record and edit .
      Plus iMovies for the final edit . I would say it’s pretty minimal and I usually don’t invest much time on it .

  • @deekshanayak895
    @deekshanayak895 Před 5 měsíci +1

    it says cant import openai from openAI . Even tho I have installed the latest version of openai

    • @Avra_b
      @Avra_b  Před 5 měsíci

      Are you sure using the latest open ai endpoints. If I’m not wrong, this video was made with the old endpoints. They changed a month or two later.
      The way openai client works is bit different now . Would suggest to check out their docs.

  • @kopparthi44
    @kopparthi44 Před 7 měsíci +1

    HI Avra,Can we read text from images using ai model gpt-4 vision preview

    • @Avra_b
      @Avra_b  Před 7 měsíci

      Hi 👋🏽yes. It is able to read, understand and interpret the images / texts from the screenshots.

  • @kopparthi44
    @kopparthi44 Před 7 měsíci +1

    can u develop tool in streamlit by reading text from images using vision api from AI

    • @Avra_b
      @Avra_b  Před 7 měsíci

      Yes. The app is built in Databutton ( using the streamlit front end code ?

  • @juanpamantelli
    @juanpamantelli Před 4 měsíci +1

    Is it possible to add an image in the output (not just text) ?

    • @Avra_b
      @Avra_b  Před 4 měsíci

      Probably text to image ? That might work .

  • @MultiTwist
    @MultiTwist Před 6 měsíci

    cool video, can you please advise where I can get access to the apis for gtp4 vision. Because the assistant models are unable to read images or pdfs

    • @Avra_b
      @Avra_b  Před 5 měsíci

      If I’m not wrong - You need to have a paid Open AI API tier

    • @MultiTwist
      @MultiTwist Před 5 měsíci

      @@Avra_b Yes I do have paid Open Ai api tier. But when I upload documents it says, it does not have vision capabilities. There are architectural workarounds, however I wanted take your expert opinion to see if I am missing something

  • @purplefan204
    @purplefan204 Před 7 měsíci +1

    Can we input a video(without voice over) and get a summary of what is depicted?

    • @Avra_b
      @Avra_b  Před 7 měsíci

      Yes it’s very much possible . I did try it out . In that case you need to pass the image frame by frame to the GPT vison API. And most importantly, it works well with a low resolution video file ( that’s what I found )

  • @ghazy095
    @ghazy095 Před 3 měsíci +1

    wow, this was great tutorials, thankss
    is there any way to make it work with multiple images and make the output in csv files ?

    • @dukemarius8691
      @dukemarius8691 Před 2 měsíci

      Did you manage to achieve that? I'm also interested

    • @ghazy095
      @ghazy095 Před 2 měsíci

      @@dukemarius8691 not yett

  • @mallanagoudabiradar1922
    @mallanagoudabiradar1922 Před 6 měsíci +1

    Sir have you used any Third party for this or its purely from GPT-4

  • @davidallred991
    @davidallred991 Před 7 měsíci +2

    I have tested out the vision API, but it is crazy expensive, just in a few testing rounds I burned through a few dollars and it was maybe like 20-30 screen shots. I can't see how it could be economical to build an app using this.

    • @Avra_b
      @Avra_b  Před 7 měsíci +1

      Great point! I do agree with you here . But my guess, particularly after seeing how dynamic the field of Generative AI has become, I’m expecting the price of such models ( specially the multi modal ) would get cheaper in coming months 🤞🏽

    • @davidallred991
      @davidallred991 Před 7 měsíci

      @@Avra_b Yeah I just like to keep up with what is happening. I actually used it with the new self-operating-computer repo going around on GitHub. It basically just keeps taking snapshots and uploading them to vision and then tries to analyze where to click. It is a really cool idea and I can see it being very useful in the future, but currently it doesn't really work very well and it ends up being really expensive because it needs to upload screen shot for every little step.

  • @rakeshkr3924
    @rakeshkr3924 Před 8 měsíci +1

    In future, Can you please focus more on alternatives?
    Thank you

    • @Avra_b
      @Avra_b  Před 8 měsíci

      Sure. What would you suggest ?

  • @skaltamash6281
    @skaltamash6281 Před 8 měsíci +3

    How to use it for free?

    • @Avra_b
      @Avra_b  Před 8 měsíci

      You mean the API ? Unfortunately it isn’t . Open AI needs paid tier .