How to Build App with OpenAI's New GPT-4 TURBO VISION API (gpt vision)
Vložit
- čas přidán 14. 11. 2023
- @OpenAI has recently launched its latest API, GPT-4 Turbo, now with vision capabilities. This video presents a demonstration of the API's functionality within a demo app built using @databutton
Release Notes - openai.com/blog/new-models-an...
OpenAI docs - platform.openai.com/docs/over...
Databutton - databutton.com/login?...
App to test - databutton.com/v/now2nem0
Code - github.com/avrabyt/GPT4-turbo...
Publications shown in the video
- doi.org/10.1093/plphys/kiac175
- www.nature.com/articles/s4301...
Videos to watch
Stream Chat responses - • How to Stream LangChai...
RAG Chatbot - • Build your own RAG (re...
Learn more about Databutton - • Build a PERSONAL CHATB...
Playlist - • I asked ChatGPT to bu...
Blogs to read ( / avra42 )
Streaming response - / stream-langchain-ai-ab...
Building RAG for next AI - / why-your-next-ai-produ...
#openai #gpt4 #gpt4vision #gpt4v - Věda a technologie
Very helpful. Well done!
Glad to hear 🫶🏽
bro i am getting error that "gpt-4-vision-preview" model is not available give me a solution for this
Hello. I want to use Chatgbt4 Turbo vision for my application however I am not sure about the charges I am paying the way of calculation is very confusing to me. Does anyone know for sure what is paid on Azure open ai for using the Chatgbt 4 Turbo vision model, is it just spent tokens or something extra? Thank you
Very interesting! It would be nice if there was less zooming in and out though. Anyways, the content itself was great! Nice demo.
Hey! Thanks for the feedback. I'm glad that you liked the content 🫶🏽
Yes, I completely agree with you. I'm trying to improve over time my production quality - from mic to overall video editing.
All this time, CZcams and sharing was more of a mere side hobby, but since I can see that it's growing and people do care such work - I would definitely concentrate not only the content but the overall production quality :)
Do keep in touch and share such feedbacks , cheers !
Works great , tried with handwritten notes and images 😄
Hahaha ! That’s pretty cool 😅
Thank you!
🫶🏽
Quick question! Do you add motion blur when you are panning through the broswer? Curious if that’s part of your editing technique😊
Hahah nope ! All I do is using an app called screenstudio . Found it super cool . But seems like I’m over using it at times ( zoom in and out ) 😅
Can I do same with JS or Php
Hi Avra,
I'm currently working on a project that involves using the Vision Open API. However, I've encountered an issue with counter questions where OpenAI doesn't recognize older messages. I discovered a solution, which involves sending all messages in context repeatedly, but this increases the token size every time I ask a question. Could you please suggest or help me with this issue?
Correct . That’s not shown here .
What you can do is , pass a summary of the older context. More like a moving average . At every conversation you retain a certain portion of the old summary and pass it . I remember LangChain implemented something similar . And that’s a common practice to save cost as well. However, with larger context size now , you can literally pass a lot of tokens. But yeah that increase the cost
WOW, Pure Value - Quick Question: How do you edit your videos? any Tutorials / Course?
Thanks ! Glad to hear . I use a screen studio to record and edit .
Plus iMovies for the final edit . I would say it’s pretty minimal and I usually don’t invest much time on it .
it says cant import openai from openAI . Even tho I have installed the latest version of openai
Are you sure using the latest open ai endpoints. If I’m not wrong, this video was made with the old endpoints. They changed a month or two later.
The way openai client works is bit different now . Would suggest to check out their docs.
HI Avra,Can we read text from images using ai model gpt-4 vision preview
Hi 👋🏽yes. It is able to read, understand and interpret the images / texts from the screenshots.
can u develop tool in streamlit by reading text from images using vision api from AI
Yes. The app is built in Databutton ( using the streamlit front end code ?
Is it possible to add an image in the output (not just text) ?
Probably text to image ? That might work .
cool video, can you please advise where I can get access to the apis for gtp4 vision. Because the assistant models are unable to read images or pdfs
If I’m not wrong - You need to have a paid Open AI API tier
@@Avra_b Yes I do have paid Open Ai api tier. But when I upload documents it says, it does not have vision capabilities. There are architectural workarounds, however I wanted take your expert opinion to see if I am missing something
Can we input a video(without voice over) and get a summary of what is depicted?
Yes it’s very much possible . I did try it out . In that case you need to pass the image frame by frame to the GPT vison API. And most importantly, it works well with a low resolution video file ( that’s what I found )
wow, this was great tutorials, thankss
is there any way to make it work with multiple images and make the output in csv files ?
Did you manage to achieve that? I'm also interested
@@dukemarius8691 not yett
Sir have you used any Third party for this or its purely from GPT-4
GPT 4 !
Please accept my request on linkdin sir
I have some issues to ask on OpenAI sir please
I have tested out the vision API, but it is crazy expensive, just in a few testing rounds I burned through a few dollars and it was maybe like 20-30 screen shots. I can't see how it could be economical to build an app using this.
Great point! I do agree with you here . But my guess, particularly after seeing how dynamic the field of Generative AI has become, I’m expecting the price of such models ( specially the multi modal ) would get cheaper in coming months 🤞🏽
@@Avra_b Yeah I just like to keep up with what is happening. I actually used it with the new self-operating-computer repo going around on GitHub. It basically just keeps taking snapshots and uploading them to vision and then tries to analyze where to click. It is a really cool idea and I can see it being very useful in the future, but currently it doesn't really work very well and it ends up being really expensive because it needs to upload screen shot for every little step.
In future, Can you please focus more on alternatives?
Thank you
Sure. What would you suggest ?
How to use it for free?
You mean the API ? Unfortunately it isn’t . Open AI needs paid tier .