GPT-4 Vision API + Puppeteer = Easy Web Scraping

Sdílet
Vložit
  • čas přidán 27. 07. 2024
  • In today's video I do some experimentation with the new GPT-4 Vision API and try to scrape information from web pages using it.
    GitHub: github.com/unconv/gpt4v-browsing
    Support: buymeacoffee.com/unconv
    Consultations: www.buymeacoffee.com/unconv/e...
    Memberships: www.buymeacoffee.com/unconv/m...
    00:00 Intro
    01:04 Basic usage of GPT-4 Vision API
    05:50 Test GPT-4 Vision with image from Unsplash
    07:23 Taking a screenshot with Puppeteer
    12:35 Test GPT-4 Vision with Wikipedia screenshot
    18:14 Test GPT-4 Vision with Google weather info
    19:29 Automating URL generation + screenshot taking
    33:24 Handling timeouts and retries and making it conversational
    44:30 Summarizing BBC news
    45:33 Fixing slow loading pages
    49:18 Asking for weather information
    50:24 Tweaking system message
    54:03 Asking for Tesla stock price
    56:00 Outro
  • Věda a technologie

Komentáře • 155

  • @unconv
    @unconv  Před 8 měsíci +10

    Almost 20K views 😳 Part 2: czcams.com/video/PMLg6Rr8fcU/video.html

    • @surya0202
      @surya0202 Před 7 měsíci

      please upload part 2 sir

    • @artemisfauls
      @artemisfauls Před 3 měsíci

      I believe this method can be used to automate certain routine processes, but only if the price of gpt4v is reasonable. For example, you need to send 10,000 screenshots with a resolution of 1920x1080 pixels to gpt4v in 1 day - how much will it cost?🤔🤓

  • @mooktakim
    @mooktakim Před 8 měsíci +3

    This guy has superpowers. He can talk and code at the same time!

  • @Lewis64
    @Lewis64 Před 8 měsíci +17

    Didn’t expect a coding video to be this entertaining. Love the frank display of your thought process.

  • @dustinsoodak8954
    @dustinsoodak8954 Před 7 měsíci +5

    I love how much of the process of programming he includes in the demo

  • @arc0life
    @arc0life Před 8 měsíci +12

    Your tutorial helps with the excitement and anxiety as a fellow dev. I knew I could do this myself but keep procrastinating and eventually some tasks end up as a mental block in WFH mode. Just forcing myself to watch a fella do something like this really helps, thank you!

  • @Autoscraping
    @Autoscraping Před 6 měsíci +1

    A fabulous video that has been of great help in orienting our new collaborators. Your generosity is highly valued!

  • @gaming_for_sanity
    @gaming_for_sanity Před 7 měsíci +3

    Legend has it, he’s still trying to find out what the weather is like in Alaska…

  • @PostMeridianLyf
    @PostMeridianLyf Před 8 měsíci

    Its interesting that this is exactly what I was looking for. Llast night i spent a few hours asking copilot how to implement the same libraries. Thanks for the tutorial

  • @Salfie007
    @Salfie007 Před 8 měsíci +2

    I just wanted to tell you that you are doing great and I really like your format.

    • @unconv
      @unconv  Před 8 měsíci

      Thank you very much!

  • @fuba44
    @fuba44 Před 8 měsíci +4

    This was super cool! Don't mind the long format at all. Would love to see you evolve this concept in another video.

    • @unconv
      @unconv  Před 8 měsíci +3

      I've already filmed the next one. It'll definitely be long form 😅

    • @amritbanerjee
      @amritbanerjee Před 8 měsíci

      With full page screenshots. Maybe create an assistant which looks at my bookmarks and the tags in there based on my question and tries to get me the info from the page.

  • @robbennett6053
    @robbennett6053 Před 6 měsíci +1

    Seriously impressive. I'm a NodeJS API engineer and you're writing that JS code faster than me!

    • @unconv
      @unconv  Před 6 měsíci

      Thanks! Fast doesn't equal good, though 😅

  • @reuna4c3
    @reuna4c3 Před 8 měsíci

    This is so cool and nerdy! Maybe the best site to follow and learn more and more on OpenAI API. Difficult but entertaining to follow.

  • @gmichael5506
    @gmichael5506 Před 5 měsíci +1

    Really appreciate your information and style. Learning much!

    • @unconv
      @unconv  Před 5 měsíci

      Thanks for watching!

  • @thecount25
    @thecount25 Před 8 měsíci +5

    Use the retry library and set a low timeout; you can use a simple decorator. If the timeout needs to be high and this isn't very pleasant, consider running multiple requests concurrently and waiting only on the first result.

  • @cutecute9189
    @cutecute9189 Před 8 měsíci +1

    This is awesome. I love your videos. Please keep these videos going specially this one. I learned so much

    • @unconv
      @unconv  Před 8 měsíci

      Thank you! More to come :)

  • @ScootLogix
    @ScootLogix Před 8 měsíci

    Great video dude. Im gonna rewatch later. I got a project this might help on.

  • @marcoaerlic2576
    @marcoaerlic2576 Před 2 měsíci

    Thanks for the video. Great work.

  • @grant_vine
    @grant_vine Před 8 měsíci +3

    So for cookies you just need to know what cookie is being set, in many cases it’s likely just a matter of causing the same effect in puppeteer, one way is to add to the cookie store directly (I’m sure puppeteer has a way to do this), and an alternative is specifying a “user directory” for puppeteer so you can actually agree to things like cookies, in many ways consent popups are easy to “locate” using standard html locators simply because it is often set to a priority load event and is often a div/container with a name/id containing the word consent or cookie etc, so regex can be used to find these reasonably easy. Use puppeteer to locate the “Ok” button and click it and then having that reusable user directory means you only check for any site if you have or haven’t accepted consent, if not click it if so just scrape it

  • @albertwang5974
    @albertwang5974 Před 5 měsíci

    very interesting, thanks for sharing!

  • @edoardogribaldo2870
    @edoardogribaldo2870 Před 6 měsíci

    Crazy good content! Thank you!

  • @digitalcivilulydighed
    @digitalcivilulydighed Před 8 měsíci +2

    I'd like to see a video from you about navigating websites with Puppeteer. Now that you ask, I'd like a tutorial on how it follows links, fills out data, crawls four or more links deep into a website, how to handle session cookies, automate and run loops, etc. :-)

  • @mysticminds1126
    @mysticminds1126 Před 7 měsíci

    I appreciate your efforts mannn...

  • @Laowater
    @Laowater Před 5 měsíci

    a Master in the Arts of coding!

  • @pourkin
    @pourkin Před 7 měsíci

    Excellent Job

  • @mt4u832
    @mt4u832 Před 29 dny

    Very clever. Congratulation

  • @nathanl6598
    @nathanl6598 Před 6 měsíci

    No typescript and no copilot? This was a more wholesome time.

  • @chromashift
    @chromashift Před 5 měsíci

    BWAHAHAHAHA! the struggle (programming: errors = WTF!!!!) is real.
    day in the life of code building...Awesome video!

  • @dreamphoenix
    @dreamphoenix Před 8 měsíci

    Thank you.

  • @yoyartube
    @yoyartube Před 7 měsíci

    Great Video! Can these libraries handle auth like azure oauth flow in order to browse to the page?

  • @gianmarcoferrara3397
    @gianmarcoferrara3397 Před 7 měsíci

    You should try the JSON response mode. You can request to return a response like that in the system promp: {data: ExpectedDataInterface, error: ErrorInterface | null}. Good luck!

  • @guitaripod
    @guitaripod Před 7 měsíci

    Cool video

  • @iceshoqer
    @iceshoqer Před 6 měsíci

    Chain of thought is actually meant to be used for mostly information accuracy, not for fixing what you could do in a proper single prompt.

  • @Y3llowMustang
    @Y3llowMustang Před 6 měsíci

    I wouldn't call this easy web scraping, but this was very hilarious with all the bugs

  • @splashelot
    @splashelot Před 7 měsíci

    For getting Sam Altman's age, would it help if you stated that the screenshot is taken today? ChatGPT may be hesitant to assume this.

  • @louisbertson
    @louisbertson Před 6 měsíci

    A good way is to include in user role message a timestamp. It will help him calculate the age of SAM Altaman easily!

    • @unconv
      @unconv  Před 6 měsíci

      Yes, but only because he knows his birthday already (even without the Wikipedia screenshot)

  • @mohamedbasueny9476
    @mohamedbasueny9476 Před 7 měsíci

    i was wondering how this is different from the web-search capapblilty of chatgpt-plus right now .
    in other words , if i asked gpt to look for an answer on the web will it struggle to do so ? ,
    is this a hack way to use a better websearch via an api like method because it's not enabled yet in the openai dev tools .
    any way i really like the video , can we use selenuim to do so also ?

  • @silva8215
    @silva8215 Před 8 měsíci

    Couldn't you use backoff to handle the error when the API is stuck?

  • @alon7110
    @alon7110 Před 8 měsíci

    Thank you for this helpfull video! can you please try the same task with the functions tool? Thanks!

  • @andrejuntermanns7660
    @andrejuntermanns7660 Před 7 měsíci

    I dont get the plus in funcionality compared to google in this demo. Help me out.

  • @billybofh2363
    @billybofh2363 Před 8 měsíci +1

    A little speed up might be to use the python requests package to try and fetch the url first before running puppeteer - then short-circuit invalid domains, 404's etc? Also, when doing a completion you can pass `request_timeout=10` or whatever and it'll kill the call. Sometimes even works.... ;-)

    • @unconv
      @unconv  Před 8 měsíci

      Thanks, I'll try that. Yeah, you can set the request_timeout, but you still have to handle the error my having some recursive function that retries the request if it fails. And I don't have time to implement that. It would take like a minute, lol

    • @billybofh2363
      @billybofh2363 Před 8 měsíci +1

      I replied to this and youtube removed it (I think!) - but the python package 'tenacity' (or the original retry) is worth a look (I'll skip the url as I think that's what made youtube remove/hide my comment)

  • @ntgCleaner
    @ntgCleaner Před 6 měsíci

    I'm only up to 15:00 but the issue you had up at this point is that it CAN read sam altman's birthdate, but it doesn't know what the date is today. You can feed it the date in your response generated with `date()` or whatever.

  • @AlfonsoMenkel
    @AlfonsoMenkel Před 8 měsíci +1

    Grate video, at last I see on YT someone that struggles with the API as I do…
    I know the topic of the video is to use the vision api, but you cold get better results using a terminal web browser like lynx , piping the result to a Tex file and asking ChatGPT with that text as context.
    Just an idea. 😉

    • @unconv
      @unconv  Před 8 měsíci

      I was gonna dismiss your suggestion by saying one does not simply use Lynx in 2023 since it doesn't support JavaScript, which many websites require nowadays. But testing it out just now, all the examples I showed in this video could have worked with Lynx (based on its output). I don't know how I would extract links and input fields with Lynx, though, to make it crawl subpages. Perhaps all those pages were server side rendered, so I might as well have used Curl.

  • @TarasKim
    @TarasKim Před 6 měsíci

    If you add something like "Strictly based on the information from screeshot" you get information based on the information he gets from screenshot.

  • @zeta_meow_meow
    @zeta_meow_meow Před 6 měsíci

    just kisses for you , so freaakin loved how you explained and debugged along us

  • @EduardsRuzga
    @EduardsRuzga Před 8 měsíci +4

    Great video!
    Interesting experiments with the GPT Vision API and Puppeteer. I have a couple of questions and a suggestion:
    1. Could you share some insights on the cost aspect of using the GPT Vision API for this project? I'm curious about the pricing and whether it's feasible.
    Also, have you considered combining classical web scraping methods with the Vision API in a synergistic way? Specifically, using traditional scraping to gather initial data and then employing the Vision API to verify or correct this data where needed. I think this could potentially address some of the limitations of both methods. What are your thoughts on this approach?Looking forward to hearing your thoughts!

    • @unconv
      @unconv  Před 8 měsíci +6

      Thanks! On the day I filmed the video, my API costs were $0.58. The next day I maxed out the limit of 100 messages of the gpt-4-vision-preview while testing and the total cost for that day was $2.15. These costs include some other API calls as well, though.
      Combining classical web scraping and Vision API seems like a good idea. I'll have to look into that when I run into an issue scraping something.

  • @Tyfeen
    @Tyfeen Před 7 měsíci

    what is the weather like in alaska?

  • @evanlovett3553
    @evanlovett3553 Před 6 měsíci

    What is the weather like in Alaska?

  • @LearnCode_withAI
    @LearnCode_withAI Před 7 měsíci

    In package.json yku can set type : module

  • @RonivaldoPassosSampaio
    @RonivaldoPassosSampaio Před 8 měsíci +3

    Nice content, but you should just copy paste the code, we know you can code well behind the scenes, don't worry. Keep doing great!

  • @bkentffichter
    @bkentffichter Před 7 měsíci

    I made a drinking game out of the word Alaska. I died.

  • @alexeygrom1834
    @alexeygrom1834 Před 7 měsíci

    "In Alaska's land, where coders seek the weather's tale,
    They type and query, 'neath the aurora's bright veil.
    With every line of code, they ask the sky's mood,
    Hoping for sunshine, but prepared for the cold and brood."

  • @8COOL6
    @8COOL6 Před 6 měsíci

    but the token authorization for use gpt-4 preview where is ?

  • @murch5054
    @murch5054 Před 7 měsíci

    I see that 0420 there... in 00:31:50 : )

  • @terenceundbud
    @terenceundbud Před 6 měsíci

    so you need to use gpt 3.5 turbo to get exact answers ijnstead of gpt-4? weird.

  • @eyoo369
    @eyoo369 Před 8 měsíci +2

    The Vision API downsamples the image.. thats why it cannot recognise small fonts.

  • @PDragonLabs
    @PDragonLabs Před 4 měsíci

    👍

  • @TonyS1
    @TonyS1 Před 7 měsíci

    What is the current weather in the world?

  • @AuditorsUnited
    @AuditorsUnited Před 5 měsíci

    im not going to watch an hour of you coding but i will share that you can get a image of each element and selenium would problem be a good choice to use in this

  • @thr0w407
    @thr0w407 Před 6 měsíci

    The llm was wrong about what the light on the motorcycle means, since the headlight is ALWAYS on. A simple but important mistake.

  • @TheBeefiestable
    @TheBeefiestable Před 7 měsíci +1

    If humans didnt all re-invent the wheel every hour, there would be a huge database of every query : response : list of problems : links to solutions if they ever figured it out , that would save humanity unlimited man hours... but probably put openai out of business

  • @xsploit
    @xsploit Před 8 měsíci

    also i think this better suuited for assistants api. i made a private investigator that uses functions. one is serper api and if it finds a linkedin page crawls and de html it and send to get summzairzed with the link snippets ,then the other function is getting details on a image url you asked it to veiw using gpt 4 vision and i could make those functions paralell

    • @bogdanbogdan5276
      @bogdanbogdan5276 Před 8 měsíci

      Could you share more details, I'm trying to build similar functionality

  • @waneyvin
    @waneyvin Před 8 měsíci +1

    is it possible to use selenium? at least it is python, you don't need to switch between 2 language.

    • @unconv
      @unconv  Před 8 měsíci

      Yes, it should work too. I just have more experience with Puppeteer (never tried Selenium)

  • @OBRosewell
    @OBRosewell Před 5 měsíci

    hey man! I would need something like this posted onto a server of some sort, like AWS or Heroku. is that possible if i build this & deploy it? i need it to scale up for 1000 requests daily

    • @unconv
      @unconv  Před 5 měsíci

      A lot of websites will block requests from AWS servers, so you would probably need some sort of proxy server in between.

  • @virdvird
    @virdvird Před 8 měsíci

    Make screenshot (do not close puppeteer session) and ask chatGPT is page looks loaded or not instead of relying on networkidle0, timeout, etc

  • @erikaszvicevicius9191
    @erikaszvicevicius9191 Před 8 měsíci

    First thank You. And question - how much token used this scraping method?

    • @unconv
      @unconv  Před 8 měsíci +1

      I haven't checked exactly but it seems to be around $0.017 per scrape based on my API usage during building this

  • @TheChrisSoria
    @TheChrisSoria Před 6 měsíci

    I still don’t have access to the vision API : (

  • @joebazooks
    @joebazooks Před 8 měsíci

    I believe it’s not telling you his age because it is trying to provide you with a precise age i.e. his current age, given his birth date. Don’t ask what his age is, but what age the page or author of the page says he is

  • @HolyG2k6
    @HolyG2k6 Před 8 měsíci

    "Hopefully this is not a Malware" :D :D

  • @ddsmax
    @ddsmax Před 8 měsíci +1

    You're already in javascript for puppeteer. Why do the gymnastics of writing your main logic in python?

    • @unconv
      @unconv  Před 8 měsíci +1

      That's a good point and in the next video I in fact switch to JavaScript only. I prefer Python, though

  • @TeleV77_media
    @TeleV77_media Před měsícem

    also i had checkout a patreon chat ( paid ). but now i am just unable to find it? it is gone?+

    • @unconv
      @unconv  Před měsícem

      I'm not on Patreon but I'm on BuyMeACoffee and you can find a link in the description

    • @TeleV77_media
      @TeleV77_media Před měsícem

      @@unconv thankyou for the good job. i am improving and using it.
      there are some pieces that doens work up to today and fixed them

  • @PolinomPolynets
    @PolinomPolynets Před 8 měsíci

    Is there a reason you don't use copilot?

    • @unconv
      @unconv  Před 8 měsíci +3

      It often guides me to directions I don't want to go. Also, I'm still learning Python so I'd rather practice my memorization

  • @theoriginalrecycler
    @theoriginalrecycler Před 7 měsíci

    Remove the word Like, ask what is the weather in Alaska. The question you ask leads to an answer such as “colder than a commercial freezer”.

    • @unconv
      @unconv  Před 7 měsíci +1

      Good point 😂

  • @kamalkamals
    @kamalkamals Před 7 měsíci

    14:50 i don't think that's a good idea because u will lose a lot of tokens (input, output), so it s better to use scrapping urls with vector store

  • @uncleJuancho
    @uncleJuancho Před 7 měsíci

    Great video! However, I noticed a few instances where you mentioned not having prior experience with certain tasks, but then you later showcased projects where the code was already complete. For example, at 9:29 in the video. This seems a bit contradictory and might confuse some viewers

    • @unconv
      @unconv  Před 7 měsíci

      Thanks! Which tasks did I say I didn't have prior experience with?

    • @uncleJuancho
      @uncleJuancho Před 7 měsíci +1

      @unconv, this is my first time viewing a video on your channel. I observed that you started by looking through the documentation as if it was new to you, despite already having the answer in another file. This struck me as unusual, but I understand it might have been part of your process. When the documentation didn't seem to help, you referred to your existing project. I don't mean this in a negative way; it's just my personal observation from watching this video for the first time

    • @unconv
      @unconv  Před 7 měsíci +2

      I've used Puppeteer multiple times in the past, but I never remember the boilerplate stuff. I didn't want to jump directly to my own previously written code, because I want to do things from scratch in my videos, not leaving out any steps. And I want to show how I go about researching stuff. But I get that it might have been confusing - although I suspect even more confusing if I directly copy pasted my old code.

  • @User_1795
    @User_1795 Před 5 měsíci

    This could be the best kodi addon ever

  • @yolamontalvan9502
    @yolamontalvan9502 Před 7 měsíci

    What is 4 Vision API?

  • @sniegu84
    @sniegu84 Před 7 měsíci

    to have productive programming ai has to return what you want in 100% cases. it has to be better than human in deduction.

  • @xsploit
    @xsploit Před 8 měsíci

    i would just using the scraping way and dehtml it. ive never seen seen someone with so much problems calling api

  • @markw7609
    @markw7609 Před 7 měsíci

    Can this work for Instagram scraping ?

  • @chameeragamage1526
    @chameeragamage1526 Před 7 měsíci

    Want more

  • @avi7278
    @avi7278 Před 8 měsíci

    great video, just one suggestion, the repetition of what you're typing literally every time is a bit much.

    • @unconv
      @unconv  Před 8 měsíci

      Thanks! I'll try to avoid that in the future (and mistakes leading to repetition in general)

  • @MrCaovang
    @MrCaovang Před 8 měsíci +1

    Thank you for the Video.
    But the way you re-typing the question (instead of copy and paste it) make me frustrated 😖

    • @unconv
      @unconv  Před 8 měsíci

      Sorry about that 😄

  • @HaseebHeaven
    @HaseebHeaven Před 6 měsíci

    Why you mixed Python + JS i dont see an requirement you could single programming language, Java script, or Python, and simply executed the same task with the single project

  • @la6188
    @la6188 Před 8 měsíci

    Why not use everything in js? So confusing

  • @Alternativetips
    @Alternativetips Před 7 měsíci

    gpt4 vision api limits ?

    • @unconv
      @unconv  Před 7 měsíci +1

      100 requests per day

  • @aviralpatel2443
    @aviralpatel2443 Před 5 měsíci

    bro sounds like an AI. Good video tho

  • @cafeta
    @cafeta Před 8 měsíci +14

    Why aren't you using the AI to help you code?🤔🤷

    • @Bartskol
      @Bartskol Před 8 měsíci +9

      I think that he wants to explain the code to us by writing. I use ChatGPT to write code as I'm not a programmer myself, but I find myself learning to code anyway because I still need to understand what I actually need. It's also tiring to pass every small error to chat; it's easier to make adjustments yourself. However, to do that, you need to understand the code at some level.

    • @unconv
      @unconv  Před 8 měsíci +14

      I actually have Copilot but usually I disable it because it often guides me to directions I don't want to go. Especially when making videos, if Copilot suggests a different way than I was going to go, I get distracted. And I'm still learning Python, so I want to actually learn it. If I always use Copilot, I can get the job done but I probably won't memorize the syntax.

    • @-Jason-L
      @-Jason-L Před 7 měsíci +1

      ​@@unconvI think he meant let chatgpt generate the entire code, not copilot.

    • @yungjerky
      @yungjerky Před 6 měsíci

      Because fully AI generated code is unusable

    • @AIPulse118
      @AIPulse118 Před 5 měsíci

      ​@@yungjerkynot anymore it isn't. Never used Grimoire?

  • @Flameandfireclan
    @Flameandfireclan Před 8 měsíci

    Instant fork, all your code belong to us

  • @nitestrykerx01
    @nitestrykerx01 Před 7 měsíci

    Seems very inefficient to do it that way, yes it’s and interesting concept but you can do it all in Python and your logic can be simplified to get results

  • @MrDouglax
    @MrDouglax Před 7 měsíci

    it's an AI speaking?

  • @mnageh-bo1mm
    @mnageh-bo1mm Před 7 měsíci

    aaaaa this was frustrating as hell

  • @_nom_
    @_nom_ Před 8 měsíci

    It's not hard to make a scraper. In fact you probably only need to use a http request, not a full on instance of chrome.

  • @user-uw7st6vn1z
    @user-uw7st6vn1z Před 5 měsíci

    you would bankrupt if you use gp4 vision api scrap web.... just link your credit card and start scraping

  • @qasurfer
    @qasurfer Před 7 měsíci

    coding 😛

  • @hidroman1993
    @hidroman1993 Před 8 měsíci

    Using the seed as if it was a hyperparameter shows how little you know about the stuff you're talking about, congrats!

    • @unconv
      @unconv  Před 8 měsíci +2

      I mean, if you know more about it than me, you could maybe explain further or link to some more information about the subject

    • @ZweiBein
      @ZweiBein Před 7 měsíci

      @hidroman1993 What a stupid reply, guide him at least if you know better...

  • @mibaatwork
    @mibaatwork Před 8 měsíci +11

    It is intolerable how badly you prepared for the video. You can't teach people like that.

    • @unconv
      @unconv  Před 8 měsíci +8

      This isn't Unconventional Teaching

    • @alqods80
      @alqods80 Před 8 měsíci +10

      It is a more natural way as a developer, it is much better that way, learnt debugging

    • @noahgottesla3439
      @noahgottesla3439 Před 8 měsíci +12

      This is definitely the practical way to watch and learn. I like your style. You are showing the humanity of future coding

    • @itheenigma
      @itheenigma Před 8 měsíci +5

      I love this approach - similar to how good developers actually code. Keep it up unconv

    • @JT-Works
      @JT-Works Před 8 měsíci +2

      Meh, he is teaching how to troubleshoot. If you want direct directions just read the API documentation.

  • @foxdog9332
    @foxdog9332 Před 7 měsíci

    where do you put the openai key? I can't find anywhere to put it tried searching. Getting a billing not active error.

    • @unconv
      @unconv  Před 7 měsíci

      It grabs it from the OPENAI_API_KEY environment variable. You can set it on Linux by running "export OPENAI_API_KEY=YOUR_API_KEY" and if you're on Windows, I believe you can use "setx" or "set" instead of "export"