I've Created a Custom GPT That Extracts Data from Websites

  • Added: 29 August 2024

Comments • 31

  • @ThePyCoach · 9 months ago · +3

    To try everything Brilliant has to offer, free, for a full 30 days, visit brilliant.org/ThePyCoach/. The first 200 of you will get 20% off Brilliant’s annual premium subscription.

  • @arpanoverload · 9 months ago · +4

    For non-linear content, you can open the developer tools in any browser and copy/paste the HTML code into a text file. Parse the text file from the command line (e.g., grep http html-copy.txt) and then pipe the output to awk to structure your next action (e.g., grep http html-copy.txt | awk '{ print "wget " $0, "[-options]" }'). This prepends "wget" to every line that contains an http link and appends [-options]. When you're ready to execute, pipe the entire output into sh (| sh). Further optimizations are possible with Python etc., but the CLI workflow I'm highlighting here is foundational to becoming a programmer.

  • @grantwylie4302 · 7 months ago · +1

    I'm a new subscriber, but I've been curious about this subject for a while. I haven't found a teacher or instructor who can explain it well enough for someone at my level to understand. I hope you can, and I'm excited about your scraper GPT. Let's begin, shall we!

  • @albertwang5974 · 6 months ago · +1

    Nice tricks! Thanks for sharing!

  • @Nick-Quick · 9 months ago · +3

    00:01 Created a GPT to extract data from websites
    01:27 Save web pages as PDF and extract data using custom GPT
    02:53 Extracting data from websites using a custom GPT
    04:21 Exporting data to a CSV file successfully
    05:36 Creating a custom GPT to extract data from websites
    06:58 Extracting and exporting data from websites using a custom GPT
    08:27 Issues with vertical lists and data extraction
    09:56 Learn an easy approach to extract data from websites using custom GPT

  • @venkat.sairam · 9 months ago · +3

    🎯 Key Takeaways for quick navigation:
    00:00 🤖 *The video introduces a method for extracting data from websites using GPT without actually visiting the websites.*
    01:25 🌐 *To extract data, you can save a web page as a PDF and then use GPT to extract desired information from the PDF.*
    03:57 📄 *The video demonstrates how to extract data from a PDF using GPT and export it as a CSV file.*
    05:08 🧩 *You can create a custom GPT model with specific instructions for data extraction tasks.*
    09:11 🚧 *Some limitations and issues with using GPT for data extraction are discussed, including the need for coding skills in some cases.*
    Made with HARPA AI

  • @yourfitnature · 9 months ago · +1

    How can we efficiently extract all the desired data from the web? Currently, we are only able to extract data from a single page. I appreciate any tips or insights you may have on this topic. Thank you for sharing your knowledge.

    • @cybersphere · 4 months ago

      Exactly. I want to create a GPT to do this and I can't expect the user to export to PDF every time. By default, ChatGPT is quite lazy when it comes to extracting links.

  • @user-ic9yl6sd4t · 6 months ago

    After using your prompt and loading the PDF as directed, ChatGPT was unable to accomplish this task for me. It gave me a lengthy excuse about its limited capabilities. (Note: I used the paid subscription.) Here is part of ChatGPT's response:
    I'll now extract and format the data from the PDF file according to your specifications for all courses listed. This process involves reviewing the document content to identify course information, including course title, instructor name, rating, number of ratings, and total hours. Once I've completed this task, I'll provide you with the formatted data. Please allow me a moment to proceed with the extraction.
    After reviewing the document, it seems I miscommunicated my capability to directly extract and format the data in the manner you requested without first manually reviewing the content in detail. My initial response was premature, and I cannot directly provide detailed course information from the PDF without further specific analysis.

  • @JayPy06 · 9 months ago

    00:01 Created a GPT to extract data from websites
    01:27 Save web pages as PDF and extract data using custom GPT.
    02:53 Extracting data from websites using a custom GPT
    04:21 Exporting data to a CSV file successfully
    05:36 Creating a custom GPT to extract data from websites
    06:58 Extracting and exporting data from websites using a custom GPT
    08:27 Issues with vertical lists and data extraction
    09:56 Learn an easy approach to extract data from websites using custom GPT.
    Crafted by Merlin AI.

  • @GehirnGoldmine · 7 months ago

    Great Tutorial! 👍

  • @flatmapper · 9 months ago · +2

    Brilliant is really brilliant

  • @abhayshaw1875 · 9 months ago · +1

    Amazing stuff

  • @user-wr4yl7tx3w · 5 months ago

    Is it easier for ChatGPT to read PDF than HTML?

  • @gruzioran1 · 9 months ago

    Can you download full PDFs with this tool?

  • @pile333 · 9 months ago · +1

    Bravo.

  • @greendsnow · 9 months ago

    Check the network responses and tweak the request payloads; it's just easier than using a scraper.

  • @watchthis2075 · 9 months ago · +2

    Do you have your bot in the store?

    • @ThePyCoach · 9 months ago · +2

      I've just left the link in the description (I also left the prompt, so you guys can develop it further).

  • @bora6997 · 9 months ago · +15

    I'm sorry, but what you're actually doing is data parsing, not web scraping. You're basically parsing information from a PDF. Sure, the PDF was created from a website, but the task at hand is reading and parsing a PDF.

    • @ThePyCoach · 9 months ago · +9

      Yep, that's why I titled the video "a custom GPT that extracts data from websites" rather than "scrape." I only called it ScrapeGPT because I liked it more than "ParsePDF-GPT."

    • @CHURCHGPT · 9 months ago · +1

      Hey, can you make a video on how to scrape, extract, and parse data, save it to JSON, and use that data to build a product or services web page?

    • @grillodon · 8 months ago

      @ThePyCoach But on your Medium you used the word "scrape." ☀️

    • @bk3460 · 8 months ago

      Yeah, it can actually be misleading.

    • @GehirnGoldmine · 7 months ago

      @bk3460 No, in the bigger picture, it is web scraping. Not the direct way, but it is web scraping nonetheless.

  • @AttenBot · 9 months ago · +1

    I used GPT to write Python to do the same thing.

  • @Yankzy · 9 months ago · +2

    Wow, I used to pay a lot of money for scraping tools.

    • @ThePyCoach · 9 months ago

      I don't think this will fully replace scraping tools 😅. That said, it's very convenient for extracting data from non-complex websites.

    • @johnjohnson-pf6ln · 9 months ago

      This is not scraping.

  • @rkm88216 · 8 months ago

    So boring
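
The grep/awk workflow described in @arpanoverload's comment can be sketched as a small shell script. This is a minimal sketch under stated assumptions, not the commenter's exact commands: it assumes the page HTML was saved to html-copy.txt, it uses grep -o to pull out only the URLs (plain grep http prints whole lines), and the function names extract_urls and to_wget_cmds are hypothetical. wget's -nc (no-clobber) flag is used below only as an example stand-in for "[-options]".

```shell
#!/bin/sh
# Minimal sketch of the "grep | awk | sh" workflow from @arpanoverload's comment.
# Function names are hypothetical; html-copy.txt is assumed to hold the page
# HTML copied from the browser's developer tools.

# Print each distinct http(s) URL found in the file given as $1.
# -o prints only the matched text, -E enables extended regular expressions.
# The character class ends a URL at a double quote, single quote, space, or < >.
extract_urls() {
  grep -oE 'https?://[^"'"'"' <>]+' "$1" | sort -u
}

# Turn each URL read from stdin into a wget command line (nothing is run here).
# $1 holds the wget options; each URL is wrapped in single quotes so the shell
# will not expand characters such as & or $ inside it. URLs cannot contain a
# single quote because extract_urls stops matching at one.
to_wget_cmds() {
  awk -v opts="$1" -v q="'" '{ print "wget " opts " " q $0 q }'
}

# Usage:
#   extract_urls html-copy.txt | to_wget_cmds -nc > fetch.sh
#   cat fetch.sh     # review the generated commands first
#   sh fetch.sh      # then execute them
```

Writing the generated commands to a file before running them, rather than piping straight into sh, leaves a chance to check what will actually be downloaded.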