Answer Questions about PDF files using ChatGPT API (NO EMBEDDINGS)

  • Published Jul 9, 2024
  • Today I create a simple Python program that allows you to answer questions about your own PDF files with the ChatGPT API, without using OpenAI embeddings.
    GitHub: github.com/unconv/gpt-pdfreader
    Donate: buymeacoffee.com/unconv
    Consultations: www.buymeacoffee.com/unconv/e...
    Memberships: www.buymeacoffee.com/unconv/m...
    00:00 Intro
    01:14 Converting PDFs to text
    07:40 Looking up text by keywords
    22:23 Integrating ChatGPT
    39:35 Making it conversational
    44:43 Outro
    Where I learned about TF-IDF: • Search Engine in Rust ...
  • Science & Technology

Comments • 30

  • @redmimic5532 • 7 months ago +1

    You're a good man and people should tell you that every day

  • @aidrivendesigners • a year ago

    Most legit AI Code teacher on YouTube. 100% transparency

  • @qi-uo4rp • 3 months ago

    ❤ The video is totally what I want

  • @StevenLoitz • 11 months ago

    This is a great idea. I can imagine implementing something like this for Obsidian or any other note-taking app. It would be super useful for recalling information.

    • @unconv • 11 months ago +1

      Now that's a good idea!

  • @asherf74 • a year ago +2

    You can initialize the df dictionary with defaultdict(int) instead of a plain {}; this eliminates the need to initialize non-existing keys with 0.
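
    For illustration, a minimal sketch of that defaultdict(int) suggestion for counting document frequencies (variable names are hypothetical, not the exact code from the video):

        from collections import defaultdict

        documents = [
            ["born", "six", "weeks", "premature"],
            ["search", "engine", "keywords"],
        ]

        # With a plain {} you would first have to check: if word not in df: df[word] = 0
        df = defaultdict(int)  # missing keys start at 0 automatically

        for doc in documents:
            for word in set(doc):   # count each word once per document
                df[word] += 1       # document frequency, later used for TF-IDF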

  • @VincenzoCappelluti • 3 months ago

    I have my personal api_key but I don't understand where I have to put it in the code.
    PS. Thank you for sharing your knowledge :)
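
    For reference, one common way to provide the key, assuming the pre-1.0 openai Python package used around the time of the video (the environment variable name is the standard one; the rest is a sketch):

        import os
        import openai

        # Option 1: read the key from an environment variable (recommended)
        openai.api_key = os.environ["OPENAI_API_KEY"]

        # Option 2: hard-code it near the top of the script (avoid committing this)
        # openai.api_key = "sk-..."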

  • @dawn_of_Artificial_Intellect • 11 months ago +1

    You know, you could probably add a chain-of-thought step that lets the LLM review the answer it gives, to make sure it is giving the right answer. Just a thought 😊
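
    A minimal sketch of that kind of self-check pass, assuming the pre-1.0 openai package (the helper name is hypothetical, not part of the video's code):

        import openai

        def verify_answer(question, excerpt, draft_answer):
            # Ask the model to double-check its own draft against the source text
            response = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=[
                    {"role": "system", "content": "Check whether the draft answer is supported by the excerpt. If not, reply with a corrected answer."},
                    {"role": "user", "content": f"Excerpt:\n{excerpt}\n\nQuestion: {question}\nDraft answer: {draft_answer}"},
                ],
            )
            return response["choices"][0]["message"]["content"]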

  • @ghazouaniahmed766 • 10 months ago

    You are incredibly intelligent.
    The only problem, and it is not coming from you, is the rate limit when you try a bigger PDF file. It would be great if it could handle something like a 15 MB PDF and still answer.

    • @unconv • 10 months ago +1

      The size of the PDF shouldn't matter. What error are you getting?

    • @ghazouaniahmed766 • 10 months ago

      I'm getting "rate limit reached"
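
      If the rate limit comes from sending many requests in a row (for example one per chunk of a large PDF), a simple retry with exponential backoff usually helps. A generic sketch assuming the pre-1.0 openai package (not code from the video):

          import time
          import openai

          def chat_with_retry(messages, retries=5):
              delay = 2
              for attempt in range(retries):
                  try:
                      return openai.ChatCompletion.create(
                          model="gpt-3.5-turbo",
                          messages=messages,
                      )
                  except openai.error.RateLimitError:
                      if attempt == retries - 1:
                          raise
                      time.sleep(delay)   # wait before retrying
                      delay *= 2          # back off exponentially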

  • @unclecode • a year ago +1

    Very interesting! I didn’t know that you can use function calls without defining the actual function, and GPT will handle that. However, my question is: why use function calls in these situations when you can directly utilize the completion API by generating a prompt that retrieves a text and returns the keywords? Thanks for the video.

    • @unconv • a year ago

      Because the answer might be "Sure! I can provide you with keywords: 1. author, 2. millionaire, 3. when" or "author [newline] millionaire [newline] when" or "You could search for 'author millionaire when'" instead of "['author', 'millionaire', 'when']". When using function calling, the response will be convertible to an actual Python list.
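
      A minimal sketch of that function-calling approach, assuming the pre-1.0 openai package (the function name and schema are illustrative, not the exact ones from the video):

          import json
          import openai

          response = openai.ChatCompletion.create(
              model="gpt-3.5-turbo-0613",
              messages=[{"role": "user", "content": "Is the author of the PDF a millionaire?"}],
              functions=[{
                  "name": "lookup_keywords",
                  "description": "Search the PDF text for the given keywords",
                  "parameters": {
                      "type": "object",
                      "properties": {
                          "keywords": {"type": "array", "items": {"type": "string"}},
                      },
                      "required": ["keywords"],
                  },
              }],
              function_call={"name": "lookup_keywords"},  # force the model to call the function
          )

          arguments = json.loads(response["choices"][0]["message"]["function_call"]["arguments"])
          keywords = arguments["keywords"]  # an actual Python list, e.g. ["author", "millionaire"]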

  • @dominiccimino8020 • 7 months ago

    Great tutorial man, thank you! I am trying to do something similar and maybe you know how, but I can't seem to figure it out. I'm looking to create a web app that will allow users to upload a really long transcript as context and then have the ChatGPT API generate a description along with a relevant title and a couple more related pieces of text after the user clicks submit. Any ideas?

    • @unconv • 7 months ago

      You might want to take a look at my "titleroo" project on GitHub ;)

    • @dominiccimino8020 • 7 months ago +1

      Great suggestion, thanks! @unconv

  • @mostafamoustaghni6980

    Good idea, but it seems like ChatGPT's way of finding keywords is very weak at the moment. I would like to see the embedding version side by side, to see how many times the keyword method vs. the embedding method answers correctly per run.

    • @unconv • a year ago

      To be fair, the text I was searching for was '1997 Born 6 weeks premature [...] my mother refers to me affectionately as “tuna fish.”' It does find it when I say "what did tim's mom call him as a kid?" (sometimes)
      If I use "refer" or "affectionate" in the prompt then it finds the answer every time.
      I will try this with the embeddings at some point.
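
      For comparison, a minimal sketch of scoring text chunks against keywords with TF-IDF (simplified, not the exact code from the video):

          import math
          from collections import Counter

          def tf_idf_scores(chunks, keywords):
              """Score each chunk (a list of lowercased words) against the keywords."""
              n = len(chunks)
              df = Counter()                   # document frequency of each word
              for chunk in chunks:
                  for word in set(chunk):
                      df[word] += 1

              scores = []
              for chunk in chunks:
                  tf = Counter(chunk)          # term frequency within this chunk
                  score = 0.0
                  for kw in keywords:
                      if df[kw] and chunk:
                          idf = math.log(n / df[kw])
                          score += (tf[kw] / len(chunk)) * idf
                  scores.append(score)
              return scores                    # higher score = more relevant chunk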

  • @ronniegulua1868 • 11 months ago

    Hi mate, any chance you could do the same tutorial in PHP? 😊

    • @unconv • 11 months ago +1

      Sure, good idea

    • @unconv • 11 months ago +1

      Just posted the video doing it with PHP 🙂

    • @futuretechlab • 11 months ago

      @@unconv You are the legend!

  • @aliin_daglicht • a year ago

    You didn't use gpt-3.5-turbo-0613; that's why function calling was lacking sometimes. You used the base gpt-3.5-turbo.

    • @unconv • a year ago

      Oh, I didn't realize lol. Before it gave an error though, so I wonder if they've added function calling to the base model already 🤔

    • @unconv • a year ago

      Apparently they've updated the base models on June 27th, so you don't need to use the 0613 model anymore.

    • @aliin_daglicht • a year ago

      @unconv Oh ok, because I encountered this kind of error, and when I changed to 0613 the function calling worked perfectly. Btw, great content, thanks a lot, this is absolutely cool. I wonder if I can extrapolate that TF-IDF algo to a Python codebase. This method offers more context to the LLM to come up with a response than an embedding.