Answer Questions about PDF files using ChatGPT API (NO EMBEDDINGS)
- Published July 9, 2024
- In this video I create a simple Python program that lets you answer questions about your own PDF files with the ChatGPT API, without using OpenAI embeddings.
GitHub: github.com/unconv/gpt-pdfreader
Donate: buymeacoffee.com/unconv
Consultations: www.buymeacoffee.com/unconv/e...
Memberships: www.buymeacoffee.com/unconv/m...
00:00 Intro
01:14 Converting PDFs to text
07:40 Looking up text by keywords
22:23 Integrating ChatGPT
39:35 Making it conversational
44:43 Outro
Where I learned about TF-IDF: • Search Engine in Rust ...
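The video's keyword-lookup step (scoring PDF text chunks against keywords with TF-IDF) can be sketched roughly like this. This is a minimal illustration of the technique, not the code from the repo:

```python
import math
from collections import defaultdict

def tf_idf_scores(chunks, keywords):
    """Score each text chunk against a list of keywords using TF-IDF."""
    n = len(chunks)
    tokenized = [chunk.lower().split() for chunk in chunks]
    # Document frequency: in how many chunks each keyword appears
    df = defaultdict(int)
    for tokens in tokenized:
        for kw in keywords:
            if kw in tokens:
                df[kw] += 1
    scores = []
    for tokens in tokenized:
        score = 0.0
        for kw in keywords:
            tf = tokens.count(kw) / max(len(tokens), 1)   # term frequency
            idf = math.log(n / (1 + df[kw]))              # +1 avoids div-by-zero
            score += tf * idf
        scores.append(score)
    return scores

chunks = [
    "the author became a millionaire",
    "cooking recipes and tips",
    "the author wrote a book",
]
scores = tf_idf_scores(chunks, ["author", "millionaire"])
best_chunk = chunks[scores.index(max(scores))]  # most relevant chunk
```

The highest-scoring chunk is the one handed to ChatGPT as context for answering the question.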
You're a good man and people should tell you that everyday
Most legit AI Code teacher on CZcams. 100% transparency
Thank you 🙏
❤ The video is totally what I want
This is a great idea. I can imagine implementing something like this for Obsidian or any other note-taking app. It would be super useful for recalling information.
Now that's a good idea!
You can initialize the df dictionary with defaultdict(int) instead of a generic {}; this eliminates the need to initialize non-existing keys with 0.
Thanks!
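The defaultdict suggestion above, side by side with the plain-dict version (df here stands in for the document-frequency dict from the video):

```python
from collections import defaultdict

terms = ["author", "book", "author"]

# Plain dict: every new key must be initialized by hand
df_plain = {}
for term in terms:
    if term not in df_plain:
        df_plain[term] = 0
    df_plain[term] += 1

# defaultdict(int): missing keys start at 0 automatically
df = defaultdict(int)
for term in terms:
    df[term] += 1

print(dict(df))  # {'author': 2, 'book': 1}
```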
I have my personal api_key but I don't understand where I have to put it in the code.
PS. Thank you for sharing your knowledge :)
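For the question above: a common pattern is to keep the key out of the source code entirely and read it from an environment variable. This is a generic sketch (assuming the standard openai Python package), not the exact setup from the repo:

```python
import os

# Set the key in your shell before running, instead of hard-coding it:
#   export OPENAI_API_KEY="sk-..."      # Linux / macOS
#   setx OPENAI_API_KEY "sk-..."        # Windows

api_key = os.environ.get("OPENAI_API_KEY", "sk-your-key-here")  # placeholder fallback

# With the openai package (pip install openai) you would then do e.g.:
#   import openai
#   openai.api_key = api_key
#   response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=[...])
```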
You know, you could probably add a chain-of-thought step that makes the LLM review the answer it gives, to ensure it is giving the right answer. Just a thought 😊
Good idea
You are incredibly intelligent.
The only problem, and it is not coming from you, is the rate limit when you try a bigger PDF file. It would be great if you could handle something like a 15 MB PDF and make it answer us.
The size of the PDF shouldn't matter. What error are you getting?
I'm getting "rate limit reached".
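For rate-limit errors like the one above, a common workaround is to retry the API call with exponential backoff. This is a generic sketch (not from the video); in practice you would catch the specific RateLimitError from the openai package rather than a bare Exception:

```python
import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call` with exponential backoff when it raises an exception."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, give up
            # Wait 1s, 2s, 4s, ... plus a little jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Usage sketch (openai call is illustrative):
# answer = with_backoff(lambda: openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages))
```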
Very interesting! I didn’t know that you can use function calls without defining the actual function, and GPT will handle that. However, my question is: why use function calls in these situations when you can directly utilize the completion API by generating a prompt that retrieves a text and returns the keywords? Thanks for the video.
Because the answer might be "Sure! I can provide you with keywords: 1. author, 2. millionaire, 3. when" or "author [newline] millionaire [newline] when" or "You could search for 'author millionaire when'" instead of "['author', 'millionaire', 'when']". When using function calling, the response will be convertible to an actual Python list.
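The function-calling trick described in the reply can be sketched like this. The schema and names here are illustrative, not necessarily the repo's; note the "function" never has to exist in Python, it only constrains the model's output:

```python
import json

# Schema passed to the model via `functions=[...]` in
# openai.ChatCompletion.create(...); the model then returns structured
# arguments instead of free-form prose.
functions = [{
    "name": "search_keywords",
    "description": "Search the PDF text for the given keywords",
    "parameters": {
        "type": "object",
        "properties": {
            "keywords": {
                "type": "array",
                "items": {"type": "string"},
            },
        },
        "required": ["keywords"],
    },
}]

# You would read the arguments back from the response, e.g.:
#   args = response["choices"][0]["message"]["function_call"]["arguments"]
# The arguments arrive as a JSON string, for example:
args = '{"keywords": ["author", "millionaire", "when"]}'
keywords = json.loads(args)["keywords"]  # a real Python list, not prose
```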
Great tutorial man, thank you! I am trying to do something similar and maybe you know how, but I can't seem to figure it out. I'm looking to create a web app that lets users upload a really long transcript as context, and then has the ChatGPT API generate a description along with a relevant title and a couple more related pieces of text after the user clicks submit. Any ideas?
You might want to take a look at my "titleroo" project on GitHub ;)
Great suggestion, thanks!@@unconv
Good idea, but it seems like ChatGPT's way of finding keywords is very weak at the moment. I would like to see the embedding version side by side, to see how many times the keyword method vs. the embedding method answers correctly per run.
To be fair, the text I was searching for was '1997 Born 6 weeks premature [...] my mother refers to me affectionately as “tuna fish.”' It does find it when I say "what did Tim's mom call him as a kid?" (sometimes).
If I use "refer" or "affectionate" in the prompt then it finds the answer every time.
I will try this with the embeddings at some point.
Hi mate, any chance you could do the same tutorial in PHP? 😊
Sure, good idea
Just posted the video doing it with PHP 🙂
@@unconv You are the legend!
You didn't use gpt-3.5-turbo-0613, that's why function calling was lacking sometimes. You used the base gpt-3.5-turbo.
Oh, I didn't realize lol. Before it gave an error though, so I wonder if they've added function calling to the base model already 🤔
Apparently they updated the base models on June 27th, so you don't need to use the 0613 model anymore.
@@unconv Oh ok, because I encountered this kind of error, and when I changed to 0613 the function calling worked perfectly. Btw, great content, thanks a lot, this is absolutely cool. I wonder if I can extrapolate that TF-IDF algo to a Python codebase. This method offers more context to the LLM to come up with a response than an embedding.