How to chat with your PDFs using local Large Language Models [Ollama RAG]
- Published: July 3, 2024
- In this tutorial, we'll explore how to create a local RAG (Retrieval Augmented Generation) pipeline that processes and allows you to chat with your PDF file(s) using Ollama and LangChain!
✅ We'll start by loading a PDF file using the "UnstructuredPDFLoader"
✅ Then, we'll split the loaded PDF data into chunks using the "RecursiveCharacterTextSplitter"
✅ Create embeddings of the chunks using "OllamaEmbeddings"
✅ We'll then use the "from_documents" method of "Chroma" to create a new vector database, passing in the updated chunks and Ollama embeddings
✅ Finally, we'll answer questions based on the new PDF document using the "chain.invoke" method and provide a question as input
The model will retrieve relevant context from the updated vector database, generate an answer based on the context and question, and return the parsed output.
TIMESTAMPS:
============
0:00 - Introduction
0:07 - Why you need to use local RAG
0:52 - Local PDF RAG pipeline flowchart
5:49 - Ingesting PDF file for RAG pipeline
8:46 - Creating vector embeddings from PDF and store in ChromaDB
14:07 - Chatting with PDF using Ollama RAG
20:03 - Summary of the RAG project
22:33 - Conclusion and outro
LINKS:
=====
🔗 GitHub repo: github.com/tonykipkemboi/olla...
Follow me on socials:
𝕏 → / tonykipkemboi
LinkedIn → / tonykipkemboi
#ollama #langchain #vectordatabase #pdf #nlp #machinelearning #ai #llm #RAG
I'm a medical researcher and, surprisingly, my life is all about PDFs I don't have any time to read, let alone learn the basics of code. And I think there are a lot of people in the same boat as me. Unfortunately, it's very hard to actually find an AI tool that's even barely reliable. Most of YouTube is swamped with sponsors for AI magnates trying to sell their rebranded, redundant, worthless AI thingy for a monthly subscription, or an unjustifiably costly API that follows the same premise. The fact that you, the only one who came close to what I actually need - and a very legitimate need - is a channel with
Thank you so much for sharing the pain points you're experiencing and the solution you're seeking. I'd like to be more helpful to you and to many more people like you. I have an idea of creating a UI using Streamlit for the code in this tutorial, with a step-by-step explanation of how to get it running on your system. You will essentially clone the repository, install Ollama and pull any models you like, install the dependencies, then run Streamlit. You'll then be able to upload PDFs in the Streamlit app and chat with them in a chatbot-like interface. Let me know if this would be helpful. Thanks again for your feedback.
hey, hit me up and I'll give you my RAG that supports multiple PDFs, and you can choose the LLM you want to use.
I'm in the space as well, and am trying to find the best way to parse PDFs. I've set up Grobid on Docker and tried that out. My work laptop is a bit garbage, and being in the world's largest bureaucracy, procuring hardware is a pain in the ass. Anyways, great video.
Use NVIDIA's Chat with RTX for PDF summarizing and querying. Purchase a cheap RTX card with a minimum of 8GB of VRAM.
@@tonykipkemboi I think most people are in pain now with just this part: "upload PDFs to service X". This is what they want/have to avoid. Anyhow, nice video you made here.
Thank You. I have done several similar projects and I learn something new about 'local RAG' with each one !
Thank you for this excellent intro. You are a natural teacher of complex knowledge and this has certainly fast-tracked my understanding. I'm sure you will go far and now you have a new subscriber in Australia. Cheers and thank you - David
Glad to hear you found the content useful and thank you 🙏 😊
You are an awesome teacher; thank you so much for explaining this in a clean and objective way :)
🙏
Top-tier information here. Thank you!
🙏
this was super clear, extremely informative, and was spot on with the exact answers I was looking for. Thank you so much.
Glad you found it useful and thank you for the feedback!
Welcome to my special list of channels I subscribe to. Looking forward to you making me smarter 😊
Thank you for that honor! I'm glad to be on your list and will do my best to deliver more awesome content! 🙏
Dope video man! Keep them coming
Appreciate it!!
Clear instruction, excellent tutorial. Thank you Tony!
Thank you for the feedback and glad you liked it! 😊
You're welcome Ezekiel!
Congrats man. Really useful content. Well explained and effective.
Thank you, @ISK_VAGR! 🙌
Simple and well illustrated, Arap Kemboi 👍🏾👍🏾👍🏾
Thank you very much, bro! 🙏
This is a fun and potent project. This provides access to a powerful space. Peace be on you.
Thank you and glad you like it!
That's a pretty clean explanation.
Looking forward to more videos.
Thank you! Glad you like the delivery. I got some more cooking 🧑🍳
Very helpful! Great video! 👍
🙏❤️
Very good! Easy to understand, easy to try, expandable ....
Awesome! Great to hear.
@@tonykipkemboi you deserve it. Too many LLM YouTubers are more concerned with showing a lot of things than with making them easy to understand and reproduce. Keep up the great work!
Nice job, thanks Tony!
🙏
This is a great tutorial. Thank you
🙏
Thanks for this amazing tutorial on building a local LLM. I applied it to my research paper PDFs, and the results are impressive.
Awesome 🤩 Love to hear that! Did you experiment without using the MultiQueryRetriever in the tutorial to see the difference?
@@tonykipkemboi That's an interesting question. I tried and found that MultiQueryRetriever works well in general, when LLM needs to connect indirect information from document, but fails to provide relevant information for direct information present in the document. But, this observation could differ case to case.
thanks man this is extremely helpful!
🙏🫡
Thank you for sharing good content
🙏
Great job
Thank you! 🙏
thanks for this tony
🙏
Super!
🙏
Good one, Good luck🤞
Thanks ✌️
Useful tip: use proper Wi-Fi; don't use a mobile hotspot while pulling the model from Ollama. I had an error with that. Hope it helps someone 😊
nicely done
Thank you 😊
awesome content! new sub
Thank you! 🙏
Good to see fellow Kenyans on AI. Perhaps the Ollama WebUI approach would be easier for beginners as one can attach a document, even several documents to the prompt and chat.
🙏 Yes, actually working on a Streamlit UI for this
so cool!
Thank you 🙏
Really useful content and well explained. It would be interesting to see a video with different types of files, not only PDFs - for example Markdown, PDF, and CSV all at once. It would be very interesting.
Thank you! I have this in my content pipeline.
Good video 👍👍👍
Nice
Thank you!
Can you make one video of RAG using Agents? Great video btw. Thanks
Sure thing. I actually have this in my list of upcoming videos. Agentic RAG is pretty cool right now and will play with it and share a video tutorial. Thanks again for your feedback.
I was planning on doing this as a project. If you beat me to it, I can compare notes
Very detailed explanation, thanks. Can you please make the same project give responses in multiple languages and with voice output?
Thank you. Yes that would be cool. I can see the challenge coming from finding an open source model that is good at multiple languages. The ones I used are not great at all. For voice, it'd probably be easy to use an open source TTS or even be more granular and use 11labs for a better quality in spite of it not being local.
🤩🤩
Great delivery of material. How about fine-tuning for llama3 using your own curated dataset as a video? There are some out there, but your teaching style is very good.
Thank you and that's a great suggestion!
I'll add that to my list.
Your video is excellent, you gained a subscriber!
I'm looking to move all of my more than 500 project documentation files into a GPT to help resolve support issues and answer questions from auxiliary teams, I can see this being exactly what I needed.
Do you know someone who is trying to approach project documentation with LLMs templates?
Thank you, big hug from Brazil!
So glad you found it helpful and thank you for subscribing as well! 💜
Can you expand more on the "documentation with LLM templates"?
Thanks. Can you please explain it step by step and slowly, especially the RAG part?
Thanks for asking. Which part of the RAG pipeline?
Good explanation. Could you please make a video on PDFs that have images and tables in them? How would we extract, store, and do RAG over images, tables, and text using open-source models?
This is a good topic to explore. I might just create another video diving deeper into pdf types and how to extract and use multimodal elements.
Wonderful tutorial, man! Let me ask you: what other kinds of prompts can we use? Also, is it normal for the RAG to answer questions about things not in the PDF that was loaded? For example, I tested with the prompt "what is a dog" and got an answer back. Is it because of the RAG and Ollama? Thanks a bunch.
Thanks for the share. Quite enlightening. I will definitely build upon that. Here is the problem I have: let's say I have two documents and I want to chat with both at the same time (for instance, to extract conflicting points between the two). What would you advise here?
Thank you! That's an interesting use case for sure. My instinct, before looking up some solutions, is to maybe create two separate collections, one for each file, then retrieve from them separately and chat with them for comparison. I'm sure my suggestion might not be efficient at all. I will do some digging and share any info I find.
Would love it if you can make the Streamlit app! I am still struggling to make a Streamlit app based on open-source LLMs.
Thank you! Yes, I'm working on a Streamlit RAG app.
I have released a video on Ollama + Streamlit UI that you can start with in the meantime.
@@tonykipkemboi thanks bro! I will defo watch👌
from the Kenyan homeland
Absolutely, bro 😎
@@tonykipkemboi I'm vouching for you, bro - whenever anything new about AI, LLMs, etc. comes out, post it here ASAP; we're fully behind you.
Great walkthrough; the audio could be increased a little bit...
Thank you! 😊 I noticed that I didn't adjust my gain after I had posted. Thanks for your feedback.
Good stuff. Shame you didn't run the notebook. Would like to see how it works.
Thank you. I tried recording and running the notebook but it killed my video recording since they were competing for system resources with Ollama. I ran the notebook as you can see the outputs in it already and just walked through the code. I'll try running it on the next video for more interactivity.
You are a legend 🫡
Thank you !!!
❤️🫡
Thanks a lot! If we have a mix of multiple PDFs, Words or Excel files, how can we change the RAG to support retrieval of them?
Glad you found it helpful. For different file types, you would consider the loading/parsing and chunking strategies that fit those data types. I'm working on the next video which I will go over CSV & Excel RAG.
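As a sketch of that idea, you can route each file to a loader based on its extension. The loader class names in the comment are the usual ones from `langchain_community.document_loaders`, but verify them against your installed version:

```python
from pathlib import Path

def load_mixed_folder(folder: str, loaders: dict):
    """Route every file in `folder` to the loader class registered for its extension.

    `loaders` maps extensions to LangChain loader classes, e.g.
    {".pdf": UnstructuredPDFLoader, ".md": UnstructuredMarkdownLoader, ".csv": CSVLoader}.
    """
    docs = []
    for path in sorted(Path(folder).iterdir()):
        loader_cls = loaders.get(path.suffix.lower())
        if loader_cls is None:
            continue  # silently skip file types we have no loader for
        docs.extend(loader_cls(str(path)).load())
    return docs
```

From there, the chunking and embedding steps are the same as in the video, although each file type may benefit from a different chunking strategy.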
What GPU do you use? I have Ollama running on an Intel i5 with integrated graphics, so I'm unable to use any of the 3B+ models. TinyLlama and TinyDolphin work, but the accuracy is way off.
I have an Apple M2 with 16GB of memory. I noticed that larger models slow down my system and sometimes force a shutdown of everything. One way around it is deleting other models you're not using.
Congrats on your video! In your example you use just one PDF. I have a need to work with thousands of documents, and the main issue is the time it takes to upload the documents. Can you give me some advice?
Did you mean to say it takes time to upload the documents to vector store and query over them? If yes, I do agree with you that latency is an issue especially since we're adding another layer of retrieval using the MultiQueryRetriever. It would also depend on your system as well if you're using Ollama.
Great job. Does the file you chat with have to be a PDF or can it be a CSV or other structured file type?
🙏 thank you. I'm actually working on a video for RAG over CSV. The demo in this tutorial will not work for CSV or structured data; we need a better loader for structured data.
Thanks for the tutorial! How can I make the model give answers in a different language?
It would largely depend on the capabilities of the given model to translate from English to the target language. You can try by adding the target language in the prompt. Tell it to return the results in X language.
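One low-tech way to try this (a sketch; whether it works depends entirely on the model's multilingual ability) is to bake the target language directly into the RAG prompt:

```python
def build_prompt(question: str, context: str, language: str = "English") -> str:
    """Build the RAG prompt with an explicit output-language instruction."""
    return (
        "Answer the question based only on the following context.\n"
        f"Respond in {language}.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# The resulting string is what you'd feed the chat model in place of the
# tutorial's default prompt template.
print(build_prompt("What is the main finding?", "<retrieved chunks go here>", language="French"))
```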
Great video! Thanks for sharing. I ran into an issue with a Chroma dependency on SQLite3 (i.e. RuntimeError: Your system has an unsupported version of sqlite3. Chroma requires sqlite3 >= 3.35.0). The suggested solutions are not working. Is it possible to use another DB in place of Chroma?
Thank you! Yes, you can swap it with any other open-source vector database. You might also try using a more recent version of Python, which should come with a newer version of SQLite. Do you know what version you are using now?
You can also try installing the binary version in the notebook like so: `!pip install pysqlite3-binary`
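If you go the `pysqlite3-binary` route, the widely shared workaround is to swap it in before anything imports `sqlite3`. A hedged sketch (the try/except just makes it a no-op when the package isn't installed):

```python
import sys

try:
    # Make the bundled, newer SQLite masquerade as the stdlib module
    # *before* chromadb (or anything else) imports sqlite3.
    __import__("pysqlite3")
    sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")
except ImportError:
    pass  # pysqlite3-binary not installed; fall back to the stdlib sqlite3

import sqlite3
print(sqlite3.sqlite_version)  # Chroma needs >= 3.35.0
```

Run this at the very top of the notebook, before importing Chroma.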
Hello friend, thank you very much for your content. I have a question: how can I make it listen to my server from within Google Colab so I don't have to use Jupyter, since my resources are a bit limited?
Hello! Nice tutorial. I was stuck on the first part unfortunately, as I get the error:
"Unable to get page count. Is poppler installed and in PATH?"
Do you have any idea how to solve this?
I have already installed poppler using brew.
Thank you. Have you tried using ChatGPT to troubleshoot?
Appreciate your work. I wanted to know: can I use it for confidential PDFs? Is there any chance of a data leak?
Thank you for the kind words. Yes, if you use Ollama models like we did in the video, then your content will stay private and not be sent to any online service. To be sure, I'd recommend turning off your Wi-Fi or any connection once you've loaded all the dependencies and imports. You can then run the cells to load your PDF into a vector db and chat with it. After you're done, you can delete the collection where you saved the vectors of your PDF before turning your connection back on. This is an extra measure to give you peace of mind.
Nice video, and very informative.
My question: I had downloaded LLMs like Gemma, Llama 2, Llama 3 and so on on my macOS machine, but due to a technical issue I deleted them (e.g. $ ollama rm llama2).
Now I want them again, and noticed that if I run "$ ollama run llama3", this **downloads the entire 4.7GB from the internet** all over again.
Is it possible to keep them downloaded somewhere so that when I want one I can just run $ ollama run and use it, and later delete it when not needed?
Thanks in advance; I would appreciate a response.
Thank you. What you did earlier is the standard way of downloading, serving, and deleting the Ollama models.
You can also download more quantized options for each, with less memory. I usually add and then delete whenever I don't need it or when I need to download another model.
Your initial ingestion doesn't load just the first page; it ingests the entire document. Your data variable is a list containing a single Document object that holds the content of the entire PDF.
That is correct. I did not change the code after testing it previously with loading individual pages. You can load by page and add metadata that way.
@@tonykipkemboi but cool tutorial for summarisation using a multi query retriever. I didn't know this was a thing in langchain
@@madhudson1 thank you. Yes, it's a neat function
Are the libraries you used (LangChain, ChromaDB, ...) open source? And can we use any Ollama model?
yes and yes
Good one. OK, you touched on security - you have here something that doesn't let things flow out to the internet. I saw a bunch of vids about tapping data from DBs using SQL agents, but none said anything specific about security. So the question: does using SQL agents violate data security?
You bring up a critical point and question. Yes, I believe most agentic workflows currently, especially tutorials, lack proper security and access moderation. This is a growing and evolving portion of agentic frameworks + observability, IMO. I like to think of it as people needing special access to databases at work and someone managing roles and the scope of access. So agents will need some form of that management as well.
Thanks. Btw, how did you make your YouTube profile photo? It looks very nice.
Thank you! 😊
I used some AI avatar generator website that I forgot but I will find it and let you know.
Thank you
could you drop a tutorial on building rag chatbots with ollama and langchain with custom data and guard-railing?
That sounds interesting and something I'm looking into as well. For guard-railing, what are your thoughts on the frameworks for this portion? Have you tried any?
@@tonykipkemboi realpython.com/build-llm-rag-chatbot-with-langchain/
I've read this article, and the only guard-railing mechanism they seem to apply is an additional prompt with every inference.
Quite interesting, and thanks for sharing it. Can you let me know if this would run on a Core i7 processor with 32GB of CPU RAM, considering you are using the Mistral model?
Thank you. Yes that should be sufficient to run the program.
Thanks for sharing this. Very helpful. Also, what are you using for screen recording and editing this video? I see that it records the section where your mouse cursor is! Nice video work as well. Only suggestion is to increase the gain in your audio.
I'm glad you find it very helpful. I'm using Screen Studio (screen.studio) for recording; it's awesome!
Thank you so much for the feedback as well. I actually reduced it during editing thinking it was too loud haha. I will make sure to readjust next time.
@@tonykipkemboi Btw, can you see those 5 questions that it generated before summarizing the document?
@@xrlearn, I'm sure I can. I will try printing them out and share them here with you tomorrow.
Hi @xrlearn - Found a way to print the 5 questions using `logging`. Here's the code you can use to print out the 5 questions:
```
import logging
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)
unique_docs = retriever.get_relevant_documents(query=question)
len(unique_docs)
```
Here are more detailed docs from LangChain that will help.
python.langchain.com/docs/modules/data_connection/retrievers/MultiQueryRetriever/
hey, thanks for this. Question. Does it have limitations on the number of documents one can upload to chat with? Like can I upload thousands of documents to use?
I haven't tested it with many documents but will do.
Will appreciate a lot. Much love from Kenya btw😃
@@AfrivisionMediake 🫡
Should I make Chroma DB connection to make this work?
We do use Chroma in the tutorial.
Good job, thanks a lot. I have a question: how do I build a chatbot, and with which technologies - RASA, LangChain, Ollama, ...?
And can I train it with my own data (scraping, PDFs, ...)?
I don't think I'm following your question correctly. Are you asking how to build a chatbot for the RAG in this video?
How can we get output without rephrasing? I mean, I want to know exactly what is written in the PDF, as-is. For example, if I ask what is written in article 3.2.2, can the output be quoted word for word?
Ah yes, good idea. I think for this, you'll have to add citations. I'm early into playing with this as I am working on the Streamlit UI for RAG. Always good to have cited sources.
Is it possible to upload multiple PDF documents using the LangChain doc loaders and then converse across them? Excellent tutorial and thanks - David
That can definitely be possible. Are you thinking of probably two pdfs that each carry different content?
@@tonykipkemboi Thank you for taking the time to reply - much appreciated.
I was just wondering whether this approach allows the ingestion of multiple documents which could be contrasted or used in conjunction with each other.
Cheers mate - David
Please provide the notebook if possible. Great video.
Thank you! Checkout the repo link in the description for all the code.
Here's the link github.com/tonykipkemboi/olla...
@@tonykipkemboi hey, the link is not working; can you provide it again please?
no problem, didn't see the description, thanks!
@@hectorelmagotv8427 , thanks. Just to confirm, did it work?
Hi, and if there were 6 or 10 PDFs, how would you load them into the RAG? Thanks
Good question! I would iterate through them while loading them and also index the metadata so it's easy to reference which pdf provided the context for the answer. There's actually several ways of doing this but that would be my simple first try.
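A minimal sketch of that "simple first try" (the loader class is passed in so you can reuse the tutorial's `UnstructuredPDFLoader`; the `source` metadata key is my own choice here, not from the video):

```python
from pathlib import Path

def load_pdf_folder(folder: str, loader_cls):
    """Load every PDF in `folder` with the given LangChain loader class,
    tagging each returned document with its source filename so answers
    can be traced back to the PDF that provided the context."""
    docs = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        for doc in loader_cls(str(pdf_path)).load():
            doc.metadata["source"] = pdf_path.name
            docs.append(doc)
    return docs

# Usage (assumption: same loader as in the tutorial):
# from langchain_community.document_loaders import UnstructuredPDFLoader
# docs = load_pdf_folder("pdfs/", UnstructuredPDFLoader)
```

The resulting list goes through the same splitting and `Chroma.from_documents` steps as the single-PDF version.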
Can this model query tabular data or image data?
I assume you're talking about Llama2? Or are you referring to the Nomic text embedding model? If it's Llama2, it's possible to use it to interact with tabular data by passing the data to it (RAG or just pasting data to the prompt) but cannot vouch for its accuracy though. Most LLMs are not great at advanced math but they're getting better for sure.
Thanks again for the tutorial. I am running the same question against a 500-page PDF multiple times, and I am getting a different answer every time I run it. What could be going wrong here? I simply have a for loop looping through the exact same question using the same vector db, yet I get different answers.
Thanks. Are the answers hallucinated in all of them, or is just the wording different each time?
@@tonykipkemboi They are not completely off, but they are a bit different. The PDF I am using is about medical terminology. I asked it simply to tell me the components of the cardiovascular system. There is a simple paragraph in it that lists them, but yet one time it talks about the kidneys, other times about heart anatomy... so it is not completely hallucinating, but it is not able to nail down a consistent answer.
@@ammardarkazanli5633 one way I can think of to solve this is by using the "seed" parameter for the model. You will need to create a modelfile with the Ollama model you're using as the LLM so it generates the same output for the same prompt. Here are the docs on how to create that:
github.com/ollama/ollama/blob/main/docs/modelfile.md. You can also watch my other video on creating an Ollama UI with Streamlit to see how I implemented the modelfile; I didn't add seed there, but it's easy to add.
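For reference, a minimal Modelfile along those lines might look like this (the base model, name, and seed value are just examples; `seed` fixes the sampling RNG, and `temperature 0` removes most of the remaining variation):

```
FROM mistral
PARAMETER seed 42
PARAMETER temperature 0
```

Build it with `ollama create mistral-fixed -f Modelfile`, then point the notebook's chat model at `mistral-fixed` instead of the base model.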
I will give this a try…
@tonykipkemboi, thank you very much for the valuable video. It helped me a lot.
I was struggling to find the right LLM that can run locally.
I have a question: how do I create a persistent RAG so that query results can be faster?
@@bhagavanprasad glad you found it useful. For this example, the speed depends on several factors, a major one being your system configuration. If you have a GPU, it will be much faster. An intermediate step would be to remove the MultiQueryRetriever, since it generates more questions from your prompt and then retrieves context for all of them from the vector db, which takes time and introduces latency. You can use a generic single-question query and then optimise retrieval another way, for example with a reranking model, but that might be a bit beyond what we covered in this tutorial. There's definitely a trade-off where you sacrifice accuracy for speed and vice versa.
Can we do this with Llama 3? Would that be better?
Yes you can use llama3.
Thank you very much for your videos.
Please, what if we have several PDFs?
Yes, so you can iteratively load the PDFs, chunk them by page or some other unit, then index them in a vector database. You would then ask your query as always, and it would find the context across all the documents to give you an answer.
Oh I thought you were saying you've embedded an LLM into a PDF document, like those draggable 3d diagrams.
I did some first experiments with local AI, using Ollama and AnythingLLM to talk to the model about a pdf file... and so far, the results are just completely unusable. The AI is just hallucinating on me constantly, making up sentences in the pdf that are not there, failing simple tasks like "quote the first line on page 2 without changing it", not to mention more complex tasks like "list all tools mentioned on page 3". Maybe I'm doing something wrong, but I feel very discouraged from using AI at all for this kind of usecase.
Sorry to hear the troubles but this is very common. Have you tried setting the temperature of the model to 0? That way there's no room for it to be creative.
@@tonykipkemboi Interesting, I'll look into that thanks!
@@user-eh2zd2ih8v let me know what comes of it.
Nice video. When I try to execute the following commands: !ollama pull nomic-embed-text and !ollama list, I receive the following error: /bin/bash: line 1: ollama: command not found
This error means that Ollama is not installed on your system or not found in your system's PATH. Do you have Ollama already installed?
@@tonykipkemboi Hello, I've installed Ollama on my local system, but I don't know why I'm getting an error in Google Colab.
I encountered several errors when trying to execute the following line in the code:
data = loader.load()
Despite installing multiple modules, such as pdfminer, I'm unable to resolve an error stating "No module named 'unstructured_inference'." Has anyone else experienced similar issues with this code? Any assistance would be greatly appreciated. Thank you!
Interesting that it's asking for that, since that's for layout parsing and we didn't use it. Try installing it like so: "!pip install unstructured-inference"
I've been given a story, the Trojan War, which is a 6-page PDF (or I can even use the story as text). Five pre-decided questions are given to ask based on the story. I want to evaluate different models' answers, but I am failing to evaluate even one. Kindly help; please guide thoroughly.
Can you please reply? I would really appreciate that.
This sounds interesting! I believe if you're doing this locally, you can follow the tutorial to create embeddings of the PDF and store them in a vector db, then use the 5 questions to generate output from the models. You can switch the model type between each response, and you'll probably have to save each response separately so you can compare them afterwards.
@@tonykipkemboi How much storage will the model take?
I don't have the greatest hardware.
Yes, there are smaller quantized models on Ollama you can use, but most of them require a sizeable amount of RAM. Check out these instructions from Ollama on the size you need for each model. You can also do one at a time, then delete the model after use to create space for the next one you pull. I hope that helps.
github.com/ollama/ollama?tab=readme-ov-file#model-library
ChromaDB works with SQLite 3, and I'm facing a lot of issues using Chroma. Can we use any other DB, or just pickle the entire vector DB?
You can definitely replace chroma with any other db like Weaviate or Qdrant or Milvus and so on.
Thanx man ! It worked 👌
@@nitinkhanna9754 awesome!
@@nitinkhanna9754 What other DB did you use to make it work, as suggested by @tony.
I tried the first command, %pip install -q unstructured langchain, and it's taking a super long time to install. Is this normal?
It shouldn't take more than a couple of seconds, but depending on your system and package manager, it might take a while. Did it resolve?
What Python version did you use for running this PoC?
Python 3.9
Can you think of a reason why pip install unstructured[all-docs] is failing on my two Macs? I get the error "Failed to build onnx
ERROR: Could not build wheels for onnx, which is required to install pyproject.toml-based projects". I have tried almost every suggestion on the internet. I am attempting to run on Python 3.12.1 and 3.12.3... Thanks
I had the same issue at some point. Switching to Python 3.9 resolved the error for me. Create a virtual environment with 3.9 and try running it there.
@@tonykipkemboi Just to confirm, everything worked well with 3.9.19.. Thanks for the suggestion. The video was very helpful to get a handle with all the commotion around the different models.
@@ammardarkazanli5633 glad to hear it worked!
We don't need an API key for this?
Nope, don't need one.
Hey, if we are using Google Colab instead of Jupyter, how will we be able to incorporate Ollama with Google Colab?
I haven't tried this myself but here are some resources for you that might be helpful;
1. medium.com/@neohob/run-ollama-locally-using-google-colabs-free-gpu-49543e0def31
2. stackoverflow.com/questions/77697302/how-to-run-ollama-in-google-colab
Firstly, thank you for sharing this entire tutorial; really great. I tried to implement it and got all the issues resolved, but it looks like I am not getting any output after I ask a question. I see OllamaEmbeddings run to 100% five times, and then nothing happens after that; the program just quits without giving any answer. Will you be able to help me figure out how to get it working?
Thank you for your question. Did you use the same models as in the tutorial, or did you use another one? Are you able to share your code?
@@tonykipkemboi I copied your code exactly.
The reason was I did not use a Jupyter notebook; I was running in VS Code, and I had to save the value returned by the chain's invoke method. When I printed it, it started working. This is amazing; thank you so much, really appreciate it.
Is it possible, using this, to extract data from a PDF and convert it to proper JSON format?
Yes, it is possible. You would need to add another function to do that, but it's very doable. I'd start by checking the LangChain docs on JSON extraction and using Pydantic.
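As a stdlib-only sketch of the post-processing half (the field names here are hypothetical; a fuller version would use LangChain's Pydantic-based output parsing as mentioned above): prompt the model to answer in strict JSON, then validate whatever comes back:

```python
import json
from dataclasses import dataclass

@dataclass
class InvoiceFields:  # hypothetical schema for whatever the PDF contains
    title: str
    total: float

def parse_llm_json(raw: str) -> InvoiceFields:
    """Parse a model response that was prompted to return strict JSON."""
    raw = raw.strip()
    # Strip a Markdown code fence if the model wrapped its answer in one.
    if raw.startswith("```"):
        raw = raw.strip("`").lstrip("json").strip()
    data = json.loads(raw)
    return InvoiceFields(title=data["title"], total=float(data["total"]))

print(parse_llm_json('{"title": "Q1 report", "total": "1250.50"}'))
# → InvoiceFields(title='Q1 report', total=1250.5)
```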
@@tonykipkemboi got it!
Retrieving answers from the vector database takes a good minute on my MacBook Air. How do I scale this? Can you add a Pinecone layer to it?
So this was a demonstration of running everything locally, with nothing online other than downloading the packages. You can hook up any vector store you like, for example Pinecone as you've mentioned. Just beware that since the local models will still be in use, it will still be slow if your system is already slow. Consider using paid services if you're looking for a lower-latency solution.
@@tonykipkemboi So Tony, what I am trying to build is something like a website where people come and drop their PDFs and can do Q&A.
In my learning and implementation I found out that generating embeddings for my 10-page PDF is not taking a lot of time; it used to before I switched to the embedding model you used.
So the embedding part is sorted.
I tried implementing the code with both Chroma and FAISS; the results are almost equal. Even for a small PDF, it takes a minute to answer.
I understand it takes computational resources from my local machine, which happens to be a MacBook Air M1.
Do you have a machine with a better GPU? Let's say yours produces the retrieved results in under 10 seconds.
Nobody would like to wait a minute or more on a website for an answer. Also, I am scared that if there are hundreds of thousands of users, I'd need to purchase a GPU farm for this to work, lol.
Note: I have never made a scalable project before.
Please guide. Also, share how much time it takes on your PC/laptop for the answer to come back from the vector db, so I can understand whether it's my system that is weak or whether libraries like Chroma and FAISS are not meant for scalability.
can anyone answer this please?
@@ayushmishra5861 so my system is just like yours, with 16GB of RAM. It takes about a minute or less to get an answer back for a few embedded PDF pages; for longer ones, it takes even longer. One portion that slows the process is the MultiQueryRetriever, which I added and talked about in the video. It generates 5 more questions, and those have to get context from the vector db as well, which slows the time-to-output significantly. Try it without the MultiQueryRetriever and see if that speeds up your process.
Thanks! I don't see where you can tell it to handle languages other than English?
That will largely depend on the given model's ability to translate from English into the target language. You can try adding the target language to the prompt: tell it to return the results in language X.
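As a rough sketch of what "adding the target language to the prompt" could look like (the template wording and variable names here are illustrative, not the tutorial's exact code):

```python
# Hypothetical RAG prompt template with a target-language instruction added.
template = (
    "Answer the question based ONLY on the following context:\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer in {language}."
)

prompt = template.format(
    context="(retrieved chunks go here)",
    question="What is this PDF about?",
    language="French",
)
print(prompt.splitlines()[-1])  # Answer in French.
```

Whether the answer actually comes back in good French then depends entirely on the model you're running.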
I am getting this error when trying to run it in a Jupyter notebook. Any idea how to fix this?
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
chromadb 0.4.7 requires pydantic<2.0,>=1.9, but you have pydantic 2.8.0 which is incompatible.
fastapi 0.99.1 requires pydantic!=1.8,!=1.8.1,<2.0.0,>=1.7.4, but you have pydantic 2.8.0 which is incompatible.
@@LumpBrady0 could you paste the entire error log here?
@@tonykipkemboi how do I get to the error log? (sorry, I'm pretty new here)
I have a 1000-page PDF; will it be able to go through that?
Good question. I haven't tried it, but my naive guess is that it can handle it.
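A back-of-envelope estimate suggests the main cost for a 1000-page PDF is embedding time, since the chunk count scales linearly with document length. The characters-per-page figure and the splitter settings below (chunk_size=7500, overlap=100) are assumptions for illustration, not measurements:

```python
import math

# Rough estimate of how many chunks a 1000-page PDF produces with a
# character-based splitter. All figures here are assumptions.
chars_per_page = 2000            # rough average for a text-heavy page
total_chars = 1000 * chars_per_page
chunk_size, overlap = 7500, 100  # example splitter settings
step = chunk_size - overlap      # effective stride between chunk starts

n_chunks = math.ceil(total_chars / step)
print(n_chunks)  # 271
```

So ingestion means embedding a few hundred chunks, which a local embedding model can grind through; it just takes proportionally longer than a 10-page file.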
I got this error when running your code on Colab: "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
imageio 2.31.6 requires pillow<10.1.0,>=8.3.2, but you have pillow 10.3.0 which is incompatible." Could you help me check?
The error message indicates a conflict between the versions of `imageio` and `Pillow` packages. Here's how you can resolve this issue:
1. **Uninstall the current version of Pillow:**
```bash
!pip uninstall pillow -y
```
2. **Install the compatible version of Pillow required by imageio:**
```bash
!pip install pillow==10.0.0
```
3. **Reinstall imageio to ensure all dependencies are correctly aligned:**
```bash
!pip install imageio --upgrade
```
Here’s how you can run these commands in a Colab cell:
```python
!pip uninstall pillow -y
!pip install pillow==10.0.0
!pip install imageio --upgrade
```
This sequence will uninstall the conflicting version of `Pillow`, install a compatible version, and ensure `imageio` is up to date. This should resolve the dependency conflict you are encountering. Let me know if it works.
@@tonykipkemboi thanks a lot
Can I do this on Google Colab?
This runs locally using Ollama, so it's not possible following this specific tutorial. You can, however, use other public models that expose API endpoints you can call from Colab. I should also mention that I haven't explored accessing the local models through Ollama from Colab.
PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH? Please help with this.
Are you running a modified version of the tutorial's code, or using OCR? That error usually means Poppler itself is missing; installing it (e.g. `brew install poppler` on macOS, or `apt-get install poppler-utils` on Debian/Ubuntu) is the usual fix.
I would also check out the install steps in the pdf2image repo (github.com/Belval/pdf2image) and probably use ChatGPT for debugging as well.
@@tonykipkemboi I've got the same error and I am using PDF file. Please advise.
Hi bro, good video!! But in my console I only see "OllamaEmbeddings: 100%" and then it stops automatically.
Thank you! Does it show anything on the app?
ERROR:unstructured:Following dependencies are missing: pikepdf, pypdf. Please install them using `pip install pikepdf pypdf`.
WARNING:unstructured:PDF text extraction failed, skip text extraction... Please help!
Have you tried installing what it's asking for `pip install pikepdf pypdf`?
@@tonykipkemboi Thank you so much for your reply! This got resolved.
@@suryapraveenadivi851 glad it worked! Happy coding.
I installed Ollama and verified it in PowerShell on my Windows laptop. When I ran `!ollama pull nomic-embed-text` it showed "/bin/bash: line 1: ollama: command not found". PLEASE HELP ME, only your video on the whole of YouTube is saving my life, please reply as soon as possible!
So it seems to be an issue with the Ollama installation on Windows. I haven't tried installing Ollama on Windows, but this might be a good time to add a tutorial on that. Have you tried other tutorials or the docs on how to set up Ollama on Windows?
@@tonykipkemboi Okay, that's kind of you.
The problem is not with the installation, I guess; it runs successfully in PowerShell and Command Prompt. The message appears in a Colab notebook.
@@Justme-dk7vm Ah, I see. So you're using it in Colab instead of Jupyter Lab locally?
I would suggest starting with Jupyter Lab. You just need to install it with `pip install jupyterlab`. I haven't run it on Colab, but I'm sure it's possible.
@@tonykipkemboi Okay, thank you so much. I was just scrolling through your videos and they amazed me; you're awesome, Sir ❤️ I would love to connect with you on LinkedIn; could you please share the link?
@@tonykipkemboi Hey, I tried Jupyter Lab today as you suggested, and I'm not getting that error anymore. But when I enter a query, it takes a very long time to load. How can I resolve this?
How can I download unstructured[all-docs]?
I cannot install it.
Did you install it like this: `!pip install --q "unstructured[all-docs]"`?
Is this scalable?
To some extent, but your system setup and configuration is also a major limiting factor.