How to chat with your PDFs using local Large Language Models [Ollama RAG]
- Published: July 3, 2024
- In this tutorial, we'll explore how to create a local RAG (Retrieval Augmented Generation) pipeline that processes and allows you to chat with your PDF file(s) using Ollama and LangChain!
✅ We'll start by loading a PDF file using the "UnstructuredPDFLoader"
✅ Then, we'll split the loaded PDF data into chunks using the "RecursiveCharacterTextSplitter"
✅ Create embeddings of the chunks using "OllamaEmbeddings"
✅ We'll then use the "from_documents" method of "Chroma" to create a new vector database, passing in the updated chunks and Ollama embeddings
✅ Finally, we'll answer questions based on the new PDF document using the "chain.invoke" method and provide a question as input
The model will retrieve relevant context from the updated vector database, generate an answer based on the context and question, and return the parsed output.
TIMESTAMPS:
============
0:00 - Introduction
0:07 - Why you need to use local RAG
0:52 - Local PDF RAG pipeline flowchart
5:49 - Ingesting PDF file for RAG pipeline
8:46 - Creating vector embeddings from PDF and store in ChromaDB
14:07 - Chatting with PDF using Ollama RAG
20:03 - Summary of the RAG project
22:33 - Conclusion and outro
LINKS:
=====
🔗 GitHub repo: github.com/tonykipkemboi/olla...
Follow me on socials:
𝕏 → / tonykipkemboi
LinkedIn → / tonykipkemboi
#ollama #langchain #vectordatabase #pdf #nlp #machinelearning #ai #llm #RAG
I'm a medical researcher and, surprisingly, my life is all about PDFs I don't have any time to read, let alone learn the basics of code. And I think there are a lot of people in the same boat as me. Unfortunately, it's very hard to actually find an AI tool that's even barely reliable. Most of YouTube is swamped with sponsors for AI magnates trying to sell their rebranded, redundant, worthless AI thingy for a monthly subscription, or an unjustifiably costly API that follows the same premise. The fact that you, the only one who came close to what I actually need - and a very legitimate need - is a channel with
Thank you so much for sharing the pain points you're experiencing and the solution you're seeking. I'd like to be more helpful to you and to many more people like you. I have an idea of creating a UI using Streamlit for the code in this tutorial, with a step-by-step explanation of how to get it running on your system. You will essentially clone the repository, install Ollama and pull any models you like, install the dependencies, then run Streamlit. You'll then be able to upload PDFs in the Streamlit app and chat with them in a chatbot-like interface. Let me know if this would be helpful. Thanks again for your feedback.
hey, hit me up and I'll give you my RAG that supports multiple PDFs, and you can choose the LLM you want to use.
I'm in the space as well, and am trying to find the best way to parse PDFs. I've set up Grobid on Docker and tried that out. My work laptop is a bit garbage, and being in the world's largest bureaucracy, procuring hardware is a pain in the ass. Anyways, great video.
Use NVIDIA's Chat with RTX for PDF summarizing and querying. Purchase a cheap RTX card with a minimum of 8GB of VRAM.
@@tonykipkemboi I think most people are in pain now with just this part: "upload PDFs to service X". This is what they want/have to avoid. Anyhow, nice video you made here.
Thank You. I have done several similar projects and I learn something new about 'local RAG' with each one !
Thank you for this excellent intro. You are a natural teacher of complex knowledge and this has certainly fast-tracked my understanding. I'm sure you will go far and now you have a new subscriber in Australia. Cheers and thank you - David
Glad to hear you found the content useful and thank you 🙏 😊
You are an awesome teacher; thank you so much for explaining this in a clean and objective way :)
🙏
Top-tier information here. Thank you!
🙏
this was super clear, extremely informative, and was spot on with the exact answers I was looking for. Thank you so much.
Glad you found it useful and thank you for the feedback!
Welcome to my special list of channels I subscribe to. Looking forward to you making me smarter 😊
Thank you for that honor! I'm glad to be on your list and will do my best to deliver more awesome content! 🙏
Dope video man! Keep them coming
Appreciate it!!
Clear instruction, excellent tutorial. Thank you Tony!
Thank you for the feedback and glad you liked it! 😊
You're welcome Ezekiel!
Congrats man. Really useful content. Well explained and effective.
Thank you, @ISK_VAGR! 🙌
Simple and well illustrated, Arap Kemboi 👍🏾👍🏾👍🏾
Thank you very much, bro! 🙏
This is a fun and potent project. This provides access to a powerful space. Peace be on you.
Thank you and glad you like it!
That's a pretty clean explanation.
Looking forward to more videos.
Thank you! Glad you like the delivery. I got some more cooking 🧑🍳
Very helpful! Great video! 👍
🙏❤️
Very good! Easy to understand, easy to try, expandable ....
Awesome! Great to hear.
@@tonykipkemboi you deserve it. Too many LLM YouTubers are more concerned with showing a lot of things than with making them easy to understand and reproduce. Keep up the great work!
Nice job, thanks Tony!
🙏
This is a great tutorial. Thank you
🙏
Thanks for this amazing tutorial on building a local LLM. I applied it to my research paper PDFs, and the results are impressive.
Awesome 🤩 Love to hear that! Did you experiment without using the MultiQueryRetriever in the tutorial to see the difference?
@@tonykipkemboi That's an interesting question. I tried and found that MultiQueryRetriever works well in general, when LLM needs to connect indirect information from document, but fails to provide relevant information for direct information present in the document. But, this observation could differ case to case.
thanks man this is extremely helpful!
🙏🫡
Thank you for sharing good content
🙏
Great job
Thank you! 🙏
thanks for this tony
🙏
Super!
🙏
Good one, Good luck🤞
Thanks ✌️
Useful tip: use proper Wi-Fi; don't use a mobile hotspot while pulling the model from Ollama. I had an error with that. Hope it helps someone 😊
nicely done
Thank you 😊
awesome content! new sub
Thank you! 🙏
Good to see fellow Kenyans on AI. Perhaps the Ollama WebUI approach would be easier for beginners as one can attach a document, even several documents to the prompt and chat.
🙏 Yes, actually working on a Streamlit UI for this
so cool!
Thank you 🙏
Really useful content and well explained. It would be interesting to see a video with different types of files, not only PDFs - for example Markdown, PDF, and CSV all at once. It would be very interesting.
Thank you! I have this in my content pipeline.
Good video 👍👍👍
Nice
Thank you!
Can you make one video of RAG using Agents? Great video btw. Thanks
Sure thing. I actually have this in my list of upcoming videos. Agentic RAG is pretty cool right now and will play with it and share a video tutorial. Thanks again for your feedback.
I was planning on doing this as a project. If you beat me to it, I can compare notes
Very detailed explanation, thanks. Can you please make the same project give responses in multiple languages and with voice output?
Thank you. Yes that would be cool. I can see the challenge coming from finding an open source model that is good at multiple languages. The ones I used are not great at all. For voice, it'd probably be easy to use an open source TTS or even be more granular and use 11labs for a better quality in spite of it not being local.
🤩🤩
Great delivery of material. How about fine-tuning for llama3 using your own curated dataset as a video? There are some out there, but your teaching style is very good.
Thank you and that's a great suggestion!
I'll add that to my list.
Your video is excellent, you gained a subscriber!
I'm looking to move all of my more than 500 project documentation files into a GPT to help resolve support issues and answer questions from auxiliary teams, I can see this being exactly what I needed.
Do you know someone who is trying to approach project documentation with LLMs templates?
Thank you, big hug from Brazil!
So glad you found it helpful and thank you for subscribing as well! 💜
Can you expand more on the "documentation with LLM templates"?
Thanks. Can you please explain it step by step and slowly, especially the RAG part?
Thanks for asking. Which part of the RAG pipeline?
Good explanation. Could you please make a video on PDFs that have images and tables in them? How would we extract, store, and do RAG over images, tables, and text using open-source models?
This is a good topic to explore. I might just create another video diving deeper into pdf types and how to extract and use multimodal elements.
Wonderful tutorial, man! Let me ask you: what other kinds of prompts can we use? Also, is it normal for the RAG to answer questions about things not in the PDF that was loaded? For example, I tested with the prompt "what is a dog" and got an answer back. Is it because of the RAG and Ollama? Thanks a bunch.
Thanks for the share. Quite enlightening. I will definitely build upon that. Here is the problem I have: let's say I have two documents and I want to chat with both at the same time (for instance, to extract conflicting points between the two). What would you advise here?
Thank you! That's an interesting use case for sure. My instinct, before looking up some solutions, is to maybe create two separate collections, one for each file, then retrieve from them separately and chat with them for comparison. I'm sure my suggestion might not be efficient at all. I will do some digging and share any info I find.
Would love it if you can make the Streamlit app! I am still struggling to make a Streamlit app based on open-source LLMs.
Thank you! Yes, I'm working on a Streamlit RAG app.
I have released a video on Ollama + Streamlit UI that you can start with in the meantime.
@@tonykipkemboi thanks bro! I will defo watch👌
from the Kenyan homeland
Absolutely, bro 😎
@@tonykipkemboi I'm vouching for you, bro - whenever anything new about AI, LLMs, etc. comes out, post it here ASAP; we're fully behind you.
Great walkthrough; the audio could be increased a little bit...
Thank you! 😊 I noticed that I didn't adjust my gain after I had posted. Thanks for your feedback.
Good stuff. Shame you didn't run the notebook. Would like to see how it works.
Thank you. I tried recording and running the notebook but it killed my video recording since they were competing for system resources with Ollama. I ran the notebook as you can see the outputs in it already and just walked through the code. I'll try running it on the next video for more interactivity.
You are a legend 🫡
Thank you !!!
❤️🫡
Thanks a lot! If we have a mix of multiple PDFs, Words or Excel files, how can we change the RAG to support retrieval of them?
Glad you found it helpful. For different file types, you would consider the loading/parsing and chunking strategies that fit those data types. I'm working on the next video which I will go over CSV & Excel RAG.
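As a sketch of that idea, you can route each file to a loader based on its extension. The loader class names in the comment are the usual ones from `langchain_community.document_loaders`, but verify them against your installed version:

```python
from pathlib import Path

def load_mixed_folder(folder: str, loaders: dict):
    """Route every file in `folder` to the loader class registered for its extension.

    `loaders` maps extensions to LangChain loader classes, e.g.
    {".pdf": UnstructuredPDFLoader, ".md": UnstructuredMarkdownLoader, ".csv": CSVLoader}.
    """
    docs = []
    for path in sorted(Path(folder).iterdir()):
        loader_cls = loaders.get(path.suffix.lower())
        if loader_cls is None:
            continue  # silently skip file types we have no loader for
        docs.extend(loader_cls(str(path)).load())
    return docs
```

From there, the chunking and embedding steps are the same as in the video, although each file type may benefit from a different chunking strategy.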
What GPU do you use? I have Ollama running on an Intel i5 with integrated graphics, so I'm unable to use any of the 3B+ models. TinyLlama and TinyDolphin work, but the accuracy is way off.
I have an Apple M2 with 16GB of memory. I noticed that larger models slow down my system and sometimes force a shutdown of everything. One way around it is deleting other models you're not using.
Congrats on your video! In your example you use just one PDF. I have a need to work with thousands of documents, and the main issue is the time it takes to upload the documents. Can you give me some advice?
Did you mean to say it takes time to upload the documents to vector store and query over them? If yes, I do agree with you that latency is an issue especially since we're adding another layer of retrieval using the MultiQueryRetriever. It would also depend on your system as well if you're using Ollama.
Great job. Does the file you chat with have to be a PDF or can it be a CSV or other structured file type?
🙏 thank you. I'm actually working on a video for RAG over CSV. The demo in this tutorial will not work for CSV or structured data; we need a better loader for structured data.
Thanks for the tutorial! How can I make the model give answers in a different language?
It would largely depend on the capabilities of the given model to translate from English to the target language. You can try by adding the target language in the prompt. Tell it to return the results in X language.
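One low-tech way to try this (a sketch; whether it works depends entirely on the model's multilingual ability) is to bake the target language directly into the RAG prompt:

```python
def build_prompt(question: str, context: str, language: str = "English") -> str:
    """Build the RAG prompt with an explicit output-language instruction."""
    return (
        "Answer the question based only on the following context.\n"
        f"Respond in {language}.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# The resulting string is what you'd feed the chat model in place of the
# tutorial's default prompt template.
print(build_prompt("What is the main finding?", "<retrieved chunks go here>", language="French"))
```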
Great video! Thanks for sharing. I ran into an issue with a Chroma dependency on SQLite3 (i.e. RuntimeError: Your system has an unsupported version of sqlite3. Chroma requires sqlite3 >= 3.35.0). The suggested solutions are not working. Is it possible to use another DB in place of Chroma?
Thank you! Yes, you can swap it with any other open-source vector database. You might also try using a more recent version of Python, which should come with a newer version of SQLite. Do you know what version you are using now?
You can also try installing the binary version in the notebook like so: `!pip install pysqlite3-binary`
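If you go the `pysqlite3-binary` route, the widely shared workaround is to swap it in before anything imports `sqlite3`. A hedged sketch (the try/except just makes it a no-op when the package isn't installed):

```python
import sys

try:
    # Make the bundled, newer SQLite masquerade as the stdlib module
    # *before* chromadb (or anything else) imports sqlite3.
    __import__("pysqlite3")
    sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")
except ImportError:
    pass  # pysqlite3-binary not installed; fall back to the stdlib sqlite3

import sqlite3
print(sqlite3.sqlite_version)  # Chroma needs >= 3.35.0
```

Run this at the very top of the notebook, before importing Chroma.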
Hello friend, thank you very much for your content. I have a question: how can I make it listen to my server from within Google Colab so I don't have to use Jupyter, since my resources are a bit limited?
Hello! Nice tutorial. I was stuck on the first part unfortunately, as I get the error:
"Unable to get page count. Is poppler installed and in PATH?"
Do you have any idea how to solve this?
I have already installed poppler using brew.
Thank you. Have you tried using ChatGPT to troubleshoot?
Appreciate your work. I wanted to know: can I use it for confidential PDFs? Is there any chance of a data leak?
Thank you for the kind words. Yes, if you use Ollama models like we did in the video, then your content will stay private and not be sent to any online service. To be sure, I'd recommend turning off your Wi-Fi or any connection once you've loaded all the dependencies and imports. You can then run the cells to load your PDF into a vector db and chat with it. After you're done, you can delete the collection where you saved the vectors of your PDF before turning your connection back on. This is an extra measure to give you peace of mind.
Nice video, and very informative.
My question: I had downloaded LLMs like Gemma, Llama 2, Llama 3 and so on on my macOS machine, but due to a technical issue I deleted them (e.g. $ ollama rm llama2).
Now I want them again, and noticed that if I run "$ ollama run llama3", this **downloads the entire 4.7GB from the internet** all over again.
Is it possible to keep them downloaded somewhere so that when I want one I can just run $ ollama run and use it, and later delete it when not needed?
Thanks in advance; I would appreciate a response.
Thank you. What you did earlier is the standard way of downloading, serving, and deleting the Ollama models.
You can also download more quantized options for each, with less memory. I usually add and then delete whenever I don't need it or when I need to download another model.
Your initial ingestion doesn't load just the first page; it ingests the entire document. Your data variable is a list containing a single Document object that holds the content of the entire PDF.
That is correct. I did not change the code after testing it previously with loading individual pages. You can load by page and add metadata that way.
@@tonykipkemboi but cool tutorial for summarisation using a multi query retriever. I didn't know this was a thing in langchain
@@madhudson1 thank you. Yes, it's a neat function
Are the libraries you used (LangChain, ChromaDB, ...) open source? And can we use any Ollama model?
yes and yes
Good one. OK, you touched on security - you have here something that doesn't let things flow out to the internet. I saw a bunch of vids about tapping data from DBs using SQL agents, but none said anything specific about security. So the question: does using SQL agents violate data security?
You bring up a critical point and question. Yes, I believe most agentic workflows currently, especially tutorials, lack proper security and access moderation. This is a growing and evolving portion of agentic frameworks + observability, IMO. I like to think of it as people needing special access to databases at work and someone managing roles and the scope of access. So agents will need some form of that management as well.
Thanks. Btw, how did you make your YouTube profile photo? It looks very nice.
Thank you! 😊
I used some AI avatar generator website that I forgot but I will find it and let you know.
Thank you
could you drop a tutorial on building rag chatbots with ollama and langchain with custom data and guard-railing?
That sounds interesting and something I'm looking into as well. For guard-railing, what are your thoughts on the frameworks for this portion? Have you tried any?
@@tonykipkemboi realpython.com/build-llm-rag-chatbot-with-langchain/
I've read this article, and the only guard-railing mechanism they seem to apply is an additional prompt with every inference.
Quite interesting, and thanks for sharing it. Can you let me know if this would run on a Core i7 processor with 32GB of CPU RAM, considering you are using the Mistral model?
Thank you. Yes that should be sufficient to run the program.
Thanks for sharing this. Very helpful. Also, what are you using for screen recording and editing this video? I see that it records the section where your mouse cursor is! Nice video work as well. Only suggestion is to increase the gain in your audio.
I'm glad you find it very helpful. I'm using Screen Studio (screen.studio) for recording; it's awesome!
Thank you so much for the feedback as well. I actually reduced it during editing thinking it was too loud haha. I will make sure to readjust next time.
@@tonykipkemboi Btw, can you see those 5 questions that it generated before summarizing the document?
@@xrlearn, I'm sure I can. I will try printing them out and share them here with you tomorrow.
Hi @xrlearn - Found a way to print the 5 questions using `logging`. Here's the code you can use to print out the 5 questions:
```
import logging
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)
unique_docs = retriever.get_relevant_documents(query=question)
len(unique_docs)
```
Here are more detailed docs from LangChain that will help.
python.langchain.com/docs/modules/data_connection/retrievers/MultiQueryRetriever/
hey, thanks for this. Question. Does it have limitations on the number of documents one can upload to chat with? Like can I upload thousands of documents to use?
I haven't tested it with many documents but will do.
Will appreciate a lot. Much love from Kenya btw😃
@@AfrivisionMediake 🫡
Should I make Chroma DB connection to make this work?
We do use Chroma in the tutorial.
Good job, thanks a lot. I have a question: how do I build a chatbot, and with which technologies - RASA, LangChain, Ollama, ...?
And can I train it with my own data (scraping, PDFs, ...)?
I don't think I'm following your question correctly. Are you asking how to build a chatbot for the RAG in this video?
How can we get output without rephrasing? I mean, I want to know exactly what is written in the PDF, as-is. For example, if I ask what is written in article 3.2.2, can the output be quoted word for word?
Ah yes, good idea. I think for this, you'll have to add citations. I'm early into playing with this as I am working on the Streamlit UI for RAG. Always good to have cited sources.
Is it possible to upload multiple PDF documents using the LangChain doc loaders and then converse across them? Excellent tutorial and thanks - David
That can definitely be possible. Are you thinking of probably two pdfs that each carry different content?
@@tonykipkemboi Thank you for taking the time to reply - much appreciated.
I was just wondering whether this approach allows the ingestion of multiple documents which could be contrasted or used in conjunction with each other.
Cheers mate - David
Please provide the notebook if possible. Great video.
Thank you! Checkout the repo link in the description for all the code.
Here's the link github.com/tonykipkemboi/olla...
@@tonykipkemboi hey, the link is not working; can you provide it again please?
no problem, didn't see the description, thanks!
@@hectorelmagotv8427 , thanks. Just to confirm, did it work?
Hi, and if there were 6 or 10 PDFs, how would you load them into the RAG? Thanks
Good question! I would iterate through them while loading them and also index the metadata so it's easy to reference which pdf provided the context for the answer. There's actually several ways of doing this but that would be my simple first try.
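A minimal sketch of that "simple first try" (the loader class is passed in so you can reuse the tutorial's `UnstructuredPDFLoader`; the `source` metadata key is my own choice here, not from the video):

```python
from pathlib import Path

def load_pdf_folder(folder: str, loader_cls):
    """Load every PDF in `folder` with the given LangChain loader class,
    tagging each returned document with its source filename so answers
    can be traced back to the PDF that provided the context."""
    docs = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        for doc in loader_cls(str(pdf_path)).load():
            doc.metadata["source"] = pdf_path.name
            docs.append(doc)
    return docs

# Usage (assumption: same loader as in the tutorial):
# from langchain_community.document_loaders import UnstructuredPDFLoader
# docs = load_pdf_folder("pdfs/", UnstructuredPDFLoader)
```

The resulting list goes through the same splitting and `Chroma.from_documents` steps as the single-PDF version.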
Can this model query tabular data or image data?
I assume you're talking about Llama2? Or are you referring to the Nomic text embedding model? If it's Llama2, it's possible to use it to interact with tabular data by passing the data to it (RAG or just pasting data to the prompt) but cannot vouch for its accuracy though. Most LLMs are not great at advanced math but they're getting better for sure.
Thanks again for the tutorial. I am running the same question against a 500-page PDF multiple times, and I am getting a different answer every time I run it. What could be going wrong here? I simply have a for loop looping through the exact same question using the same vector db, yet I get different answers.
Thanks. Are the answers hallucinated in all of them, or is just the wording different each time?
@@tonykipkemboi They are not completely off, but they are a bit different. The PDF I am using is about medical terminology. I asked it simply to tell me the components of the cardiovascular system. There is a simple paragraph in it that lists them, but yet one time it talks about the kidneys, other times about heart anatomy... so it is not completely hallucinating, but it is not able to nail down a consistent answer.
@@ammardarkazanli5633 one way I can think of to solve this is by using the "seed" parameter for the model. You will need to create a modelfile with the Ollama model you're using as the LLM so it generates the same output for the same prompt. Here are the docs on how to create that:
github.com/ollama/ollama/blob/main/docs/modelfile.md. You can also watch my other video on creating an Ollama UI with Streamlit to see how I implemented the modelfile; I didn't add seed there, but it's easy to add.
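For reference, a minimal Modelfile along those lines might look like this (the base model, name, and seed value are just examples; `seed` fixes the sampling RNG, and `temperature 0` removes most of the remaining variation):

```
FROM mistral
PARAMETER seed 42
PARAMETER temperature 0
```

Build it with `ollama create mistral-fixed -f Modelfile`, then point the notebook's chat model at `mistral-fixed` instead of the base model.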
I will give this a try…
@tonykipkemboi, thank you very much for the valuable video. It helped me a lot.
I was struggling to find the right LLM that can run locally.
I have a question: how do I create a persistent RAG so that query results can be faster?
@@bhagavanprasad glad you found it useful. For this example, the speed depends on several factors, a major one being your system configuration. If you have a GPU, it will be much faster. An intermediate step would be to remove the MultiQueryRetriever, since it generates more questions from your prompt and then retrieves context for all of them from the vector db, which takes time and introduces latency. You can use a generic single-question query and then optimise retrieval another way, for example with a reranking model, but that might be a bit beyond what we covered in this tutorial. There's definitely a trade-off where you sacrifice accuracy for speed and vice versa.
Can we do this with Llama 3? Would that be better?
Yes you can use llama3.
Thank you very much for your videos.
Please, what if we have several PDFs?
Yes, so you can iteratively load the PDFs, chunk them by page or some other unit, then index them in a vector database. You would then ask your query as always, and it would find the context across all the documents to give you an answer.
Oh I thought you were saying you've embedded an LLM into a PDF document, like those draggable 3d diagrams.
I did some first experiments with local AI, using Ollama and AnythingLLM to talk to the model about a pdf file... and so far, the results are just completely unusable. The AI is just hallucinating on me constantly, making up sentences in the pdf that are not there, failing simple tasks like "quote the first line on page 2 without changing it", not to mention more complex tasks like "list all tools mentioned on page 3". Maybe I'm doing something wrong, but I feel very discouraged from using AI at all for this kind of usecase.
Sorry to hear the troubles but this is very common. Have you tried setting the temperature of the model to 0? That way there's no room for it to be creative.
@@tonykipkemboi Interesting, I'll look into that thanks!
@@user-eh2zd2ih8v let me know what comes of it.
Nice video. When I try to execute the following commands: !ollama pull nomic-embed-text and !ollama list, I receive the following error: /bin/bash: line 1: ollama: command not found
This error means that Ollama is not installed on your system or not found in your system's PATH. Do you have Ollama already installed?
@@tonykipkemboi Hello, I've installed Ollama on my local system, but I don't know why I'm getting an error in Google Colab.
I encountered several errors when trying to execute the following line in the code:
data = loader.load()
Despite installing multiple modules, such as pdfminer, I'm unable to resolve an error stating "No module named 'unstructured_inference'." Has anyone else experienced similar issues with this code? Any assistance would be greatly appreciated. Thank you!
Interesting that it's asking for that, since that's for layout parsing and we didn't use it. Try installing it like so: "!pip install unstructured-inference"
I've been given a story, the Trojan War, which is a 6-page PDF (or I can even use the story as text). Five pre-decided questions are given to ask based on the story. I want to evaluate different models' answers, but I am failing to evaluate even one. Kindly help; please guide thoroughly.
Can you please reply? I would really appreciate that.
This sounds interesting! I believe if you're doing this locally, you can follow the tutorial to create embeddings of the PDF and store them in a vector db, then use the 5 questions to generate output from the models. You can switch the model type between each response, and you'll probably have to save each response separately so you can compare them afterwards.
@@tonykipkemboi How much storage will the model take?
I don't have the greatest hardware.
Yes, there are smaller quantized models on Ollama you can use, but most of them require a sizeable amount of RAM. Check out these instructions from Ollama on the size you need for each model. You can also do one at a time, then delete the model after use to create space for the next one you pull. I hope that helps.
github.com/ollama/ollama?tab=readme-ov-file#model-library
ChromaDB works with SQLite 3, and I'm facing a lot of issues using Chroma. Can we use any other DB, or just pickle the entire vector DB?
You can definitely replace chroma with any other db like Weaviate or Qdrant or Milvus and so on.
Thanx man ! It worked 👌
@@nitinkhanna9754 awesome!
@@nitinkhanna9754 What other DB did you use to make it work, as suggested by @tony.
I tried the first command, %pip install -q unstructured langchain, and it's taking a super long time to install. Is this normal?
It shouldn't take more than a couple of seconds, but depending on your system and package manager, it might take a while. Did it resolve?
What Python version did you use for running this PoC?
Python 3.9
Can you think of a reason why pip install unstructured[all-docs] is failing on my two Macs? I get the error "Failed to build onnx
ERROR: Could not build wheels for onnx, which is required to install pyproject.toml-based projects". I have tried almost every suggestion on the internet. I am attempting to run on Python 3.12.1 and 3.12.3... Thanks
I had the same issue at some point. Switching to Python 3.9 resolved the error for me. Create a virtual environment with 3.9 and try running it there.
@@tonykipkemboi Just to confirm, everything worked well with 3.9.19.. Thanks for the suggestion. The video was very helpful to get a handle with all the commotion around the different models.
@@ammardarkazanli5633 glad to hear it worked!
We don't need an API key for this?
Nope, don't need one.
Hey, if we are using Google Colab instead of Jupyter, how will we be able to incorporate Ollama with Google Colab?
I haven't tried this myself but here are some resources for you that might be helpful;
1. medium.com/@neohob/run-ollama-locally-using-google-colabs-free-gpu-49543e0def31
2. stackoverflow.com/questions/77697302/how-to-run-ollama-in-google-colab
Firstly, thank you for sharing this entire tutorial; really great. I tried to implement it and got all the issues resolved, but it looks like I am not getting any output after I ask a question. I see OllamaEmbeddings run to 100% five times, and then nothing happens after that; the program just quits without giving any answer. Will you be able to help me figure out how to get it working?
Thank you for your question. Did you use the same models as in the tutorial, or did you use another one? Are you able to share your code?
@@tonykipkemboi I copied your code exactly.
The reason was I did not use a Jupyter notebook; I was running in VS Code, and I had to save the value returned by the chain's invoke method. When I printed it, it started working. This is amazing; thank you so much, really appreciate it.
Is it possible, using this, to extract data from a PDF and convert it to proper JSON format?
Yes, it is possible. You would need to add another function to do that, but it's very doable. I'd start by checking the LangChain docs on JSON extraction and using Pydantic.
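As a stdlib-only sketch of the post-processing half (the field names here are hypothetical; a fuller version would use LangChain's Pydantic-based output parsing as mentioned above): prompt the model to answer in strict JSON, then validate whatever comes back:

```python
import json
from dataclasses import dataclass

@dataclass
class InvoiceFields:  # hypothetical schema for whatever the PDF contains
    title: str
    total: float

def parse_llm_json(raw: str) -> InvoiceFields:
    """Parse a model response that was prompted to return strict JSON."""
    raw = raw.strip()
    # Strip a Markdown code fence if the model wrapped its answer in one.
    if raw.startswith("```"):
        raw = raw.strip("`").lstrip("json").strip()
    data = json.loads(raw)
    return InvoiceFields(title=data["title"], total=float(data["total"]))

print(parse_llm_json('{"title": "Q1 report", "total": "1250.50"}'))
# → InvoiceFields(title='Q1 report', total=1250.5)
```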
@@tonykipkemboi got it!
Retrieving answers from the vector database takes a good minute on my MacBook Air. How do I scale this? Can you add a Pinecone layer to it?
So this was a demonstration of running everything locally, with nothing online other than downloading the packages. You can hook up any vector store you like, for example Pinecone as you've mentioned. Just beware that since the local models will still be in use, it will still be slow if your system is already slow. Consider using paid services if you're looking for a lower-latency solution.
@@tonykipkemboi So Tony, what I am trying to build is something like a website where people come and drop their PDFs and can do Q&A.
In my learning and implementation I found out that generating embeddings for my 10-page PDF is not taking a lot of time; it used to before I switched to the embedding model you used.
So the embedding part is sorted.
I tried implementing the code with both Chroma and FAISS; the results are almost equal. Even for a small PDF, it takes a minute to answer.
I understand it takes computational resources from my local machine, which happens to be a MacBook Air M1.
Do you have a machine with a better GPU? Let's say yours produces the retrieved results in under 10 seconds.
Nobody would like to wait a minute or more on a website for an answer. Also, I am scared that if there are hundreds of thousands of users, I'd need to purchase a GPU farm for this to work, lol.
Note: I have never made a scalable project before.
Please guide. Also, share how much time it takes on your PC/laptop for the answer to come back from the vector db, so I can understand whether it's my system that is weak or whether libraries like Chroma and FAISS are not meant for scalability.
can anyone answer this please?
@@ayushmishra5861 so my system is just like yours, with 16GB of RAM. It takes about a minute or less to get an answer back for a few embedded PDF pages; for longer ones, it takes even longer. One portion that slows the process is the MultiQueryRetriever, which I added and talked about in the video. It generates 5 more questions, and those have to get context from the vector db as well, which slows the time-to-output significantly. Try it without the MultiQueryRetriever and see if that speeds up your process.
Thanks! I don't see where you can tell it to handle languages other than English?
That will largely depend on the given model's ability to translate from English into the target language. You can try adding the target language to the prompt: tell it to return the results in language X.
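As a rough sketch of what "adding the target language to the prompt" could look like (the template wording and variable names here are illustrative, not the tutorial's exact code):

```python
# Hypothetical RAG prompt template with a target-language instruction added.
template = (
    "Answer the question based ONLY on the following context:\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer in {language}."
)

prompt = template.format(
    context="(retrieved chunks go here)",
    question="What is this PDF about?",
    language="French",
)
print(prompt.splitlines()[-1])  # Answer in French.
```

Whether the answer actually comes back in good French then depends entirely on the model you're running.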
I am getting this error when trying to run it in a Jupyter notebook. Any idea how to fix this?
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
chromadb 0.4.7 requires pydantic<2.0,>=1.9, but you have pydantic 2.8.0 which is incompatible.
fastapi 0.99.1 requires pydantic!=1.8,!=1.8.1,<2.0.0,>=1.7.4, but you have pydantic 2.8.0 which is incompatible.
@@LumpBrady0 could you paste the entire error log here?
@@tonykipkemboi how do I get to the error log? (sorry, I'm pretty new here)
I have a 1000-page PDF; will it be able to go through that?
Good question. I haven't tried it, but my naive guess is that it can handle it.
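A back-of-envelope estimate suggests the main cost for a 1000-page PDF is embedding time, since the chunk count scales linearly with document length. The characters-per-page figure and the splitter settings below (chunk_size=7500, overlap=100) are assumptions for illustration, not measurements:

```python
import math

# Rough estimate of how many chunks a 1000-page PDF produces with a
# character-based splitter. All figures here are assumptions.
chars_per_page = 2000            # rough average for a text-heavy page
total_chars = 1000 * chars_per_page
chunk_size, overlap = 7500, 100  # example splitter settings
step = chunk_size - overlap      # effective stride between chunk starts

n_chunks = math.ceil(total_chars / step)
print(n_chunks)  # 271
```

So ingestion means embedding a few hundred chunks, which a local embedding model can grind through; it just takes proportionally longer than a 10-page file.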
I got this error when running your code on Colab: "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
imageio 2.31.6 requires pillow<10.1.0,>=8.3.2, but you have pillow 10.3.0 which is incompatible." Could you help me check?
The error message indicates a conflict between the versions of `imageio` and `Pillow` packages. Here's how you can resolve this issue:
1. **Uninstall the current version of Pillow:**
```bash
!pip uninstall pillow -y
```
2. **Install the compatible version of Pillow required by imageio:**
```bash
!pip install pillow==10.0.0
```
3. **Reinstall imageio to ensure all dependencies are correctly aligned:**
```bash
!pip install imageio --upgrade
```
Here’s how you can run these commands in a Colab cell:
```python
!pip uninstall pillow -y
!pip install pillow==10.0.0
!pip install imageio --upgrade
```
This sequence will uninstall the conflicting version of `Pillow`, install a compatible version, and ensure `imageio` is up to date. This should resolve the dependency conflict you are encountering. Let me know if it works.
@@tonykipkemboi thanks a lot
Can I do this on Google Colab?
This runs locally using Ollama, so it's not possible following this specific tutorial. You can, however, use other public models that expose API endpoints you can call from Colab. I should also mention that I haven't explored accessing the local models through Ollama from Colab.
PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH? Please help with this.
Are you running a modified version of the tutorial's code, or using OCR? That error usually means Poppler itself is missing; installing it (e.g. `brew install poppler` on macOS, or `apt-get install poppler-utils` on Debian/Ubuntu) is the usual fix.
I would also check out the install steps in the pdf2image repo (github.com/Belval/pdf2image) and probably use ChatGPT for debugging as well.
@@tonykipkemboi I've got the same error and I am using PDF file. Please advise.
Hi bro, good video!! But in my console I only see "OllamaEmbeddings: 100%" and then it stops automatically.
Thank you! Does it show anything on the app?
ERROR:unstructured:Following dependencies are missing: pikepdf, pypdf. Please install them using `pip install pikepdf pypdf`.
WARNING:unstructured:PDF text extraction failed, skip text extraction... Please help!
Have you tried installing what it's asking for `pip install pikepdf pypdf`?
@@tonykipkemboi Thank you so much for your reply! This got resolved.
@@suryapraveenadivi851 glad it worked! Happy coding.
I installed Ollama and verified it in PowerShell on my Windows laptop. When I ran `!ollama pull nomic-embed-text` it showed "/bin/bash: line 1: ollama: command not found". PLEASE HELP ME, only your video on the whole of YouTube is saving my life, please reply as soon as possible!
So it seems to be an issue with the Ollama installation on Windows. I haven't tried installing Ollama on Windows, but this might be a good time to add a tutorial on that. Have you tried other tutorials or the docs on how to set up Ollama on Windows?
@@tonykipkemboi Okay, that's kind of you.
The problem is not with the installation, I guess; it runs successfully in PowerShell and Command Prompt. The message appears in a Colab notebook.
@@Justme-dk7vm Ah, I see. So you're using it in Colab instead of Jupyter Lab locally?
I would suggest starting with Jupyter Lab. You just need to install it with `pip install jupyterlab`. I haven't run it on Colab, but I'm sure it's possible.
@@tonykipkemboi Okay, thank you so much. I was just scrolling through your videos and they amazed me; you're awesome, Sir ❤️ I would love to connect with you on LinkedIn; could you please share the link?
@@tonykipkemboi Hey, I tried Jupyter Lab today as you suggested, and I'm not getting that error anymore. But when I enter a query, it takes a very long time to load. How can I resolve this?
How can I download unstructured[all-docs]?
I cannot install it.
Did you install it like this: `!pip install --q "unstructured[all-docs]"`?
Is this scalable?
To some extent, but your system setup and configuration is also a major limiting factor.