Chat with CSV Streamlit Chatbot using Llama 2: All Open Source
- Added: 29 Jul 2023
- In this exciting tutorial, I'll show you how to create your very own CSV Streamlit Chatbot using the powerful and open-source Llama 2 language model developed by Meta AI! The best part? It runs smoothly on a regular CPU machine, so no need for expensive hardware.
Throughout the video, I'll guide you step-by-step on building the chatbot, which can effortlessly retrieve information from a CSV file. To enhance its capabilities, we'll leverage sentence transformers for embeddings and utilize faiss CPU for efficient vector storage.
If you've ever wanted to dive into the world of chatbots and explore the potential of natural language processing, this tutorial is perfect for you. Join me and let's get hands-on with this exciting project!
Remember to like, comment, and subscribe to stay updated with more thrilling tutorials like this one. Happy coding!
AI Anytime's GitHub: github.com/AIAnytime
Previous Llama 2 Video (Medical Bot): • Build and Run a Medica...
Streamlit Docs: docs.streamlit.io/
Llama 2 Model HF: huggingface.co/TheBloke/Llama...
Sentence Transformers Docs: huggingface.co/sentence-trans...
Langchain Docs: docs.langchain.com/docs/
LLM Playlist: • Large Language Models
Streamlit Projects Playlist: • Streamlit Projects
LinkedIn Account: / sonukr0
WhatsApp Group of AI Anytime: chat.whatsapp.com/EDnAeyBL18G...
#generativeai #coding #python - Science & Technology
Was waiting for the CSV video... Thank you.
Hope you enjoyed it!
@@AIAnytime I was also waiting for CSV. Thanks.
Thanks for the information. I'm going to try it and I'll be telling you about it. Great job!
Sounds good, thanks.... Look forward to your findings.
Thank you so much for information and tutorial ! you really help us all understand everything better.
Glad to hear that! Thank you.
I ran the application on my machine. It worked perfectly =) Thank you.
Glad it helped! Please subscribe to the channel if you think it's worth it.
Hey @Marcelo, I am facing a Hugging Face 503 issue. Did you get that too? Could you help?
Really interesting video, and no one provides this much good content on YouTube.
Thank you dear for your efforts and good content.
I appreciate your content and efforts. Once again thank you
Thank you so much 🙂
Excellent..Thanks for sharing
My pleasure
Superrrr, thanks for your valuable content....
You're most welcome. Thank you.
Thanks for sharing!
Very nice and informative. If you can show how you deploy and create web versions that would be highly helpful.
Hi, I have some deployment videos on my channel. Please check them out. That will help...
Good .. was waiting
Thanks Kamal.
Great video
Thank you so much for the video @aianytime! The app isn't running on my device though. It's taking too much time to run.
May I know why you didn't use an Agent? For example instead of CSVLoader, could we have used create_csv_agent(llm, path) ? Trying to understand the difference.
Very good video; you don't beat around the bush, which I like very much: concise and to the point. All of your examples use CPU. What if we have a GPU and would like to use it? What changes or imports would you make to run it on GPU and make it faster? Please point out those elements in your videos. I remember compiling your code and noticed one or two modules without GPU support. Would love to hear back from you.
you deserve more followers
Thank you for your comment! Hopefully people will bless me with their support 🙏... Slowly and steadily.
Hi, how to use the downloaded model with an agent like "create_pandas_dataframe_agent" or "create_csv_agent" to be able to make more precise queries to the CSV data? Like "how many rows with column=3" or "give me the mean of column test". Because like you did these kind of questions do not work.
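The question above touches on why pure vector retrieval fails for aggregates: similarity search can only fetch rows that look like the question, while counting or averaging needs code executed over the whole table. Below is a minimal, hypothetical sketch of that agent pattern using only the standard library. `answer_with_code` is a stand-in for the code a real agent (such as LangChain's `create_pandas_dataframe_agent(llm, df)`) would generate and run; the column names and queries are made up for illustration.

```python
# Vector retrieval fetches "similar-looking" rows, so aggregate questions like
# "how many rows" or "mean of column test" usually fail. An agent instead
# writes and executes code over the whole table.
import csv
import io
from statistics import mean

CSV_TEXT = """id,test
1,10
2,30
3,20
"""

def load_rows(csv_text):
    """Parse the CSV into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def answer_with_code(question, rows):
    # A real agent asks the LLM to translate the question into code;
    # here we hard-code the two example queries from the comment above.
    if "how many rows" in question:
        return len(rows)
    if "mean of column test" in question:
        return mean(float(r["test"]) for r in rows)
    raise ValueError("query not covered in this sketch")

print(answer_with_code("how many rows", load_rows(CSV_TEXT)))        # 3
print(answer_with_code("mean of column test", load_rows(CSV_TEXT)))  # 20.0
```

The key design point: the LLM never sees the full table; it only produces the code, and the code produces the exact answer.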
Hi bro, for the past month you are the one I have been watching continuously on YouTube for your great content, live coding, explanations, and your ideas on LLMs. You are the LLM guru for me. Thank you so much for everything.
I have one idea on this. Microsoft has released a library called Guidance. How can we use it to guide this chatbot to show proper output on our data, i.e. not going beyond our data to make up its own answer, and forcing it to extract the right information from our data? Can you make one small video on it?
@AIAnytime
Very helpful video. Thanks for sharing.
Can you please create a video with Mistral 7B model or explain how to switch from Llama2 to Mistral in the code?
I have 3 videos on Mistral 7B. Check them out in the LLM playlists.
You have saved me.
This is so informative and useful. Can you make a similar video where you have PDFs and have to generate MCQ questions, with answers given as option pairs like a/b, b/c, or c/d? Thank you.
Yes, soon! Thank you.
Sir, can you show us how to build a CSV and Excel file reader that can also give predictions on any type of Excel or CSV file, and then extract the information into a useful PDF saved to the local system? If you would rather do this project privately, I will pay you. The same idea as this video, but using the OpenAI API key.
Thanks@@AIAnytime
yes you can use pypdfium2 to read pdfs
It is very nice. What is the maximum number of data points it will support?
I see Streamlit running but no response even after 10 to 15 mins. Does the response take a long time?
Excellent explanation. Can you explain how to connect to a Postgres database?
Hi, thank you. It shouldn't be difficult to connect to a DB. Do you have authentication? Connect with it and get the data in a format that LangChain can accept.
I tried with a dataset containing employee salary information. When I ask how many records are present in the dataset, the model gives an incorrect output. If I ask what the minimum salary is, it gives an incorrect answer as well. How can I make the model give accurate results?
Is it possible to apply association rules to the CSV data and get the results, like the Apriori algorithm does?
I'm impressed with this video, and I want to thank you for your hard work. Have you thought about making a video about DemoGPT?
Thanks for your comment! I will create if that excites me and not any other tool.
@@AIAnytime Sure, I hope it will :)
Thanks for this informative video. I recreated it with the same code, and after uploading the 2019 CSV file, querying it gives the error below.
"Number of tokens (723) exceeded maximum context length (512)" What needs to be done?
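This error means the retrieved chunks plus the prompt exceed the model's context window. Two usual remedies, sketched below: raise the window, or shrink what goes into it. The parameter names assume the ctransformers backend used via LangChain's `CTransformers` wrapper; check them against your installed versions before relying on them.

```python
# Option 1: raise the model's context window (ctransformers defaults to 512).
config = {
    "max_new_tokens": 512,
    "temperature": 0.1,
    "context_length": 2048,  # room above the failing 723-token prompt
}
# llm = CTransformers(model="llama-2-7b-chat.ggmlv3.q8_0.bin",
#                     model_type="llama", config=config)

# Option 2: shrink the input side instead: smaller chunks at embedding time
# and fewer retrieved documents (k) at query time.
chunk_settings = {"chunk_size": 300, "chunk_overlap": 30, "k": 2}
```

Either change alone is often enough; combining both gives the most headroom.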
Hi sir, thank you for all your efforts, but please also look at the issues we face in the comments. Please respond to those issues and address them.
In which directory do we store the downloaded LLM model?
I am getting this error. Any idea how to resolve it? ValidationError: 2 validation errors for CTransformers: max_new_tokens - extra fields not permitted (type=value_error.extra); temperature - extra fields not permitted (type=value_error.extra)
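That ValidationError typically means `max_new_tokens` and `temperature` were passed as top-level keyword arguments, which newer langchain `CTransformers` versions reject. Nesting them under `config` usually resolves it. This is a sketch; the exact behaviour depends on your langchain/ctransformers versions.

```python
# Rejected by newer langchain CTransformers versions (extra fields error):
# llm = CTransformers(model="llama-2-7b-chat.ggmlv3.q8_0.bin",
#                     model_type="llama",
#                     max_new_tokens=512, temperature=0.5)

# Accepted: generation parameters nested under `config`
config = {"max_new_tokens": 512, "temperature": 0.5}
# llm = CTransformers(model="llama-2-7b-chat.ggmlv3.q8_0.bin",
#                     model_type="llama", config=config)
```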
Why are we not using the CSV agent from LangChain here? Will it perform better than this?
Hi, if i want to run this code on GPU, what shall i do?
Hi, how do I consult you? I have a few questions. I need to process data from multiple CSV files of different users, and the search should be specific to the user I select from the dropdown.
Great video. I am getting only "I don't know" from the chatbot. Do you know why? 😂
How many seconds does it take to get response when there are million rows in data?
Can't we use 7B model from Meta for this?
how to display chart and ask visualization questions?
I get AxiosError: Request failed with status code 403 when I upload a CSV file.
Instead of .bin models, can I work with .gguf?
If I ask this chat which commands from the pandas Python library to use to find China's GDP, does it provide that type of information or not?
Your videos are very informative. I have a doubt to clarify regarding training on the data. I have a dataset in CSV format which contains car details like name, brand, model, and specifications, with one car per row across multiple columns. How do I train on this data so the LLM can answer questions like "how many cars are manufactured by TATA?" etc.? I see lots of examples using OpenAI, but when I try Llama 2, the answers are not great. In this example too, when you ask for the number of countries in 2019.csv, it does not give enough information. Can you please help me understand how to make an LLM work for this kind of data?
Look at my chat with a CSV video using Llama2.
It is taking 10 mins or more for every response, and the responses are also not accurate. Is it the same for everyone?
Same
Are the results accurate ?
Thank you for the detailed information. The CSV loader works fine with a single CSV file; however, with multiple CSVs I tried the directory loader and the accuracy of the results is very poor. We observed a few more things during the chunking process: it chunks every row and loses the context of the previous rows instead of aggregating rows together. It would be good if you explained chunking/embedding in more detail. Choosing the right chunking and embedding is very important for accurate results. Keep doing the good work!!!
That's a good point. Yes the trick is on better preprocessing and embedding techniques. Will post a few videos in detail.
@@AIAnytime Much Appreciated!!!
@@AIAnytime that would be really interesting
@jeganbaskaran, did you come across any information that can make the chunking better?
Also, does the chunking always have to be equal sizes, or can it chunk per paragraph (keeping big or small paragraphs whole, but if a paragraph exceeds e.g. 500 characters, chunking it by character count)?
@@satyamgupta2182 We took 2 approaches. The CSV loader chunks each row on its own, so we reframed each row as a sentence using Python code. E.g. for columns first name, last name, location, occupation, we produce simple text like "first-name last-name is staying in India and working in the software industry...", and we started seeing better results. Another method: add metadata about each column in the prompt and use a custom CSV loader; that also started working decently.
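The row-to-sentence idea described above can be sketched in a few lines: instead of letting the CSV loader chunk raw rows, render each row as a natural-language sentence so the embedding carries the column context. The column names and wording here are hypothetical, modeled on the example in the comment.

```python
# Turn each CSV row into a sentence before embedding, so column meaning
# survives chunking (a sketch; adapt the template to your own columns).
import csv
import io

CSV_TEXT = """first_name,last_name,location,occupation
Asha,Verma,India,software engineer
"""

def rows_to_sentences(csv_text):
    sentences = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        sentences.append(
            f"{row['first_name']} {row['last_name']} is staying in "
            f"{row['location']} and working as a {row['occupation']}."
        )
    return sentences

print(rows_to_sentences(CSV_TEXT)[0])
# Asha Verma is staying in India and working as a software engineer.
```

These sentences, rather than raw rows, are what you would feed to the embedding model and vector store.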
Hi, Thank you very much for this informative video and using an open source llm.
Could you please explain how the LLM retrieves the data from the CSV? Does it have knowledge of the whole dataset uploaded by the user?
Hi Maanya, thanks for your comment! For the CSV (or any other data) we have, embeddings are created and stored in a vector store/DB (in a latent space). That becomes your knowledge base. When you ask a query, we use the same embedding model to embed it, then match it against the nearest vector chunks in the stored vector store/DB. Those matched chunks are passed to the LLM, which captures the semantics behind them and generates an answer for you.
@@AIAnytime Thank you so much for the explanation.
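The retrieval step explained above can be illustrated with a toy example: embed the query with the same model used for the knowledge base, then pick the nearest stored chunk by cosine similarity. Real embeddings come from sentence-transformers and live in FAISS; the 3-dimensional vectors below are fabricated purely for illustration.

```python
# Toy nearest-chunk retrieval: cosine similarity between a query embedding
# and stored chunk embeddings (all vectors here are made-up numbers).
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# "Vector store": chunk text -> embedding
store = {
    "GDP of China: 14.34 trillion": [0.9, 0.1, 0.0],
    "Population of India: 1.38B":   [0.1, 0.9, 0.1],
}

query_vec = [0.8, 0.2, 0.0]  # pretend embedding of "What is China's GDP?"
best = max(store, key=lambda chunk: cosine(store[chunk], query_vec))
print(best)  # the matched chunk is then passed to the LLM as context
```

FAISS does exactly this lookup, just over thousands of high-dimensional vectors with an optimized index instead of a Python loop.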
Hi, thanks for the video, very informative. I was thinking, can you show how you can do this exactly same thing but using Pandas Ai together with Llama2 instead of OpenAI? Thanks again
Llama2 integration with Pandas AI is still not available.
Hey! Thanks for sharing this tutorial. I tried it myself and ran the code with the same dataset, but I noticed it gives some incorrect answers. For example, on the last question you asked about the country with the least GDP, it is in fact Somalia with 0, followed by Central Africa at 0.026; the answer you got (Chad) is in 21st place, and the answer I got was United Kingdom! I also tried other questions and got very weird answers. Do you know why this is happening? I also tried the OpenAI API and got the same wrong answers. It seems to give the correct answer only when the question is very specific, like the one about the GDP of China.
Same with me. I think it does not work for numerical data; it only works when we search by country name. If we ask for the country with the maximum GDP per capita, that will not work, because the vector DB does not support aggregation; it just fetches similar context.
How do I download and run the Llama 2 model that you copied at 7:38? I'm using Linux on Windows 11 via WSL 2. Please help.
Can you watch my previous video here: czcams.com/video/kXuHxI5ZcG0/video.html I have explained that in detail.
@@AIAnytime sure, thanks.
Thank you for this video but i keep getting the error "You exceeded your current quota, please check your plan and billing details.... Why??
I'm not sure why you are getting that error. Do you mind creating an issue on the GitHub repo? I will look into it.
The Streamlit environment has a paid tier; if you keep uploading files, at some point you get this message (it happened to me while fixing an error and uploading the file again). We are technically using the free version.
Hi, this is an amazing project, but when I ran it I got an error message stating "Your app is having trouble loading the streamlit_chat.streamlit_chat component," and I'm not sure why. Can you please help?
Have you installed Streamlit chat library?
My script runs and displays no errors, but nothing appears when I upload my CSV file. Why might this be? Great video by the way. Thank you very much!
Can you let me know your laptop configuration? And also are you getting any errors? Can you check that in the terminal?
Thank you for this awesome tutorial, but how can I make it understand another language and respond in that language, for example Greek, German, or Hindi?
Thanks for your comment. For that, with similar LLMs, you need an extra translation layer. If you are using GPT-3.5/4, you can achieve this through prompting alone, without an extra translation layer.
Is there any option without downloading the Llama file?
You can pass the Huggingface repo name...
Thank you for the wonderful tutorial. I am getting the message "Number of tokens (1001) exceeded maximum context length (512)" after uploading a .csv file and sending my query. Could you please help me with that?
I am also getting same message
I get the error ModuleNotFoundError: No module named 'streamlit_chat'. Please help me.
Install it with pip install streamlit_chat
So what if I don't want to download the model and still reference it? Where do I need to deploy the model?
You can deploy on Hugging Face itself and then use the Inference Endpoints. You can also deploy it on AWS using SageMaker JumpStart and then use it through a Lambda function in your app.
@@AIAnytime Thank you. I'm learning lot from this channel. 😊😍
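Calling a hosted model instead of a local .bin file can be sketched as a plain HTTP request. The endpoint URL pattern and payload shape below follow the public Hugging Face Inference API conventions, but treat them as assumptions to verify against the current docs; the token and model ID are placeholders.

```python
# Hedged sketch: query a hosted Llama 2 over the Hugging Face Inference API
# instead of loading a local GGML file. URL pattern, payload shape, and token
# are placeholders/assumptions; check the HF docs for your account.
import json
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/meta-llama/Llama-2-7b-chat-hf"
HF_TOKEN = "hf_..."  # your own token; left as a placeholder here

payload = {
    "inputs": "What is the GDP of China?",
    "parameters": {"max_new_tokens": 256, "temperature": 0.1},
}

def query(api_url, token, payload):
    """POST the payload to the hosted model (network call, not run here)."""
    req = urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

In the app, this `query` call would replace the local `CTransformers` invocation, so the heavy model never has to live on your machine.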
How can we show steps, like you showed in the medical chatbot using Chainlit?
For that you need to use Chainlit. You can do it with Streamlit also: use a loader/progress bar with the right set of labels.
I used your code as a reference and used the Llama 2 13B model for my CSVs; all the results I got were nonsensical. Then I used your exact code from GitHub; it looked prettier, but all the results were still nonsense. Any guidance on how to improve accuracy?
Yes, even I got answers completely unrelated to my CSV. My CSV has one column as the question and another as the answer. Can someone help us out?
@@srishtibatra9991 If it's unrelated, you can try reducing the temperature to make it stricter; that usually helps.
There is something called Pull Requests on GitHub... the code may have been changed or improved by the community. Please create an issue on the GitHub repo all the same; I will try to debug.
@@huseyinsenol1769 I understand that you must have faced some inconvenience, but please try to be a little polite; he puts a lot of effort into making these videos so that we can learn.
I tried the same code and data file, but the results seem very inaccurate.
Me too. If I use a text document instead of CSV, the results are even worse. Can anyone suggest how to overcome this?
@@malleswararaomaguluri6344 Have you tried TAPAS? It is OK-ish for smaller data, but for large datasets I don't see any open-source model working well on CSV data. Another approach is the Python REPL tool from LangChain, but that also pairs with OpenAI.
@@malleswararaomaguluri6344 Same here. Any update?
Also, please make a video on hosting this on Azure App Service... just a small 5-minute video.
Soon... Can you watch other video where I have deployed apps on azure?
1. Is there anything extra we need to do for running on GPU?
2. If I deploy my project and 10 users search simultaneously, what will the usage be like? Will each user take up the entire RAM on the cloud? What should I do for thousands of users?
You can run on the GPU if you have the CUDA setup already. Not many changes are required!
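For ctransformers-style GGML inference, the usual change is offloading transformer layers to the GPU. A sketch of that config, assuming a CUDA-enabled build of ctransformers (e.g. installed with `pip install ctransformers[cuda]`); the layer count is an arbitrary example to tune against your VRAM.

```python
# Offload layers to the GPU via ctransformers' `gpu_layers` option.
# Requires a CUDA-enabled ctransformers build; 0 means pure CPU.
config = {
    "max_new_tokens": 512,
    "temperature": 0.1,
    "gpu_layers": 50,  # raise until VRAM becomes the limit
}
# llm = CTransformers(model="llama-2-7b-chat.ggmlv3.q8_0.bin",
#                     model_type="llama", config=config)
```

The rest of the app (embeddings, FAISS, Streamlit) stays unchanged; only the LLM config differs between CPU and GPU runs.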
I notice a few countries in the example dataset having GDP less than Chad, for example rows 124 and 130. So what is the best possible way to get the correct answer in this case?
Hi Vijay,
This is a common problem when it comes to LLMs retrieving information from granular data. Complexity!! But factual checks can be helpful, along with validation checks, etc. I am working on a few videos. Stay tuned 🙂
@@AIAnytime Thanks for the reply. I am eagerly waiting to see the solution in your next videos :) You are creating lots of good content on Gen AI so kudos to you.
I'm having problems trying this out on macOS with the M1 chip :( it stays stuck after getting the query/question :(
Can you create an issue on the GitHub repo? Will try to debug.
Thanks for the response; I managed to fix it!! I was having problems with my env. Sorry, but thanks for the reply, great content. @@AIAnytime
Thanks for the video. Compared to Chad, other countries have lower GDP per capita, and the model is not picking the right answer. How do we overcome this? Please also make a similar video using GPU on Windows.
Better chunking strategy, and defining a schema before embeddings. Working on some videos, will post!
I create Character AI bot prompts and have a great business idea. I just don't know how to build the backend or GUI. Let me know if you are interested; I'll DM you a sample of one of my bots, or share it on Discord if you have one.
Sounds interesting; you can reach out to me on my WhatsApp or social channels. Please find the link in the YouTube banner.
Where did you learn all this?
This is a very generic question. I work in an IT company, so I learn from the work I do...
Sir, can you show us how to build a CSV and Excel file reader that can also give predictions on any type of Excel or CSV file, and then extract the information into a useful PDF saved to the local system, with speech-to-text as well? If you would rather do this project privately, I will pay you. The same idea as this video, but using the OpenAI API key.
Thanks
Why is it taking so much time? Please guide.
Can we do the same task with a SQL database?
Absolutely... you can use the LangChain SQL agent and tool to do the task.
@@AIAnytime Yeah, but is it possible with Llama only (without an OpenAI API key)?
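In principle yes: in the SQL pattern the LLM's only job is to turn the question into SQL, and the database does the computation, so any local model that can emit SQL can play that role. A minimal sketch with the standard library's sqlite3; `fake_llm_to_sql` is a stub standing in for a local Llama 2 call (e.g. via a LangChain SQL chain pointed at the local model), and the table and question are made up for illustration.

```python
# Sketch of text-to-SQL with no OpenAI dependency: a (stubbed) LLM produces
# SQL, and sqlite3 executes it, so aggregates are computed exactly.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("A", 50000), ("B", 70000), ("C", 60000)])

def fake_llm_to_sql(question):
    # A real setup prompts the local Llama 2 with the schema + question.
    if question == "what is the average salary?":
        return "SELECT AVG(salary) FROM employees"
    raise ValueError("question not covered in this sketch")

sql = fake_llm_to_sql("what is the average salary?")
answer = conn.execute(sql).fetchone()[0]
print(answer)  # 60000.0
```

The caveat with a 7B local model is SQL quality: it may need few-shot examples of the schema in the prompt to generate valid queries reliably.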
In my case it returns the wrong answer 😛
Model: Llama 2 7B Chat
Question: Which actor made the movie with the worst rating?
Answer: Based on the information provided, Michael Johnson made the movie with the worse rating, Movie C in 2018 with a rating of 6.8.
Correct Answer: Isabella Garcia,Movie L,2022,6.5
Data:
actor,movie,year,rating
John Doe,Movie A,2010,8.5
Jane Smith,Movie B,2015,7.9
Michael Johnson,Movie C,2018,6.8
Emily Brown,Movie D,2012,9.0
Robert Lee,Movie E,2019,8.2
Sophia Kim,Movie F,2014,7.3
William Chen,Movie G,2016,8.7
Olivia Wong,Movie H,2020,7.5
James Rodriguez,Movie I,2011,8.9
Ava Martinez,Movie J,2017,7.1
Liam Ramirez,Movie K,2013,9.2
Isabella Garcia,Movie L,2022,6.5
Ethan Nguyen,Movie M,2009,8.0
Mia Ali,Movie N,2013,7.6
Alexander Wilson,Movie O,2016,8.4
Sofia Anderson,Movie P,2018,7.8
Daniel Thomas,Movie Q,2021,8.3
Camila Hernandez,Movie R,2014,7.2
Joseph Scott,Movie S,2015,8.1
Victoria Lopez,Movie T,2019,7.4
Same, all the answers I got were incorrect; even something as simple as asking the values in the 1st row came back random. I even attempted a 13B version, no difference.
Is it much more difficult to run it on GPU (if you have the required hardware)?
No it's easy to run if you have a good enough GPU machine with decent VRAM. Let me know if you need help...
@@AIAnytime Hi, could you help me run it on a GPU? What changes should we make? (GTX 3080)
@@AIAnytime Yes! Tell me how! I have 32 GB RAM, a 12700K, and a 3080 Ti 12 GB.
@@AIAnytime Still waiting for your answer! It will be really appreciated.
@@AIAnytime Still waiting for the GPU code.
Hi, I am getting incorrect answers from 2019.csv. I asked "What is the GDP of China?" Answer: "The GDP of China is not reported as it is not one of the countries listed in the context provided." Is there something I am doing wrong here?
No, you aren't doing anything wrong, Tushar. The model doesn't perform that well when there is complexity in the spreadsheets. Many faced the same issues. Maybe try putting validation checks in place?
Is the CSV uploaded to Streamlit safe? Thank you for your video, amazing work.
Yes, it is... Thanks for your comment.
Why could I not get the right answer?
Hi, can you let me know a bit in detail?
@@AIAnytime Strange, why does my reply disappear after some minutes? I have replied twice.
I have opened an issue on the right repo on GitHub.
It gave a wrong answer to your second question.
Why can't you fine-tune Llama 2 and use that?
Of course we can fine-tune Llama 2, but I won't recommend fine-tuning LLMs for information retrieval. For information retrieval, RAG + LLM is the way to go. Use RAG with vector stores and retrieve information. Fine-tuning doesn't help much for this task. Yes, it works well for domain-specific tasks like coding problems, translation, etc.
@@AIAnytime Could you please explain the difference between fine-tuning and training LLMs, and why you don't recommend fine-tuning the model, so I can get some knowledge on this?
@@AIAnytime And if we use this procedure of storing data and letting the model fetch through the data to give us an answer, won't it take more time and affect the speed?
Why are all the answers from the CSV wrong and not accurate?
Can you give me your secret API key?
Too slow to be usable... nice idea.
Dropping a video tomorrow that is production ready.
Very informative, thanks for sharing.
When I try to run the code with model = "llama-2-7b-chat.ggmlv3.q8_0.bin", I get this error: Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated... Am I missing something?
You might be doing something wrong with the model name. Have you downloaded the right model file from the TheBloke repository?
Even I'm getting the same error, and the model name is correct. What could be the issue? Please help. Thank you @@AIAnytime