Build and Run a Medical Chatbot using Llama 2 on CPU Machine: All Open Source
- uploaded 22 Jul 2023
- In this tutorial video, I'll show you how to build a sophisticated Medical Chatbot using powerful open-source technologies. Learn how to use Sentence Transformers for embeddings, Faiss CPU for vector storage, and integrate Llama 2, a large language model, using the Chainlit library for a conversational interface. Follow along as we guide you step-by-step to create an intelligent and efficient Medical Chatbot, all while using freely accessible tools. No prior experience required - dive into the world of conversational AI and healthcare innovation today! 🤖💬
Llama 2 Model (quantized by TheBloke): huggingface.co/TheBloke/Llama...
Llama 2 HF Model (Original One): huggingface.co/meta-llama
Chainlit docs: github.com/Chainlit/chainlit
Faiss GitHub: github.com/facebookresearch/f...
AI Anytime: github.com/AIAnytime
Langchain Docs: python.langchain.com/docs/get...
Sentence Transformers Hugging Face: huggingface.co/sentence-trans...
CTransformers GitHub: github.com/marella/ctransformers
LLM Playlist: • Large Language Models
WhatsApp Group:
#ai #generativeai #llama - Science & Technology
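The stack described above can be sketched roughly as follows. This is a minimal sketch, not the exact code from the video: library import paths, the model filename, chunk sizes, and directory paths are all assumptions, so verify them against your installed LangChain version, since these APIs change often.

```python
# Rough sketch of the RAG pipeline: PDF -> chunks -> Sentence Transformer
# embeddings -> FAISS index -> quantized Llama 2 (GGML via CTransformers)
# -> retrieval QA. All names below are illustrative assumptions.

def build_qa_chain(pdf_dir="data/", db_path="vectorstore/db_faiss"):
    # Imports are local so the sketch is readable even without the
    # heavyweight dependencies installed.
    from langchain.document_loaders import DirectoryLoader, PyPDFLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.vectorstores import FAISS
    from langchain.llms import CTransformers
    from langchain.chains import RetrievalQA

    # 1. Load and split the medical PDFs into overlapping chunks.
    docs = DirectoryLoader(pdf_dir, glob="*.pdf", loader_cls=PyPDFLoader).load()
    chunks = RecursiveCharacterTextSplitter(
        chunk_size=500, chunk_overlap=50
    ).split_documents(docs)

    # 2. Embed the chunks and store them in a local FAISS index.
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    db = FAISS.from_documents(chunks, embeddings)
    db.save_local(db_path)

    # 3. Load the CPU-friendly quantized Llama 2 model.
    llm = CTransformers(
        model="llama-2-7b-chat.ggmlv3.q8_0.bin",  # assumed local GGML file
        model_type="llama",
        config={"max_new_tokens": 512, "temperature": 0.5},
    )

    # 4. Wire retrieval and generation together.
    return RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=db.as_retriever(search_kwargs={"k": 2}),
        return_source_documents=True,
    )
```

Chainlit then wraps the returned chain in a chat UI; see the repo linked above for the full version.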
Hey Sonu - this is one of the first YT tutorials with a thorough explanation I've seen in a while. I got this running the first time 'out of the box'; it did ask me to pip install ctransformers, but after that it came up just fine. I am going to experiment with other documents. Some people don't like to sit through writing code, but it's good for us! Especially when you mention other tools we could try and why you picked what you use. Excellent!
What a fantastic video! Probably the only one that goes into complete details!
Glad you liked it!
Thank you man so much! I am very grateful for your content. I appreciate your passion for open source ai and your teachings are helping bring this technology into my reach. I was so happy when this ran! :) Excited to see your future videos.
Glad you like them! I have many videos already. More coming soon. Pls stay tuned 🙏
You did semantic search, with no fine-tuning involved. Is this accurate?
Wow. You packed a lot here - very helpful, thanks.
Glad it was helpful! Thank you 🙏
Hey, somehow I ended up on this extremely underrated channel and I gotta say I love it! I loved each and every part of this tutorial - something I had been looking for for quite some days now. Thank you so much, deffo subscribing and looking forward to such content.
Regards,
Anas from Pakistan
Loved the comment. Thanks
Thanks!!! Great presentation, super useful, amazing that you had the energy to do this while sick : )
Glad you enjoyed it!
Thank you for a smart and precise explanation of such a difficult topic
Just wanted to drop in and say congrats on your YouTube tutorial! 🎉🎥
Seriously, I'm so impressed with your content! Keep up the fantastic work!
Best wishes,
Rafael from Belgium
Hi Rafael, thanks for your lovely comment. Let's connect if you feel like..... Best, Sonu!!
Amazing video, you have saved the time of a lot of people. Keep up the excellent work.
Glad it helped... plz look at the LLM playlist.
Great video! thank you for sharing your expertise. Keep up the good work!
You are incredible professor!!! Thank you so much for your tutorial, i got very good insights. Best regards for you
Loved the content, it was beautifully explained. Thank you :)
Glad it helped!
Thanks for the great tutorial. It is really helpful.
A hint for anyone stuck with some errors in model.py, here are some fixes (original -> fix):
chain = cl.user_session.set("chain")-> chain = cl.user_session.get("chain")
res = await chain.acall(message, callables=[cb]) -> res = await chain.acall(message.content, callbacks=[cb])
thanks man
thank you,
another addition - under the #QA model function, change db = FAISS.load_local(DB_FAISS_PATH, embeddings) -> db = FAISS.load_local(DB_FAISS_PATH, embeddings, allow_dangerous_deserialization=True)
The best channel for LLMs.. thanks
Well done and appreciate the efforts. You have made my weekend interesting !
Glad to hear that! Please subscribe and check out the other videos too.
@@AIAnytime Sure, thanks. Got stuck at "could not reach server". 😞
Simply amazing ! this video can help a lot to,who wants to start working with LlaMa 2. thanks for sharing this.
Glad it was helpful! Please consider subscribing if you like other videos as well.
Amazing video, thank you - I wanted to build a similar chatbot based on an open-source model; now it will be easier to do it.
Thank you for your comment! As I am new on YT, your support can help me grow and create more such videos.
You may be new, but not for long. Soon videos like these are going to rock @@AIAnytime
Many thanks for a great video. Fantastic tutorial!
Glad it was helpful!
Thank you for creating this video. It was really helpful 😃
Nice video and great learning. Liked your confidence and knowledge. Going to build this bot on over the weekend and hopefully should be a breeze by looking into your code base and video.
Glad it was helpful! Thanks.
bro, what are your PC specs? And plz tell the minimum system requirements for deploying Llama on a computer
@@AIAnytime how to make it run on gpu too ??
Wow... This is what I was looking for 😇
Thanks, It's very useful. Upload more videos like that
Thanks for your comment! Please check my LLM playlist.
Great work, this video was really informative.
Glad it was helpful!
Thank you for the video and I learn a lot from you.
Glad to hear that!
What an amazing video... Thank you.
Fantastic video! A 1080p quality video would make the watching/learning experience much better. Just a candid suggestion.
Thanks for the tip! My recent videos have improved. Share your feedback on those if you have any.
Wow thanks for this video... Really helpful
Glad it was helpful!
Great tutorial. I am looking to learn these skills soon to take on a new role.
You can do it! Best of luck.....
Thank you very much sir amazing video, very knowledgeable amazing teaching ❤
Thanks and welcome
Excellent!!!
Good job, thank you
Thank you for your detailed explanation. Your classes are quite interesting and are building confidence to move further forward. I need some suggestions: I saw a medical chatbot using Llama 2 on a CPU machine, which was all open source. Similarly, I need to build an image-to-text multimodal model on a CPU using all open-source tools. Please provide your suggestions.
Awesome content! When is it adequate to fine-tune an LLM instead of, or as a complement to, the Botpress knowledge base?
Good one. Fantastic
Glad you liked it
Outstanding video... Thank you
Glad you enjoyed it!
Amazing Video!
Glad you enjoyed it
awesome channel man..more power!
Thanks for the visit!
Thanks
Amazing video thank you,
I had a question.
1. I'm unable to retrieve answers for questions about content outside the PDF. If an answer isn't found in the PDF, how do I configure it to fall back to the pretrained model?
This was a really well put together Tutorial thank you so much. Just one question what all needs to change to run this on GPU instead of CPU. Thank you so much for your time. Keep up the awesome work!!!!
Pick a GPU LLM model from TheBloke instead of a CPU model. GPU models usually have GPTQ in their name.
Quality content, thank you very much
Very welcome
Very Helpful
Glad it helped
Hi Sir, thank you so much for the tutorial. Do you know how to enable GPU support for this model ?
Very Good Video
This is amazing Thank you
Glad you like it!
Thanks, Open source AI Advocate
good work. keep it up
Thanks a lot. It's really very helpful.
Glad it was helpful!
Thank you for the efforts to explain this in very simple way.
I'm new to LLMs. I tried your GitHub code; when I ask a question it gives the error "Async generation not implemented for this LLM." Could you please help with a workaround?
Appreciate the great work!
Most of the tutorials out there are just trying these LLMs on Colab notebooks, makes you eager for more.
Would appreciate if you can also cover the deployment part, thank you :)
Glad you like them! There are a few deployment videos on my channel. Please check them out.
Really, it was a wonderful video!! Can I train this model in Google Colab or any other cloud GPU's??
Great content thank you!
Glad you liked it!
Hi sir, thank you so much for the video we are looking for the same type of video. ( I have one request- can you please make a video for data extraction from different types of invoice data with the help of open source model or libraries.)
Great video
Thanks for the visit
Great job dude, even a non-technical person can understand your explanation. Thanks and respect for sharing open-source AI. I have one question: how can I restrict this chatbot from answering any question outside of the document/PDF? For example, if I ask the chatbot what Python is, it gives the answer, but this information is not present in the PDF. How can I restrict it and make it a PDF-specific bot?
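One common way to keep the bot PDF-specific is a stricter prompt that tells the model to refuse when the retrieved context lacks the answer. The template below is an assumption, not the exact one from the video, and prompt-only guardrails are a mitigation, not a guarantee - models can still ignore instructions.

```python
# Hypothetical stricter prompt template for the RetrievalQA chain.
RESTRICTIVE_PROMPT = """Use ONLY the context below to answer the question.
If the answer is not contained in the context, reply exactly:
"Sorry, I couldn't find that in the document."

Context: {context}
Question: {question}

Helpful answer:"""


def render_prompt(context: str, question: str) -> str:
    """Fill the template the same way LangChain's PromptTemplate would."""
    return RESTRICTIVE_PROMPT.format(context=context, question=question)
```

With LangChain you would wrap this in PromptTemplate(template=RESTRICTIVE_PROMPT, input_variables=["context", "question"]) and pass it through the chain's chain_type_kwargs.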
Amazing Content
Thank you.
Quality!
Good video!
Glad you enjoyed it
super work, thx
Thank you !
Hey thanks for this, how would one separate the model from chainlit UI? i.e separating concerns and running in two containers if possible.
Very helpful bro
Glad it helped
thanks for sharing
Thanks for watching!
Thank you so much for the video genuinely I learned something from this 1 hour , Just one question for GPU we have to change just cpu to gpu or any other package to be updated. Once again great video
Not much of a change - use the CUDA kernels instead of the CPU. A couple of changes, of course. You can also use the original model for better performance.
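For those asking about GPU, a minimal sketch of GPU offload with CTransformers, assuming a CUDA-enabled build of ctransformers. The `gpu_layers` knob moves that many transformer layers onto the GPU; the model filename and other values here are illustrative assumptions, so verify them against your version.

```python
# Hypothetical GPU offload for a CTransformers-based loader.
# gpu_layers controls how many layers run on the GPU; 0 keeps
# everything on CPU. Requires a CUDA build of ctransformers.

def load_llm(gpu_layers: int = 50):
    from langchain.llms import CTransformers  # local import: heavy dependency

    return CTransformers(
        model="llama-2-7b-chat.ggmlv3.q8_0.bin",  # assumed local model file
        model_type="llama",
        config={
            "max_new_tokens": 512,
            "temperature": 0.5,
            "gpu_layers": gpu_layers,  # set 0 for CPU-only
        },
    )
```

Alternatively, as mentioned elsewhere in this thread, pick a GPTQ model from TheBloke and a GPU-oriented loader instead of the GGML/CPU path.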
Hello from Portugal! Thanks for your video, Sir. Could you make a follow-up video on how to run it on GPU? As you see, there are many viewers interested in it. Being a non-programmer, it would be nice to see a video showing what and where to change in the code. I was able to follow this video and make it work even though I don't know coding at all, so I believe you would make a great video for GPU usage too. Maybe something like a follow-up video. Thanks, Sir!
Hi Pedro, thanks for your lovely comment! I will create a video soon for the GPU as well. Stay tuned....
@@AIAnytime thanks. Looking forward to it
Very good content, it is very helpful
On a Mac these developments cannot be run because of the GPU situation. However, I understand it could be carried out in Google Colab, right?
Have you made a follow-on video showing how to incorporate GPU acceleration (CUDA for Nvdia) into your codebase?
Beautifully explained each step, Would you like to confirm, what GPU is best for llama-2 (7B and 13B) model on PC/Laptop.
Get anything which has 24GB VRAM if that's in your budget.
Subscribed bro
Good work. How do I get a streamed response like ChatGPT and output it word by word as soon as it arrives? If possible, reply with a code example based on the current video.
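A sketch of token-by-token streaming with Chainlit's LangChain callback. This assumes a Chainlit version from the era of the video; the handler name and its options may differ in newer releases, so check your installed version.

```python
# Hypothetical streaming handler: Chainlit's AsyncLangchainCallbackHandler
# can forward tokens to the UI as the LLM produces them, instead of
# waiting for the full completion.

async def answer(chain, message_content: str) -> str:
    import chainlit as cl  # local import: UI dependency

    cb = cl.AsyncLangchainCallbackHandler(
        stream_final_answer=True,                 # push tokens as they arrive
        answer_prefix_tokens=["FINAL", "ANSWER"],  # marks where streaming starts
    )
    res = await chain.acall(message_content, callbacks=[cb])
    return res["result"]
```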
Hello! I am curious... Do you know if I could use the ChatOpenAI class for Llama2.cpp deployed on another server? I have OpenAI working already, but Llama 2 seems impossible without using Hugging Face or loading the model path (I don't have access...)
Hi great work thank you 🎉
Thankyou.
legend!
I wonder how you could get it so that you type in a bunch of symptoms and it asks follow up and then gives you possible diagnosis.
POWERFUL!!!!
Thanks
For some reason, from the second/third question in a session I am getting misconstructed answers with lots of word repetition, and in the terminal window I can see "Number of tokens (638) exceeded maximum context length (512)." Any idea what this is and how to prevent it? I can try increasing the max tokens, but first I want to better understand the issue.
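On that "exceeded maximum context length (512)" message: the prompt (template + retrieved chunks + question, plus any accumulated chat history in a conversational chain) together with the tokens to be generated must fit in the model's context window, and 512 is a common default in ctransformers. A back-of-envelope sketch (the numbers are illustrative; raising `context_length` in the CTransformers config, trimming history, or retrieving fewer/shorter chunks are the usual fixes, though the exact parameter name should be verified against your version):

```python
# Why the warning appears: prompt tokens + tokens to be generated must
# fit inside the model's context window (512 by default here).

def fits_context(prompt_tokens: int, max_new_tokens: int,
                 context_length: int = 512) -> bool:
    """True if a request fits the model's context window."""
    return prompt_tokens + max_new_tokens <= context_length

# Example: two 250-token retrieved chunks plus ~138 tokens of template
# and question = 638 prompt tokens, which already overflows a 512-token
# window before generation even starts - matching the error above.
```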
Amazing! Can you explain the problems with Langchain in production and provide alternatives for Langchain?
Fantastic question. Let me answer it. Harrison Chase and team have done a great job with LangChain, but at the moment it isn't enterprise-ready:
1. It allows arbitrary code execution and is prone to prompt injection.
2. Edge-case issues have been identified with integrations.
3. High compute costs due to CPU and memory spikes.
4. Many other vulnerabilities
Let's give Harrison some time on it.
Many other developments are happening. Stay tuned.
@AIAnytime perfect one! As we saw, responses took up to 2 mins, and the LLMChain consumed most of that time. How can we further tune it to speed up while staying on CPU (with respect to both hardware specs and parameter config/code tuning)? And what's a good hardware config to run this solution and get ChatGPT-like response times?
Thanks!
Welcome! Thank you for the support.
GENIUS
Thanks
This video is too good, thank you. I have two questions - one, sometimes the answers are very short, is there a way to get longer, more friendly communication? The second question is, can you use chainlit on huggingface spaces, or do you have to stick to streamlit? Thank you so much!
Thanks for your comment. Increase the max tokens length - maybe 1048. I recommend deploying this on AWS or Azure. Is there anything specific you want to deploy on HF Spaces? I don't think it's currently supported automatically, but there are workarounds.
Great video, I have a question for you, what model can I use to do it in Spanish, or does it work with the same one?
It would be nice to see how we could actually stream the responses. Also, that quantized version you are using is old, the new quantized versions that have a "K" are better.
How do you know? Did you use it?
Hi. While in a venv and trying to install requirements.txt, I'm facing issues installing torch on a Mac M2 Air. Is there any solution for this?
Very Nice .
Thank you! Cheers!
Nice video! Just one minor question: why not use LlaMA's own encoder layer for the embeddings?
what's LlaMA's own encoder embedding layer?
Hey! Thanks for the video. I am running it on a Macbook Pro with M2 chip but it is taking ages for even a single response to come in. Any suggestions?
Thank you for the tutorial, So how to get streaming chat response?
Great tutorial. Could you please tell how to fine-tune the model? Is it possible?
Amazing video.
Is it possible to add a translation feature to the response using an LLM model? If it is possible, can you tell me how to do it?
Amazing! This pipeline doesn't work well with CSV files though. Could you make a video explaining how to use csvs with these open-source models?
Great suggestion! Will come up with something...
@@AIAnytime Could you please suggest videos or websites I could use to create a CSV chatbot using Llama?
Hi nysa, find this: czcams.com/video/MUADZ97GgZA/video.html
In the video at 4:39 you run a command in the terminal. Where should we run it on the Windows operating system, and how? Please give an explanation.
But does this also work without a network? I could see you were having issues with the server at one point in the video.
Hello, I want to use the model epfl-llm/meditron-7b instead of Llama, but what should I put for the model_type? Thx
Thank you very much for this great tutorial.
I have an error and I am struggling to solve, maybe could you help me?
The error is:
ModuleNotFoundError("No module named 'faiss.swigfaiss_avx2'")
and I have already tried to uninstall, downgrade, and upgrade versions of faiss-cpu but it does not solve this issue.
Hi @AI anytime, thanks for creating this video. It's very informative and simple to understand. I have a simple question, can I create a chatbot like this which could display images as well with the related text. I know there is GPT 4 which is capable of it but I want to know whether it is possible with LLama or any other models?
I am no expert, but I have been looking into the same thing for quite some time, and the simple answer is no. But the thing is, the larger LLaMA 2 models like the 70B-parameter one are very capable. It beats GPT-3.5 Turbo in benchmarks but falls shy of GPT-4. And currently PaLM 2 does kinda equal GPT-4 in performance, but it is a little bit worse (my personal experience). But the main advantage of such models (LLaMA 2) is that, obviously, they are free, and you can fine-tune them for your own use case, making them more efficient than models that do better generally (like GPT-4). This is especially true for easier use cases like this chatbot, which would not even utilize the entire power of GPT-3, so we wouldn't even consider using the much, much costlier GPT-4. I hope you're getting my point. For complex tasks GPT-4 may be better, but needing the full potential of such huge models is rare for most use cases. And remember how mind-blowing ChatGPT was just over a year ago when it was released. LLaMA is better than that.
Hi, thank you for the great tutorial. I followed along with you, but I am facing an issue: my Llama chat model does answer based on the document, but it also answers generic questions like the speed of light. Is there any way I could get answers based only on the documents?
If I ask any question that is completely unrelated to the data in the PDF, it gives some random answer from the PDF. How do I handle that and make it give a response like "sorry, I couldn't answer the question"?
I tried the same with Instructor embeddings and the Llama 2 model that you prescribed. I am continuously getting this error:
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.7\\bin'
Any idea why? Wasn't it supposed to run on CPU only?
Thank you for this, but can u make a similar bot which responds not only with text but with rich media (like images, GIFs, links) etc.? Just like how u create embeddings on the text, can u do embeddings on the images in the PDF? Would love to see ur video on this.
I am getting the error "name 'custom_prompt_template' is not defined" in the Chainlit interface. I have no idea why.