📌 Link to the code we work through:
github.com/pinecone-io/examples/blob/master/learn/generation/langchain/v1/claude-3-agent.ipynb
Hi James, I have been following your channel for quite a while, great content!
I would be very interested in comparisons of Claude 3 vs GPT-4 for complex RAG applications. Also, try harder questions and more complex tasks. I think a lot of LangChain developers will agree with me ;)
They've just released Haiku a few hours ago; it's the fastest and cheapest model. It'll probably be the way to go if you're not self-hosting your LLM. Unfortunately, I couldn't make it work with LangChain yet, I guess the API has not yet been updated to use Claude 3
I got sonnet and opus working okay, it needed `pip install langchain-anthropic==0.1.4` - not sure if it is different again for haiku though - I haven't tried it yet
@@jamesbriggs the current project I'm working on uses LangChain JS; I just checked again and it's not working. I'll try testing it with Python as you did in the video. Thank you
Thank you- I will try this workflow with Haiku :)
awesome, I will also try it soon!
James, what is the current SOTA for open-model RAG workflows?
Hey James, great video! How does this config (Claude + Voyage + Pinecone) compare to other RAG pipelines you've gone through (like GPT-3.5-Turbo + ada, or text-embedding-3, + Pinecone)? Is it possible to add a reranker like Cohere's to this? Would that make it better, especially for large datasets?
Great video! I do have a question: for well-structured API data, what type of RAG would be best for the LLM to return natural responses inferring from structured data? Say, hypothetically, the API returns "my-health": {week1[dataPoint1:0.1]...} with BPM (heart rate) and oxygen-level data over a week, and a person asks the LLM, "Over the past week, have there been any improvements in my health?" (Let's assume this person's Fitbit stores daily walking data.) When the question is asked, the API is called, the data is embedded and stored in memory or in Pinecone, and we would expect a reply like "Yes James, it looks like your activity over the past week has increased and your current oxygen levels have increased, therefore you are improving..." For something like this, which RAG method would give the most natural response in real time?
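One toy way to sketch the flow described above (API call, then a comparison over the stored week of data, then a grounded natural-language reply). Every name here is a hypothetical stand-in, not a real wearable API:

```python
# Toy sketch of the health-data RAG flow described above.
# fetch_health_data / summarise_week are hypothetical stand-ins.

def fetch_health_data() -> dict:
    """Stand-in for the wearable API call."""
    return {
        "week1": {"avg_steps": 6000, "avg_spo2": 96.1},
        "week2": {"avg_steps": 7500, "avg_spo2": 97.0},
    }

def summarise_week(data: dict) -> str:
    """Turn raw numbers into a short text summary the LLM can ground on."""
    prev, curr = data["week1"], data["week2"]
    step_delta = curr["avg_steps"] - prev["avg_steps"]
    spo2_delta = curr["avg_spo2"] - prev["avg_spo2"]
    trend = "increased" if step_delta > 0 else "decreased"
    return (
        f"Average daily steps {trend} by {abs(step_delta)} this week; "
        f"average SpO2 changed by {spo2_delta:+.1f} points."
    )

# In a real pipeline this summary would be embedded and stored (e.g. in
# Pinecone), then passed as context to the LLM, which writes the reply.
summary = summarise_week(fetch_health_data())
print(summary)
```

The key idea in the sketch is pre-digesting the numeric time series into text before embedding, since embedding raw numbers tends to retrieve poorly.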
Where can you add the system prompt? I didn't see anywhere in your Colab where you added that in.
Hey James
Can you help me with a video on RAG with a quantised LLM, building the vectors without Pinecone?
Can you please clarify what you mean by "it works pretty well"? How does the performance differ from the other RAG demos you have? Why is it worth implementing?
I need to use it more, but from what I see so far, Opus seems to answer correctly (assuming it gets the right info) 100% of the time and pulls in connections from different contexts very well. The answers are detailed and coherent, which is nice.
However, it is VERY slow. So if you have a RAG application where a wrong answer is very bad and response time is not too important, this seems like a good option; otherwise, you should probably use a faster model
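The accuracy-vs-latency tradeoff above can be sketched as a tiny routing helper. The model identifiers are real Anthropic Claude 3 IDs, but the routing policy itself is just an illustrative assumption, not a recommendation from the video:

```python
# Illustrative router: pick a slower/stronger model only when a wrong
# answer is costly. The policy here is an assumption for illustration.

def pick_model(wrong_answer_costly: bool, latency_sensitive: bool) -> str:
    if wrong_answer_costly and not latency_sensitive:
        return "claude-3-opus-20240229"    # strongest, slowest
    if latency_sensitive:
        return "claude-3-haiku-20240307"   # fastest, cheapest
    return "claude-3-sonnet-20240229"      # middle ground

print(pick_model(wrong_answer_costly=True, latency_sensitive=False))
```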
Thank you!! @@jamesbriggs
I have a very generic question about evaluation of the RAG system. How can we evaluate the responses generated by the RAG system?
Check RAGAS metrics
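As a DIY starting point before reaching for a framework like RAGAS, one of the simplest retrieval-side metrics is hit rate: did the relevant document land in the retrieved top-k? A minimal sketch with made-up document IDs (RAGAS adds generation-side metrics such as faithfulness on top of this kind of thing):

```python
# Minimal retrieval hit-rate evaluation. The retrieved/gold data below
# are made-up examples, not real retrieval output.

def hit_rate(results: list[list[str]], relevant: list[str]) -> float:
    """Fraction of queries whose relevant doc id appears in the retrieved top-k."""
    hits = sum(1 for docs, rel in zip(results, relevant) if rel in docs)
    return hits / len(relevant)

retrieved = [["d1", "d7", "d3"], ["d2", "d9", "d4"], ["d5", "d6", "d8"]]
gold      = ["d3", "d8", "d5"]  # the doc that actually answers each query

print(hit_rate(retrieved, gold))  # 2 of the 3 queries hit
```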
Hi James, can you do a video on a complete end-to-end project using Pinecone Canopy and LangChain with an open-source LLM? 😅😅
Thanks
Thanks!
you're welcome!
Hi, I'm getting a "pinecone not found" error even after installing pinecone-client. Has anyone else found a solution?
Hi James, I was trying to build a RAG app with nearly 40 million tokens, but when creating the embeddings and storing them in the vector DB
it breaks with an inconsistent-data error. I ran multiple tests and all of them passed, and I don't know what to do now!!
Please make a video on how to build RAG apps for large datasets!!
is the dataset public?
@@jamesbriggs Yes, I got a take-home project with more than 12k HTML docs. How can I share the link?
@@Aditya_khedekar I think you can share on here in plaintext like "huggingface dot com slash ai-arxiv2" (for example)
@@jamesbriggs Hi James, I've sent a LinkedIn connection request with a note pointing to the assignment and dataset!!
Can you use the Anthropic models with other embedding models, like maybe BGE?
yeah there's no requirement on specific embedding models, the LLMs and embedding models in RAG are independent of each other so you can mix and match as you prefer
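To illustrate that decoupling: if the retrieval code only depends on an `embed()` callable, any embedding model can be swapped in regardless of which LLM generates the final answer. Both "embedders" below are fake stand-ins, not real models:

```python
# Toy illustration: the retriever depends only on an `embed` function,
# so the embedding model and the LLM can be chosen independently.
# Both embedders are fake stand-ins for real embedding models.
import math

def cheap_embed(text: str) -> list[float]:
    return [len(text) % 7, text.count("a")]

def fancy_embed(text: str) -> list[float]:
    return [sum(map(ord, text)) % 11, len(text.split())]

def retrieve(query: str, docs: list[str], embed) -> str:
    """Return the doc whose embedding is closest to the query's embedding."""
    q = embed(query)
    return min(docs, key=lambda d: math.dist(q, embed(d)))

docs = ["apples are red", "bananas are yellow"]
# Same retrieval code, two interchangeable embedders:
print(retrieve("apples", docs, cheap_embed))
print(retrieve("apples", docs, fancy_embed))
```

In LangChain terms, this is why you can hand any `Embeddings` implementation to the vector store while using a Claude model for generation.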
Does RAG work with tabular data? I tried using CSV data but it does not work well
Better to ask an LLM to summarise the table, then use the summary to create your embedding, but store the table itself (in markdown format) as your text
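The suggestion above (embed a summary, but keep the full markdown table as the retrievable text) might look like this in a generic record schema. The `llm_summarise` step is stubbed since it would normally be an LLM call, and the table data is made up:

```python
# Sketch of the embed-the-summary / store-the-table pattern.
# `llm_summarise` is a stub standing in for a real LLM call.

def llm_summarise(table_md: str) -> str:
    # In practice: prompt an LLM, e.g. "Summarise this table in one sentence."
    return "Quarterly revenue by region, Q1-Q2 2023."

table_md = (
    "| region | Q1 | Q2 |\n"
    "|--------|----|----|\n"
    "| EMEA   | 10 | 12 |\n"
)

record = {
    "embed_text": llm_summarise(table_md),  # what gets embedded for search
    "content": table_md,                    # what the LLM sees at answer time
}

print(record["embed_text"])
```

The summary matches natural-language queries far better than raw cell values, while the stored markdown keeps every number available for the LLM when it answers.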
You do some good work, but why do you always use so many tools? Are you paid by these companies? I feel like you could get the same results by removing half of them
I've never done a sponsored video, other than technically Pinecone (as I work there) - I just want to show people that they can use tools/libraries beyond OpenAI and get similar, sometimes better results