Awesome work from Matthew and the Unstructured team. Really looking forward to the new features.
I love where LangChain sits in the LLM ecosystem as a framework. I’m already numb from the stunning number of products that sit on top of LangChain without really distinguishing themselves from each other. Microsoft’s latest “flow” product (among others from the big platforms) is a direct threat to these smaller startups. I was on a Zoom with AWS recently and they showed everything required to build an app using any LLM with LangChain (still in beta).
Great discussions. Really appreciate you all sharing your knowledge, experience, and opinions.
you guys are doing a really great and useful job, thanks for your precious time!
Harrison seems like a one man show..what a hustler
Savage
He's a total beast indeed. Always responding super fast to any questions and tweets + doing all of this stuff. Hats off!
You mean Anton right??
PRESENTERS:
- Harrison Chase - LangChain
- Matthew Robinson - Unstructured
- Anton Troynikov - Chroma
Thanks
Interesting.. intelligent self-feeding retrieval/splits.. well.. self-guiding based on the 'direction' of the query. 'To know how to know' is superior to 'to know what to know'.
What software do you guys use for your webinars? The interface is very clean
What a great session!
When people learn something new, we take into consideration the source from whom we learned this new thing. We give more credence to some than to others. In other words, we have taste. When my astrologically inclined brother-in-law tells me something about science, I'm less likely to give his story total credence than if I heard it from Lex Fridman. When we do RAG, we don't rate our sources. The best we can do is ask the model to rate output. How can we safely start to rate sources and save the ratings?
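One possible direction (a sketch of my own, not anything shown in the webinar): store a per-source credence score in the chunk metadata and blend it with the similarity score at ranking time. The source names and the `alpha` weighting below are illustrative assumptions, not a real API:

```python
# Hypothetical per-source credence scores (0..1), maintained by the app.
source_credence = {
    "peer_reviewed_paper": 0.9,
    "lex_fridman_podcast": 0.7,
    "brother_in_law": 0.2,  # illustrative, echoing the comment above
}

def rerank(hits, credence, alpha=0.7):
    """Blend similarity and source credence; alpha weights raw similarity."""
    return sorted(
        hits,
        key=lambda h: alpha * h["similarity"]
        + (1 - alpha) * credence.get(h["source"], 0.5),  # 0.5 = unknown source
        reverse=True,
    )

hits = [
    {"text": "claim A", "source": "brother_in_law", "similarity": 0.95},
    {"text": "claim B", "source": "peer_reviewed_paper", "similarity": 0.80},
]
ranked = rerank(hits, source_credence)
# The trusted source outranks the higher-similarity but low-credence one.
```

Saving the ratings is then just persisting `source_credence` alongside the vector store, updated whenever a source's output is verified or contradicted.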
Great talk - thank you
One of the issues I faced is that the self-query retriever doesn't understand the semantic meaning of the metadata variables we pass; it just tries to match similar strings. For example, if we pass a date parameter in metadata as "26 August, 2023", it won't recognize "August 26, 2023" in the query text, and instead of returning any documents it returns an empty list. This seems like a real issue, since users can ask questions with dates in any format.
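One workaround (my own sketch, not a LangChain fix) is to normalize dates to a single canonical format both when writing metadata at ingestion and when parsing the user's query, so the generated filter compares like with like. The list of accepted formats here is an assumption you would extend for your data:

```python
from datetime import datetime

def normalize_date(raw: str) -> str:
    """Try several common date layouts and return ISO 8601 (YYYY-MM-DD)."""
    formats = ["%d %B, %Y", "%B %d, %Y", "%Y-%m-%d", "%d/%m/%Y"]
    for fmt in formats:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw!r}")

# Both spellings collapse to the same key, so a metadata filter can match:
iso_a = normalize_date("26 August, 2023")
iso_b = normalize_date("August 26, 2023")
```

If the metadata field always holds the ISO form, the self-query filter no longer depends on the user's date spelling.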
Why is your new video marked as for YouTube Kids???? Can't even save it.
can we have the slides? Thanks for the amazing talk
Why are you concentrating on foundation models when local models have gotten so good?
31:30 💯
More importantly, in a Pirsig/Jordan Peterson sense (where Pirsig goes beyond Peterson): can you get an 'ought' from an 'is'? Just because you can, does it mean you should?
Queries should have an option to set boundaries on self-guided feedback retrievals, not just for cost but ethically. The empiricist view of science gives useful maps based on limited hypotheses, firm 'objective' results, but with no 'direction'.. it doesn't tell you where to go.
Oh yes.. relevancy across many aspects.. especially time-based.. if streamed.
The data confidentiality point has been solved by the cloud providers: you can get a private instance on Azure, and no data will go to Microsoft or OpenAI. The same goes for AWS and GCP. I don't get why this keeps being brought up as a benefit of local models. Chances are you'd run these local models on e.g. AWS Bedrock, the same place as managed models (e.g. Claude).
The point is freedom ..... a private server ..... there are a lot of reasons not to be a slave to GAFA.
It in no way has been "solved", merely addressed, and not totally convincingly.
I respectfully disagree, but only a little bit 😅 - I'm seeing two main use cases where this remains an issue... Communications and utilities network data legally must be on-prem (not talking about OT infrastructure). The other is the defence industry, both government and private.
Both areas are actually some of the most active industries in gen ai
But otherwise, I agree that most businesses need to calm down and investigate before refusing.
I think it's more about the APIs: you are sending your corporate data to e.g. OpenAI, and although it won't be used, per the contract, it will still be stored by OpenAI for 30 days for "analysis purposes". The way things are now, with a team of CIA agents deeply integrated into big tech (and OpenAI almost owned by Microsoft now), it's not hard to imagine this analysis being misused to gain more control over the operations of a company or an industry and to influence its decision-making process.
Some of us are very sensitive to audio quality -- particularly with voice. Even if you don't see much difference, investing in a high-quality mic and using a relatively acoustically dead room will improve the quality of these webinars dramatically (again, many people are completely insensitive to these matters, but probably ⅓ of people are not). I loved the content, but the amplitude spikes in Matthew's garbage headset were so bad I almost had to stop listening.
I write retrieval systems for agents. Can I work at Chroma? I really need a job.
I’m primarily working with metadata querying for my own purposes, but I’ve also been working on semantic generation of metadata.
31:45 To say that something like a long-term agent can't narrow down its instructions permanently, while keeping the underlying method of retrieval you get from the "identity" you're trying to provide, it's about finding that level of adaptation over time without completely changing the consistency of results.
Is everyone else using multiple self hosted vectordbs based on use case?
These talks would have been more helpful if they had shown how things were being done rather than just talking over them. This is what's lacking in most LangChain talks and events. And the level of abstraction is dumbfounding past a certain point.
Matthew, with the deepest respect: you say 'ah/um' A LOT. It may help to slow down and pick your words carefully. Unfortunately, it was too distracting to watch this video.
35:15 @harrison found a fly 👀