LangChain - Parent-Document Retriever Deepdive with Custom PgVector Store

  • Published 21. 08. 2024

Comments • 27

  • @jarekmor
    @jarekmor 1 month ago +1

    Hi! I like your videos and I have learned a lot from them. Your approach is really production-ready and I am implementing some of your ideas in my PoC for one of my customers. There will be more stuff in the PoC - MS Sharepoint On-Premise integration, AD and LDAP authorization, Neo4J, multi-vector-store retrieval, etc. But your ideas were the foundation for my project. Thank you very much and keep going! :-)

  • @varruktalalle4090
    @varruktalalle4090 3 months ago +2

    Can you explain how to reload the pg-parentDocRetriever? E.g. first create the retriever as you showed, and then reload the retriever in a different script?
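
One way to do this (not shown in the thread, so treat it as a sketch): re-instantiate the retriever in the second script against the same PGVector collection and the same persistent docstore, and simply skip add_documents. The collection name, connection string and docstore path below are placeholders that must match whatever the indexing script used, and the stock ParentDocumentRetriever is assumed rather than the video's custom store:

```python
# Sketch only: names, paths and the connection string are placeholders.
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import LocalFileStore
from langchain.storage._lc_store import create_kv_docstore
from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Point at the same PGVector collection the indexing script wrote to.
vectorstore = PGVector(
    embeddings=OpenAIEmbeddings(),
    collection_name="parent_document_demo",  # must match the indexing script
    connection="postgresql+psycopg://user:pass@localhost:5432/vectordb",  # placeholder
    use_jsonb=True,
)

# Re-open the same persistent docstore that holds the parent documents.
docstore = create_kv_docstore(LocalFileStore("./parent_docstore"))  # placeholder path

retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=RecursiveCharacterTextSplitter(chunk_size=400),  # same splitter as before
)

# No add_documents() here - the data already lives in Postgres and the docstore.
docs = retriever.invoke("your question here")
```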

  • @M10n8
    @M10n8 3 months ago +2

    This can be extended nicely to a MultiVectorRetriever, which pairs well with the 'unstructured' library: you can build RAG over PDF files where unstructured extracts tables, images and text separately, then ask a model to caption the images (base64 passed to OpenAI) and summarize the tables (and, if you like, also the text). Store all of that and retrieve it with a MultiVectorRetriever backed by PGVector ;-) Can I request a video? ++
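
For reference, a rough sketch of the MultiVectorRetriever half of that idea; the extraction and summarisation with unstructured and a vision model are assumed to have already produced `originals` and `summaries`, and the collection name and connection string are placeholders:

```python
import uuid

from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector

vectorstore = PGVector(
    embeddings=OpenAIEmbeddings(),
    collection_name="multimodal_rag",  # placeholder
    connection="postgresql+psycopg://user:pass@localhost:5432/vectordb",  # placeholder
    use_jsonb=True,
)
docstore = InMemoryStore()  # swap for a persistent store in production
retriever = MultiVectorRetriever(vectorstore=vectorstore, docstore=docstore, id_key="doc_id")

# Assumed inputs: raw elements from `unstructured` and model-generated summaries/captions.
originals = ["<table>...</table>", "full text of a section ..."]
summaries = ["Quarterly revenue by region.", "Overview of the ingestion pipeline."]

doc_ids = [str(uuid.uuid4()) for _ in originals]
summary_docs = [
    Document(page_content=summary, metadata={"doc_id": doc_id})
    for summary, doc_id in zip(summaries, doc_ids)
]
vectorstore.add_documents(summary_docs)  # only the short summaries get embedded
# The full originals are returned at query time via the shared doc_id.
docstore.mset(list(zip(doc_ids, [Document(page_content=o) for o in originals])))
```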

  • @maxlgemeinderat9202
    @maxlgemeinderat9202 3 months ago +2

    Working on exactly this at the moment! My eval showed that ParentDocumentRetriever works best for my use case.
    What do you think of my idea of adding a reranker (e.g. ColBERT) after retrieving the small chunks and then only fetching the parent chunks of the reranked child chunks? At the moment I am trying to implement this, but I think I have to change the MultiVectorRetriever class in LangChain. Or how would you add this to your solution (e.g. doing the reranking with LangChain's ContextualCompressionRetriever)?
    I can't rerank the results at the end as usual, as the parent chunks will probably be too large for a reranking model with 512 max tokens.
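
One possible shape for this, without patching the retriever class: search the vector store for child chunks, rerank those, then fetch only the parents of the top children from the docstore. The sketch below uses sentence-transformers' CrossEncoder as a stand-in reranker (a ColBERT-style model would slot into the same place); the model name, the k values and the default "doc_id" metadata key are assumptions:

```python
# Sketch: rerank small child chunks, then pull only their parents from the docstore.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # stand-in reranker

def rerank_then_fetch_parents(query: str, retriever, k_children: int = 20, k_parents: int = 4):
    # 1. Search the vector store directly so we get child chunks, not parents.
    children = retriever.vectorstore.similarity_search(query, k=k_children)

    # 2. Rerank the child chunks (each well under the 512-token limit).
    scores = reranker.predict([(query, child.page_content) for child in children])
    ranked = [child for _, child in sorted(zip(scores, children), key=lambda pair: pair[0], reverse=True)]

    # 3. Collect parent ids from the top children, deduplicated, order preserved.
    parent_ids = []
    for child in ranked:
        parent_id = child.metadata[retriever.id_key]  # "doc_id" by default
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
        if len(parent_ids) >= k_parents:
            break

    # 4. Fetch the parent documents from the docstore.
    return [doc for doc in retriever.docstore.mget(parent_ids) if doc is not None]
```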

  • @thawab85
    @thawab85 3 months ago +1

    You had a few videos on RAPTOR; it would be great if you could compare the indexing methods and which use cases each is recommended for.

  • @AngelWhite007
    @AngelWhite007 3 months ago +1

    Please make a video on creating a sidebar like ChatGPT's using ReactJS and LangChain Python

    • @codingcrashcourses8533
      @codingcrashcourses8533  3 months ago +1

      Man, this is 90 percent front-end work; you will find better people to build this

  • @Emmit-hv5pw
    @Emmit-hv5pw 3 months ago +1

    Thanks!! Any plans for a tutorial on a custom agent with memory that has custom tools to retrieve information from a SQL DB and a vector store (PDFs), plus tool calling (real-time info), with eval on LangSmith in a real business-case environment?

    • @codingcrashcourses8533
      @codingcrashcourses8533  3 months ago +1

      Probably too difficult for one tutorial to do all that stuff at once. Maybe an easier use case with RAG and memory.

  • @angelmoreno3383
    @angelmoreno3383 1 month ago +1

    That is a really interesting implementation! I wonder if this could help reduce the time of the retriever.add_documents operation: I'm trying to build a RAG over around 100 PDFs, and when testing the ParentDocumentRetriever it takes far too long. Do you know of any solution for this?

    • @codingcrashcourses8533
      @codingcrashcourses8533  1 month ago +1

      Hm, how do you preprocess your PDFs? How many chunks do you have at the end?

    • @angelmoreno3383
      @angelmoreno3383 1 month ago

      @@codingcrashcourses8533 In my vector store they are split with a chunk size of 800. For my store I'm loading them using the PyPDF loader and a kv docstore

    • @angelmoreno3383
      @angelmoreno3383 1 month ago

      @@codingcrashcourses8533 I'm using the PyPDF loader and then storing them in a LocalFileStore using create_kv_docstore. In the end my docstore has around 350 chunks

    • @awakenwithoutcoffee
      @awakenwithoutcoffee 23 days ago

      @op, did you find a solution?

    • @angelmoreno3383
      @angelmoreno3383 22 days ago

      @@codingcrashcourses8533 I did a custom preprocessing step, splitting on a fixed chunk size (I think it was around 400) and then adding some metadata to each PDF
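
A small sketch of that kind of preprocessing combined with batched indexing; the batch size and the metadata field are assumptions, and PyPDFLoader is taken from the loader mentioned earlier in the thread:

```python
# Sketch: pre-split PDFs with extra metadata, then feed the retriever in small batches
# so each add_documents call (and its embedding requests) stays small and restartable.
from pathlib import Path

from langchain_community.document_loaders import PyPDFLoader

def index_pdfs(retriever, pdf_dir: str, batch_size: int = 10) -> None:
    docs = []
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        for page in PyPDFLoader(str(pdf_path)).load():
            page.metadata["source_file"] = pdf_path.name  # extra metadata per PDF (assumed field name)
            docs.append(page)

    # ParentDocumentRetriever does the parent/child splitting itself;
    # batching only controls how much is embedded per call.
    for i in range(0, len(docs), batch_size):
        retriever.add_documents(docs[i : i + batch_size])
```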

  • @andreypetrunin5702
    @andreypetrunin5702 3 months ago

    Markus, hi. Can you give me the code for this video? I want to adapt it to the Xata database.

    • @codingcrashcourses8533
      @codingcrashcourses8533  3 months ago

      I added the notebook

    • @andreypetrunin5702
      @andreypetrunin5702 3 months ago

      @@codingcrashcourses8533 Thank you!!!!

    • @andreypetrunin5702
      @andreypetrunin5702 1 month ago

      @@codingcrashcourses8533 The code only creates and saves the database, but how do I load it when I reuse it? If I missed it, I apologize.

    • @codingcrashcourses8533
      @codingcrashcourses8533  1 month ago +1

      @@andreypetrunin5702 You don't have to "reload" it when you use PGVector; the service runs permanently inside a container. The get_relevant_documents method already uses it.

    • @andreypetrunin5702
      @andreypetrunin5702 1 month ago

      @@codingcrashcourses8533 I confused it with the local FAISS and Chroma databases. ))))

  • @yazanrisheh5127
    @yazanrisheh5127 3 months ago +1

    First