Postgres pgvector Extension - Vector Database with PostgreSQL / Langchain Integration

  • Added 25. 06. 2024
  • Blog Post: bugbytes.io/posts/vector-data...
    In this video, we'll look at the pgvector extension for PostgreSQL, which allows you to turn your Postgres database into a vector data-store!
    pgvector adds the vector data-type and distance computation operators (L2, inner product, and cosine distance) to allow you to query for "similar" items in the vector-space.
    We'll see how to set pgvector up in a Docker container, and will see how to integrate it with Langchain via the PGVector object.
    We'll look at how to take a piece of text, split it into chunks, create embeddings from those chunks using OpenAI, and then store the embeddings in the Postgres vector database. We'll also see how to query the database for vectors/documents that are similar to a text prompt/query.
    ☕️ 𝗕𝘂𝘆 𝗺𝗲 𝗮 𝗰𝗼𝗳𝗳𝗲𝗲:
    To support the channel and encourage new videos, please consider buying me a coffee here:
    ko-fi.com/bugbytes
    📌 𝗖𝗵𝗮𝗽𝘁𝗲𝗿𝘀:
    00:00 Intro
    00:41 Introduction to pgvector for PostgreSQL
    03:23 Splitting text file into chunks with Langchain RecursiveCharacterTextSplitter
    06:10 Using OpenAI to get embeddings for each chunk with OpenAIEmbeddings object
    10:54 Setting up pgvector and PostgreSQL in a Docker container
    16:38 Using the Langchain PGVector object to connect to PostgreSQL
    21:47 Finding similar vectors to a query in pgvector
    25:29 Querying pgvector with SQL to get cosine distances
    𝗦𝗼𝗰𝗶𝗮𝗹 𝗠𝗲𝗱𝗶𝗮:
    📖 Blog: bugbytes.io/posts/vector-data...
    👾 Github: github.com/bugbytes-io/
    🐦 Twitter: / bugbytesio
    📚 𝗙𝘂𝗿𝘁𝗵𝗲𝗿 𝗿𝗲𝗮𝗱𝗶𝗻𝗴 𝗮𝗻𝗱 𝗶𝗻𝗳𝗼𝗿𝗺𝗮𝘁𝗶𝗼𝗻:
    Blog Post: bugbytes.io/posts/vector-data...
    pgvector: github.com/pgvector/pgvector
    pgvector DockerHub image: hub.docker.com/r/ankane/pgvector
    State of the Union text: github.com/hwchase17/chroma-l...
    OpenAI Embeddings: platform.openai.com/docs/guid...
    Langchain Vectorstores: python.langchain.com/docs/mod...
    #python #langchain #datascience #postgresql
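As a rough illustration of the three distance measures named in the description, the sketch below computes them in plain Python. In pgvector itself they are exposed as SQL operators (`<->` for L2 distance, `<#>` for negative inner product, `<=>` for cosine distance). The function names, the toy two-dimensional vectors, and the labels are illustrative assumptions, not code from the video.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Plain inner (dot) product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, 2 means opposite."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


# Toy data (hypothetical): rank stored vectors by cosine distance to a query.
query = [1.0, 0.0]
items = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}
ranked = sorted(items, key=lambda name: cosine_distance(query, items[name]))
```

Sorting by ascending distance is the same idea as an `ORDER BY` on a distance operator in SQL: the closest vectors come first.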

Comments • 120

  • @kevon217
    @kevon217 8 months ago

    Very thorough walkthrough. Thanks!

  • @wadejohnson4542
    @wadejohnson4542 6 months ago +3

    Most excellent. I am now a monthly supporter. You deserve to be paid.

  • @shinchima
    @shinchima 6 months ago

    Brilliant content. Concise, no waffle. Thank you

  • @Andromeda26_
    @Andromeda26_ 8 months ago

    Thank you so much for sharing the details. Your informative YouTube videos have been incredibly helpful. Great job on putting together such valuable content! Keep up the outstanding work and continue enlightening us. We truly appreciate your contributions!

    • @bugbytes3923
      @bugbytes3923 8 months ago

      Thanks a lot, glad to hear that the videos have been helpful - thanks for watching and supporting the channel!

  • @silkogelman
    @silkogelman 11 months ago +3

    Just yesterday I thought "pgvector would be interesting to see a video about".
    And then you publish this! 👏👏👏🥳
    Thank you Lyle. 🙏

  • @tejasvinnarayan2887
    @tejasvinnarayan2887 3 months ago +1

    Extremely complex concepts published in the simplest way! I could run the whole notebook typed without errors! Thank you for the clarity!

    • @bugbytes3923
      @bugbytes3923 3 months ago

      Thanks a lot, really happy to hear that! Cheers!

  • @australianman8566
    @australianman8566 11 months ago

    Dude thanks for making this. I always learn something from your videos. Thank you!

    • @bugbytes3923
      @bugbytes3923 11 months ago

      Thanks a lot, glad to hear that! Thank you for the support!

  • @sqlsql4304
    @sqlsql4304 8 months ago +1

    Really appreciate your efforts you have put in for this tutorial

  • @mattiassoderberg3394
    @mattiassoderberg3394 8 months ago +1

    Fantastic comprehensive walkthrough of how to use PGVector and Python to work with vectors for your AI stuff 😀

    • @bugbytes3923
      @bugbytes3923 8 months ago +1

      Thanks a lot Mattias!

    • @mattiassoderberg3394
      @mattiassoderberg3394 8 months ago

      @bugbytes3923 thank YOU, now looking into the one where you use Django as the front end to all of this 😊

  • @LearningWorldChatGPT
    @LearningWorldChatGPT 10 months ago +2

    What a fantastic video.
    Thank you, BugBytes !

  • @alexandredamiao1365
    @alexandredamiao1365 4 months ago

    Thank you so much for this tutorial! Very, very high quality!

  • @teddyperera8531
    @teddyperera8531 1 month ago +1

    clear and well structured. you have an amazing style of teaching.

  • @Mankind5490
    @Mankind5490 4 months ago +2

    Straightforward explanation. Thank you

  • @fabsync
    @fabsync 1 month ago

    oh man.. it has been a while and it is still the best tutorial out there.. It would be great to see something with pgvector again with django-ninja...

    • @bugbytes3923
      @bugbytes3923 1 month ago

      Thanks a lot! I'd love to do some more on PGVector - if anyone has any project ideas, let me know here!

  • @theagainagain
    @theagainagain 8 months ago +1

    This was super helpful, thanks!

    • @bugbytes3923
      @bugbytes3923 8 months ago

      Glad to hear that - thanks for watching!

  • @arturgomes1654
    @arturgomes1654 6 months ago +1

    thank you so much for this content!

  • @Kingromstar
    @Kingromstar 7 months ago

    thanks for this, it was a great help!

    • @bugbytes3923
      @bugbytes3923 7 months ago

      Glad to hear that! Thank you for watching.

  • @AA-xz1ut
    @AA-xz1ut 10 months ago +1

    Fantastic content, thank you! It would be great if you could do a more in-depth video on how to do indexing (HNSW) with the same Jupyter notebook example.

  • @rishu4225
    @rishu4225 6 months ago

    Thanks man, Great content!

  • @johnallen9992
    @johnallen9992 8 months ago

    powerful libs - yes, it's almost as if AI 'needs' a highly artistic oracle to 'shape' its 'stance' in order to focus on the goals/needs of the user/app

  • @victoratui2445
    @victoratui2445 9 days ago

    Great job! Extremely useful! Thanks.

  • @dmitrymikhailovnicepianomu8688
    @dmitrymikhailovnicepianomu8688 10 months ago +1

    Very interesting!

  • @ravikumarhaligode2949

    Great Video Sir

  • @joventan4303
    @joventan4303 3 months ago

    Thanks this is very helpful

  • @swiftmindai
    @swiftmindai 5 months ago

    Good content. Thanks.

    • @bugbytes3923
      @bugbytes3923 5 months ago

      Thanks a lot!

    • @duongkhang4051
      @duongkhang4051 5 months ago

      I am getting this error, please help me solve it:
      Could not open extension control file "/PostgreSQL/16/share/extension/vector.control": No such file or directory.extension "vector" is not available.

  • @shaunpx1
    @shaunpx1 8 months ago

    Loving your videos man, thank you for the clear, concise explanation of these topics. Do you have any videos using RAG and agents in Django? I am using Django RestAPI and have been struggling with an agent controller that works fine in the notebook but then times out in my API request with the exact same code using Char ReAct Description.

  • @octavianreksa7994
    @octavianreksa7994 10 months ago

    thanks. really helpful

    • @bugbytes3923
      @bugbytes3923 10 months ago

      Thanks for watching!

    • @octavianreksa7994
      @octavianreksa7994 10 months ago

      @bugbytes3923 Hey, I have this error. Do you know why?
      connection_string = "postgresql+psycopg2://user:pass@localhost:5432/db"
      collection_name = 'financial_qa'
      db = PGVector.from_documents(
          embedding=instructor_embeddings,
          documents=texts,
          collection_name=collection_name,
          connection_string=connection_string
      )
      File ~\.conda\envs\financial_qa\lib\site-packages\langchain\vectorstores\pgvector.py:578, in PGVector.from_documents(cls, documents, embedding, collection_name, distance_strategy, ids, pre_delete_collection, **kwargs)
      574 connection_string = cls.get_connection_string(kwargs)
      576 kwargs["connection_string"] = connection_string
      --> 578 return cls.from_texts(
      579 texts=texts,
      580 pre_delete_collection=pre_delete_collection,
      581 embedding=embedding,
      582 distance_strategy=distance_strategy,
      583 metadatas=metadatas,
      584 ids=ids,
      585 collection_name=collection_name,
      586 **kwargs,
      587 )
      File ~\.conda\envs\financial_qa\lib\site-packages\langchain\vectorstores\pgvector.py:453, in PGVector.from_texts(cls, texts, embedding, metadatas, collection_name, distance_strategy, ids, pre_delete_collection, **kwargs)
      445 """
      446 Return VectorStore initialized from texts and embeddings.
      447 Postgres connection string is required
      448 "Either pass it as a parameter
      449 or set the PGVECTOR_CONNECTION_STRING environment variable.
      450 """
      451 embeddings = embedding.embed_documents(list(texts))
      --> 453 return cls.__from(
      454 texts,
      455 embeddings,
      456 embedding,
      457 metadatas=metadatas,
      458 ids=ids,
      459 collection_name=collection_name,
      460 distance_strategy=distance_strategy,
      461 pre_delete_collection=pre_delete_collection,
      462 **kwargs,
      463 )
      File ~\.conda\envs\financial_qa\lib\site-packages\langchain\vectorstores\pgvector.py:213, in PGVector.__from(cls, texts, embeddings, embedding, metadatas, ids, collection_name, distance_strategy, pre_delete_collection, **kwargs)
      210 metadatas = [{} for _ in texts]
      211 connection_string = cls.get_connection_string(kwargs)
      --> 213 store = cls(
      214 connection_string=connection_string,
      215 collection_name=collection_name,
      216 embedding_function=embedding,
      217 distance_strategy=distance_strategy,
      218 pre_delete_collection=pre_delete_collection,
      219 **kwargs,
      220 )
      222 store.add_embeddings(
      223 texts=texts, embeddings=embeddings, metadatas=metadatas, ids=ids, **kwargs
      224 )
      226 return store
      TypeError: langchain.vectorstores.pgvector.PGVector() got multiple values for keyword argument 'connection_string'

    • @octavianreksa7994
      @octavianreksa7994 10 months ago

      @bugbytes3923 nvm. The cause was that there is another connection_string in the virtual environment
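The `TypeError` in the traceback above is a general Python error: it is raised whenever the same keyword reaches a callable twice, once explicitly and once inside `**kwargs`. The sketch below reproduces that mechanism with hypothetical stand-in functions (`make_store`, `build`, `build_fixed`); it is not the LangChain source and does not claim to pinpoint the commenter's exact cause.

```python
def make_store(connection_string, collection_name, **kwargs):
    """Hypothetical stand-in for a constructor that takes a connection string."""
    return {
        "connection_string": connection_string,
        "collection_name": collection_name,
        **kwargs,
    }


def build(**kwargs):
    # The value is read from kwargs but left in place, so the call below
    # passes 'connection_string' twice: explicitly and via **kwargs.
    connection_string = kwargs["connection_string"]
    return make_store(connection_string=connection_string, **kwargs)


def build_fixed(**kwargs):
    # pop() removes the key, so it is only passed once.
    connection_string = kwargs.pop("connection_string")
    return make_store(connection_string=connection_string, **kwargs)


try:
    build(connection_string="postgresql://example", collection_name="demo")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

store = build_fixed(connection_string="postgresql://example", collection_name="demo")
```

If a value for the same setting is also supplied from somewhere else, such as the environment, it is worth checking that it does not end up being passed alongside the explicit argument.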

  • @vinci_irl
    @vinci_irl 10 months ago +1

    Is there any way I can use the data from the Postgres database directly, instead of using documents data?

  • @FatimaHABIB-jm4ji
    @FatimaHABIB-jm4ji 7 months ago +4

    Thanks,
    I am having this error when creating the "vector" extension
    ERROR: Could not open extension control file "C:/Program Files/PostgreSQL/16/share/extension/vector.control": No such file or directory

    • @duongkhang4051
      @duongkhang4051 5 months ago +2

      Have you solved this problem? Please help me with this.

  • @melsimibusireddy89
    @melsimibusireddy89 4 months ago

    Thank you so much for the great video! Can you please cover Anthropic Claude with pgvector? That would be a great help!

  • @vulnerablegrowth3774
    @vulnerablegrowth3774 10 months ago

    Is there any way to do hybrid search with this? Meaning, is it possible to do something like keyword search or some other filtering before doing semantic similarity? Or is this kind of feature only available in specific paid vector databases?

  • @helloh6
    @helloh6 11 months ago +4

    Fantastic video! Would be interesting to see a follow up on how this might work with Django?

    • @bugbytes3923
      @bugbytes3923 11 months ago +2

      Thanks a lot - I am planning a short video on Django and pgvector. There's a useful extension that integrates the two - coming soon!

    • @helloh6
      @helloh6 11 months ago

      @bugbytes3923 Could I ask what the extension is, so I could have a look while you're creating the video? Love your content!

    • @bugbytes3923
      @bugbytes3923 11 months ago +2

      @helloh6 Thanks a lot! It's the same library I installed in this video to work with pgvector - this library has modules for working with Django - more details here:
      github.com/pgvector/pgvector-python#django

    • @helloh6
      @helloh6 11 months ago

      @bugbytes3923 Amazing, thanks!

  • @tombomer8520
    @tombomer8520 9 months ago

    Thanks for the video!
    Do you know if there's a way to save the database locally after it's been initialised with `db = PGVector.from_documents(
    embedding=embeddings, documents=chunks, connection_string=connection_string
    )`?
    e.g. Faiss has a save_local() function

  • @paulowiz
    @paulowiz 7 months ago

    Fantastic! Where is the Jupyter notebook?

  • @eugenetapang
    @eugenetapang 3 months ago

    Excellent video. Any chance of using S-Bert instead of OpenAI ada embeddings to generate the embeddings? A possible code snippet would be appreciated. Thanks and love your content.

  • @joxa6119
    @joxa6119 4 months ago +3

    Edit:
    - Problem 1: My Postgres container runs within WSL2, and I could not connect to it with pgAdmin from Windows
    - Solution: Connect the pgAdmin container to the pgvector container.
    - Problem 2: Object of type PosixPath is not JSON serializable
    - Solution: Convert the PosixPath to a string before passing it to TextLoader
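The second problem in the comment above can be reproduced without LangChain: Python's `json` module cannot serialize `pathlib` path objects, while their string form serializes fine. A minimal sketch follows; the file path is a hypothetical example, and how the path ends up in JSON inside the commenter's pipeline is not shown in the thread.

```python
import json
from pathlib import Path

# Hypothetical file location, used only for illustration.
file_path = Path("data") / "state_of_the_union.txt"

# Serializing the Path object directly raises TypeError.
try:
    json.dumps({"source": file_path})
    serializable = True
except TypeError:
    serializable = False

# Converting to str first avoids the error.
payload = json.dumps({"source": str(file_path)})
```

Calling `str(...)` on the path at the point where it is handed to a loader keeps any metadata derived from it JSON-friendly.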

  • @toocutebydesign-rd3wx
    @toocutebydesign-rd3wx 6 months ago

    Supabase uses their vec client for postgres/pgvector. This does not need docker but we are then limited to their free plan storage of 50MB. What do you think?

  • @shawman1960
    @shawman1960 4 months ago

    What PostgreSQL permissions or operator functions are required or recommended for pgvector?

  • @user-cf2hf7me9d
    @user-cf2hf7me9d 7 months ago

    Hey! How do I get the uuid of records in the langchain_pg_embeddings table so I can delete them later?

  • @user-kc5od7ii5o
    @user-kc5od7ii5o 1 month ago

    Is there any tutorial where I already have a table in Postgres? I uploaded all the documents and created the index without langchain, and now I want to access that database, but all the tutorials I found start from raw data and create the vectorstore in the process.

  • @ThinklikeTesla
    @ThinklikeTesla 8 months ago

    What is the rationale behind calling embed_query vs embed_documents?

  • @sachintiwari2794
    @sachintiwari2794 26 days ago

    Is there any way to store in a custom schema instead of the public schema?

  • @user-uj5bc7mm6f
    @user-uj5bc7mm6f 4 months ago

    Is it possible to use Chroma DB to load SQL data into a vector DB? There are not a lot of resources and I need to learn that.

  • @vignesh462
    @vignesh462 5 months ago

    Hi, how do I change the default table names, e.g. langchain_pg_collection to something else?

  • @teunohooijer6788
    @teunohooijer6788 10 months ago

    great video. How does this compare to FTS for search? When would you want to use that over this? Would they get the same results in this case for example?

    • @bugbytes3923
      @bugbytes3923 10 months ago

      Thanks! The mechanism for FTS is different, so there's no guarantee that the same results would be reached. Maybe I could do a video quickly comparing these methods!

    • @teunohooijer6788
      @teunohooijer6788 10 months ago

      @bugbytes3923 Would be a nice video I think. One of the advantages of FTS over this for searching products would be that, if you have it on an online website, you can't be DDoS'd into increasing your API cost a lot.

  • @borknagarchile
    @borknagarchile 9 months ago

    Super interesting video. I’m wondering if you know about how to prompt properly to openai to generate the vectors. By this I mean if there are ways to improve the quality of the vectors to query so the answer can be more precise. Thanks

    • @nedyalkokarabadzhakov5405
      @nedyalkokarabadzhakov5405 9 months ago +2

      With embedding models there is no prompting. These are not chat models.

    • @borknagarchile
      @borknagarchile 9 months ago

      @nedyalkokarabadzhakov5405 so basically the embedding needs to be created from the most accurate text that you can provide, right?

  • @user-uw1mi9wt4k
    @user-uw1mi9wt4k 3 months ago

    where can I get the notebook for this?

  • @StonedApe420
    @StonedApe420 11 months ago

    GPT FineTune and Embeddings

  • @user-jz1op3yt9w
    @user-jz1op3yt9w 5 months ago

    Hi, I followed the steps you mentioned in the blog but am facing an issue while connecting and inserting vectors into the Postgres database.
    Please find the error below:
    texts = [d.page_content for d in documents]
    ^^^^^^^^^^^^^^
    AttributeError: 'tuple' object has no attribute 'page_content'
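One way an error like this can arise, assuming the list being passed holds tuples such as `(document, score)` pairs rather than document objects, is sketched below. The `Doc` dataclass is a hypothetical stand-in for a document type with a `page_content` attribute, and `unwrap_documents` is an illustrative helper. Whether this matches the commenter's actual data is not known from the thread.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a document object with page_content."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def unwrap_documents(items):
    """Return the document from each item, unpacking (doc, score) tuples."""
    docs = []
    for item in items:
        if isinstance(item, tuple):
            item = item[0]  # assume the document is the first element
        docs.append(item)
    return docs


# A list of (document, score) pairs: iterating it directly and reading
# .page_content would fail with "'tuple' object has no attribute ...".
results = [(Doc("first chunk"), 0.12), (Doc("second chunk"), 0.34)]
texts = [d.page_content for d in unwrap_documents(results)]
```

Printing `type(documents[0])` just before the failing call is a quick way to confirm what the list actually contains.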

  • @VadimZverev
    @VadimZverev 9 months ago

    Hi, do you know what dimensions value I should use when creating a vector column?

    • @bugbytes3923
      @bugbytes3923 9 months ago +1

      In this video, it should be 1536 dimensions. We used OpenAI's latest embedding model to create the embeddings, which has an output dimension of 1536.
      platform.openai.com/docs/guides/embeddings/second-generation-models

    • @VadimZverev
      @VadimZverev 9 months ago

      @bugbytes3923 thank you
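To make the dimension point concrete: a pgvector column declared as `vector(1536)` only accepts vectors of that length, and vectors are written in text form as a bracketed, comma-separated list such as `[1,2,3]`. The sketch below, with a hypothetical helper name and constant, checks the length before building such a literal. It is an illustration under those assumptions, not code from the video.

```python
EMBEDDING_DIM = 1536  # output size mentioned in the reply above


def to_vector_literal(embedding, expected_dim=EMBEDDING_DIM):
    """Format a list of numbers as a pgvector text literal like '[0.1,0.2]'.

    Raises ValueError when the length does not match the column dimension.
    """
    if len(embedding) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(embedding)}"
        )
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


# A dummy all-zero embedding of the right size, for illustration only.
literal = to_vector_literal([0.0] * EMBEDDING_DIM)
```

If a different embedding model is used, the column dimension and `EMBEDDING_DIM` would need to match that model's output size instead.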

  • @Phobos221B
    @Phobos221B 19 days ago

    Hey, Can you also try to experiment with Langfuse and how it can be leveraged ?

    • @bugbytes3923
      @bugbytes3923 19 days ago +1

      I'll need to look into Langfuse. But possibly! I'm planning more GPT/vector/langchain videos.

  • @RJYL
    @RJYL 9 months ago

    -p 5432:5432
    The postgresql and its pgvector have the same port mapping, is that right?

  • @user-ky4ev9bm9q
    @user-ky4ev9bm9q 9 months ago

    Will there be any way to use the postgresql db tables directly instead of txt files?

    • @bugbytes3923
      @bugbytes3923 9 months ago +1

      With LLMs - I'll release a video this week on Retrieval Augmented Generation, where we use the DB table with Langchain and use the results of a DB query as context to an LLM prompt.

    • @user-ky4ev9bm9q
      @user-ky4ev9bm9q 9 months ago

      @bugbytes3923 Waiting!!

  • @AbhinavKumarJha-zc1nt
    @AbhinavKumarJha-zc1nt 1 month ago

    how to do this with docs, csv and pptx files?

  • @dorianmatesic8101
    @dorianmatesic8101 11 months ago +2

    hello. great video, helped me a lot with exactly what I was looking for!
    Keep up the good work.
    I have a question. I followed your video and downloaded the Docker image, and I have pgAdmin 4, but when I try creating the extension, it says: Could not open extension control file "C:/Program Files/PostgreSQL/15/share/extension/vector.control": No such file or directory.extension "vector" is not available
    Do you maybe know what is going on?
    Thank you in advance

    • @bugbytes3923
      @bugbytes3923 11 months ago +1

      Thank you!
      Regarding your problem: did you add the port mapping in the Docker run command? From port 5432:5432?
      I suspect that pgAdmin is trying to connect to Postgres running locally on your machine, rather than in the Docker container. Do you have Postgres running on your machine locally? You may need to stop that if Postgres is running on the same port in the Docker container.
      Not sure though, but let me know if you get it fixed or if you're still stuck!

    • @dorianmatesic8101
      @dorianmatesic8101 10 months ago

      @bugbytes3923 oh, thank you sooo much!
      Postgres was running locally on my machine on the same port as the Docker container, so I had to stop those processes, and now it works!
      Can't wait for the Django video with pgvector! Keep up the good work

    • @ajaypalsingh6329
      @ajaypalsingh6329 10 months ago

      I am also facing this. Can you please add the steps so I can also solve it?
      Thank you in advance

    • @madamada719
      @madamada719 10 months ago

      @ajaypalsingh6329 if you have both Docker and local Postgres in your pgAdmin, you should stop those processes within the Task Manager. Go to Processes and end all processes related to your Postgres. That is what worked for me, honestly.
      I don't know if you have the same issue.

    • @bugbytes3923
      @bugbytes3923 10 months ago

      @ajaypalsingh6329 Windows or Mac?
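The diagnosis in this thread is that a locally installed Postgres was already listening on port 5432, so pgAdmin reached it instead of the Docker container. A quick check for whether something is already accepting connections on a port can be sketched with the standard library. The function name and defaults are illustrative assumptions, and the check only says that *some* server is listening, not which one.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


# Example usage, run before starting the container:
#     port_in_use(5432)
# A True result means something is already listening on 5432, for example
# a local Postgres service that would need stopping, or the container could
# be published on a different host port instead.
```

Running this before `docker run ... -p 5432:5432` gives an early hint that the port mapping will clash with an existing service.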

  • @PawanJain-jb7bb
    @PawanJain-jb7bb 4 months ago

    Hi, I find this video very informative and easy to understand.
    However, I am getting the below error
    when downloading pgVector image: Error response from daemon: pull access denied for arcane/pgvector, repository does not exist or may require 'docker login': denied: requested access to the resource is denied"

    • @srvapps
      @srvapps 12 days ago

      try this:
      docker pull pgvector/pgvector:pg16

  • @anand-st7mo
    @anand-st7mo 2 months ago

    How do I close the pgvector connection after everything is done?

  • @RJYL
    @RJYL 9 months ago +1

    There's no mention of installing PostgreSQL first.

    • @bugbytes3923
      @bugbytes3923 9 months ago +1

      The installation is done via the Docker commands.

    • @RJYL
      @RJYL 9 months ago

      @bugbytes3923 ers\Administrator> docker run --name pgvector5-demo -e POSTGRES_PASSWORD=test -p 5432:5432 ankane/pgvector
      popen failure: Cannot allocate memory
      initdb: error: program "postgres" is needed by initdb but was not found in the same directory as "/usr/lib/postgresql/15/bin/initdb"
      Despite following the post steps several times, the error still appears. Maybe it's because I'm using Win10.

  • @PrathapVeera-zc8vu
    @PrathapVeera-zc8vu 1 month ago

    Hi, I'm getting the error below while running the CREATE EXTENSION vector query in the database. Can you please help?
    ERROR: Could not open extension control file "C:/Program Files/PostgreSQL/16/share/extension/vector.control": No such file or directory.extension "vector" is not available
    ERROR: extension "vector" is not available
    SQL state: 0A000
    Detail: Could not open extension control file "C:/Program Files/PostgreSQL/16/share/extension/vector.control": No such file or directory.
    Hint: The extension must first be installed on the system where PostgreSQL is running.

  • @harrisonpaul7941
    @harrisonpaul7941 2 months ago

    Kindly help me with the below error..
    When I try to execute CREATE EXTENSION vector I'm getting the below error
    ERROR: Could not open extension control file "/usr/share/postgresql/16/extension/vector.control": No such file or directory.extension "vector" is not available
    ERROR: extension "vector" is not available
    SQL state: 0A000
    Detail: Could not open extension control file "/usr/share/postgresql/16/extension/vector.control": No such file or directory.
    Hint: The extension must first be installed on the system where PostgreSQL is running.
    Note - both Postgres and pgvector running in docker

    • @srvapps
      @srvapps 12 days ago

      This worked for me: CREATE EXTENSION vector;
      And I used this Docker image: docker pull pgvector/pgvector:pg16

  • @MuratJumashev
    @MuratJumashev 8 months ago +2

    Typo in the blogpost:
    `CREATE EXTENSION vector;` instead of `CREATE EXTENSION pgvector;`

  • @fabsync
    @fabsync 5 months ago

    Super awesome! It would be great to see this integrated with django-ninja to build a chat with PDF (but without using ChatGPT, something similar to this czcams.com/video/rIV1EseKwU4/video.html, which is essentially from primordial privategpt).