Postgres pgvector Extension - Vector Database with PostgreSQL / Langchain Integration
- Date added: 25 June 2024
- Blog Post: bugbytes.io/posts/vector-data...
In this video, we'll look at the pgvector extension for PostgreSQL, which allows you to turn your Postgres database into a vector data-store!
pgvector adds the vector data-type and distance computation operators (L2, inner product, and cosine distance) to allow you to query for "similar" items in the vector-space.
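These three measures map to SQL operators in pgvector (`<->` for L2 distance, `<#>` for negative inner product, `<=>` for cosine distance). As a rough plain-Python sketch of what those operators compute (the tiny 3-dimensional vectors are illustrative; real embeddings have far more dimensions):

```python
import math

def l2_distance(a, b):
    # Euclidean distance, what pgvector's <-> operator returns
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    # pgvector's <#> operator returns the *negative* of this value
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity, what pgvector's <=> operator returns
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - inner_product(a, b) / (norm_a * norm_b)

a, b = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
print(l2_distance(a, b))      # sqrt(2) ~= 1.414
print(cosine_distance(a, b))  # 1.0 (orthogonal vectors)
```

Smaller distances mean "more similar" for L2 and cosine distance, which is why similarity queries in pgvector typically `ORDER BY embedding <=> query_vector` ascending.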
We'll see how to set pgvector up in a Docker container, and will see how to integrate it with Langchain via the PGVector object.
We'll look at how to take a piece of text, split it into chunks, create embeddings from those chunks using OpenAI, and then store the embeddings in the Postgres vector database. We'll also see how to query the database for vectors/documents that are similar to a text prompt/query.
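To give a feel for the chunking step, here is a heavily simplified, hypothetical sketch of a recursive character splitter. This is NOT Langchain's actual RecursiveCharacterTextSplitter (which also handles chunk overlap and custom length functions); it only shows the core idea: try the coarsest separator first, merge pieces back together greedily, and recurse with finer separators when a piece is still too big.

```python
def split_text(text, chunk_size=100, separators=("\n\n", "\n", " ", "")):
    """Split text on the coarsest separator present, keeping chunks <= chunk_size."""
    if len(text) <= chunk_size:
        return [text]
    # pick the first separator that occurs in the text ("" always matches)
    idx = next(i for i, s in enumerate(separators) if s == "" or s in text)
    sep = separators[idx]
    parts = text.split(sep) if sep else list(text)
    chunks, current = [], ""
    for part in parts:
        candidate = current + sep + part if current else part
        if len(candidate) <= chunk_size:
            current = candidate  # greedily merge pieces back together
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) > chunk_size:
            # piece is still too big: recurse with the finer separators
            chunks.extend(split_text(part, chunk_size, separators[idx + 1:] or ("",)))
        else:
            current = part
    if current:
        chunks.append(current)
    return chunks

chunks = split_text(
    "First paragraph.\n\nSecond paragraph, which is a bit longer.",
    chunk_size=30,
)
# -> ["First paragraph.", "Second paragraph, which is a", "bit longer."]
```

Splitting on paragraph and sentence boundaries first keeps each chunk semantically coherent, which tends to produce better embeddings than hard-splitting every N characters.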
☕️ 𝗕𝘂𝘆 𝗺𝗲 𝗮 𝗰𝗼𝗳𝗳𝗲𝗲:
To support the channel and encourage new videos, please consider buying me a coffee here:
ko-fi.com/bugbytes
📌 𝗖𝗵𝗮𝗽𝘁𝗲𝗿𝘀:
00:00 Intro
00:41 Introduction to pgvector for PostgreSQL
03:23 Splitting text file into chunks with Langchain RecursiveCharacterTextSplitter
06:10 Using OpenAI to get embeddings for each chunk with OpenAIEmbeddings object
10:54 Setting up pgvector and PostgreSQL in a Docker container
16:38 Using the Langchain PGVector object to connect to PostgreSQL
21:47 Finding similar vectors to a query in pgvector
25:29 Querying pgvector with SQL to get cosine distances
𝗦𝗼𝗰𝗶𝗮𝗹 𝗠𝗲𝗱𝗶𝗮:
📖 Blog: bugbytes.io/posts/vector-data...
👾 Github: github.com/bugbytes-io/
🐦 Twitter: / bugbytesio
📚 𝗙𝘂𝗿𝘁𝗵𝗲𝗿 𝗿𝗲𝗮𝗱𝗶𝗻𝗴 𝗮𝗻𝗱 𝗶𝗻𝗳𝗼𝗿𝗺𝗮𝘁𝗶𝗼𝗻:
Blog Post: bugbytes.io/posts/vector-data...
pgvector: github.com/pgvector/pgvector
pgvector DockerHub image: hub.docker.com/r/ankane/pgvector
State of the Union text: github.com/hwchase17/chroma-l...
OpenAI Embeddings: platform.openai.com/docs/guid...
Langchain Vectorstores: python.langchain.com/docs/mod...
#python #langchain #datascience #postgresql
Very thorough walkthrough. Thanks!
Most excellent. I am now a monthly supporter. You deserve to be paid.
Brilliant content. Concise, no waffle. Thank you
Thanks a lot!
Thank you so much for sharing the details. Your informative YouTube videos have been incredibly helpful. Great job on putting together such valuable content! Keep up the outstanding work and continue enlightening us. We truly appreciate your contributions!
Thanks a lot, glad to hear that the videos have been helpful - thanks for watching and supporting the channel!
Just yesterday I thought "pgvector would be interesting to see a video about".
And then you publish this! 👏👏👏🥳
Thank you Lyle. 🙏
Thanks a lot Sil!
Extremely complex concepts presented in the simplest way! I typed out the whole notebook and ran it without errors! Thank you for the clarity!
Thanks a lot, really happy to hear that! Cheers!
Dude thanks for making this. I always learn something from your videos. Thank you!
Thanks a lot, glad to hear that! Thank you for the support!
Really appreciate your efforts you have put in for this tutorial
Thanks a lot!
Fantastic comprehensive walkthrough of how to use PGVector and Python to work with vectors for your AI stuff 😀
Thanks a lot Mattias!
@@bugbytes3923 thank YOU, now looking into the one where you use Django as the front end to all of this 😊
What a fantastic video.
Thank you, BugBytes !
Thanks a lot!
Thank you so much for this tutorial! Very, very high quality!
Thanks a lot, glad you liked!
clear and well structured. you have an amazing style of teaching.
Awesome to hear, thanks a lot!
Straightforward explanation. Thank you
Thanks a lot!
oh man.. it has been a while and it is still the best tutorial out there.. It will be great to see something with pgvector again with django-ninja...
Thanks a lot! I'd love to do some more on PGVector - if anyone has any project ideas, let me know here!
This was super helpful, thanks!
Glad to hear that - thanks for watching!
thank you so much for this content!
Thanks a lot for watching!
thanks for this, it was a great help!
Glad to hear that! Thank you for watching.
fantastic content, thank you! Would be great if you could do a more in-depth video on how to do indexing (HNSW) with the same Jupyter notebook example
Thanks man, Great content!
Thanks a lot!
powerful libs - yes, it's almost as if AI 'needs' a highly artistic oracle to 'shape' its 'stance' in order to focus on the goals/needs of the user/app
Great job! Extremely useful! Thanks.
Thanks a lot!
Very interesting!
Thanks!
Great Video Sir
Thanks a lot!
Thanks this is very helpful
Thanks a lot!
Good contents. Thanks.
Thanks a lot!
I am having this error, please help me solve it:
Could not open extension control file "/PostgreSQL/16/share/extension/vector.control": No such file or directory. Extension "vector" is not available.
Loving your videos man, thank you for the clear, concise explanations of these topics. Do you have any videos using RAG and agents in Django? I am using Django REST API and have been struggling with an agent controller that works fine in the notebook but then times out in my API request with the exact same code, using Chat ReAct Description?
thanks. really helpful
Thanks for watching!
@@bugbytes3923 Hey I have this error. do you know why?
connection_string = "postgresql+psycopg2://user:pass@localhost:5432/db"
collection_name = 'financial_qa'
db = PGVector.from_documents(
embedding=instructor_embeddings,
documents=texts,
collection_name=collection_name,
connection_string=connection_string
)
File ~\.conda\envs\financial_qa\lib\site-packages\langchain\vectorstores\pgvector.py:578, in PGVector.from_documents(cls, documents, embedding, collection_name, distance_strategy, ids, pre_delete_collection, **kwargs)
574 connection_string = cls.get_connection_string(kwargs)
576 kwargs["connection_string"] = connection_string
--> 578 return cls.from_texts(
579 texts=texts,
580 pre_delete_collection=pre_delete_collection,
581 embedding=embedding,
582 distance_strategy=distance_strategy,
583 metadatas=metadatas,
584 ids=ids,
585 collection_name=collection_name,
586 **kwargs,
587 )
File ~\.conda\envs\financial_qa\lib\site-packages\langchain\vectorstores\pgvector.py:453, in PGVector.from_texts(cls, texts, embedding, metadatas, collection_name, distance_strategy, ids, pre_delete_collection, **kwargs)
445 """
446 Return VectorStore initialized from texts and embeddings.
447 Postgres connection string is required
448 "Either pass it as a parameter
449 or set the PGVECTOR_CONNECTION_STRING environment variable.
450 """
451 embeddings = embedding.embed_documents(list(texts))
--> 453 return cls.__from(
454 texts,
455 embeddings,
456 embedding,
457 metadatas=metadatas,
458 ids=ids,
459 collection_name=collection_name,
460 distance_strategy=distance_strategy,
461 pre_delete_collection=pre_delete_collection,
462 **kwargs,
463 )
File ~\.conda\envs\financial_qa\lib\site-packages\langchain\vectorstores\pgvector.py:213, in PGVector.__from(cls, texts, embeddings, embedding, metadatas, ids, collection_name, distance_strategy, pre_delete_collection, **kwargs)
210 metadatas = [{} for _ in texts]
211 connection_string = cls.get_connection_string(kwargs)
--> 213 store = cls(
214 connection_string=connection_string,
215 collection_name=collection_name,
216 embedding_function=embedding,
217 distance_strategy=distance_strategy,
218 pre_delete_collection=pre_delete_collection,
219 **kwargs,
220 )
222 store.add_embeddings(
223 texts=texts, embeddings=embeddings, metadatas=metadatas, ids=ids, **kwargs
224 )
226 return store
TypeError: langchain.vectorstores.pgvector.PGVector() got multiple values for keyword argument 'connection_string'
@@bugbytes3923 Never mind - the cause was another connection_string set in the virtual environment.
Is there any way I can use the data from the Postgres database directly, instead of using documents data?
Thanks,
I am having this error when creating the "vector" extension:
ERROR: Could not open extension control file "C:/Program Files/PostgreSQL/16/share/extension/vector.control": No such file or directory
Have you solved this problem? Please help me with this.
Thank you so much for great video!, can please cover on Anthropic Claude with PGVECTOR. That would be a great help !
Is there any way to do hybrid search with this? Meaning, is it possible to do something like keyword search or some other filtering before doing semantic similarity? Or is this kind of feature only available in specific paid vector databases?
Fantastic video! Would be interesting to see a follow up on how this might work with Django?
Thanks a lot - I am planning a short video on Django and pgvector. There's a useful extension that integrates the two - coming soon!
@@bugbytes3923 Could I ask what the extension is so I could have a look while you're creating the video. Love your content!
@@helloh6 Thanks a lot! It's the same library I installed in this video to work with pgvector - this library has modules for working with Django - more details here:
github.com/pgvector/pgvector-python#django
@@bugbytes3923 Amazing, thanks!
thanks for the video!
do you know if there's a way to save the database locally after it's been initialised with `db = PGVector.from_documents(
embedding=embeddings, documents=chunks, connection_string=connection_string
)`?
e.g. Faiss has a save_local() function
Fantastic! Where is the Jupyter notebook?
Excellent video! Any chance of using S-BERT to generate the embeddings instead of OpenAI's ada embeddings? A possible code snippet would be appreciated. Thanks, and love your content.
Edit:
- Problem 1: My Postgres container is within WSL2, which I cannot connect to with pgAdmin from Windows.
- Solution: connect a pgAdmin container to the pgvector container.
- Problem 2: Object of type PosixPath is not JSON serializable.
- Solution: change my PosixPath to a string and pass it to TextLoader.
Supabase uses their vec client for Postgres/pgvector. This does not need Docker, but we are then limited to their free plan's 50MB of storage. What do you think?
What PostgreSQL permissions or operator functions are required or recommended for pgvector?
hey ! how do i get the uuid of records of langchain_pg_embeddings table to delete it later.
Is there any tutorial where I already have a table in Postgres? I uploaded all the documents and created the index without Langchain, and now I want to access that database - but all the tutorials start from raw data and create the vector store in the process.
What is the rationale behind calling embed_query vs embed_documents?
Is there any way to store in custom schema defined instead of public schema??
Is it possible to use Chroma DB to load SQL data into a vector DB? There are not a lot of resources and I need to learn that.
Hi, how to change default table names? like langchain_pg_collection to something else
great video. How does this compare to FTS for search? When would you want to use that over this? Would they get the same results in this case for example?
Thanks! The mechanism for FTS is different, so there's no guarantee that the same results would be reached. Maybe I could do a video quickly comparing these methods!
@@bugbytes3923 Would be a nice video I think. One advantage of FTS over this for searching products is that, on a public website, you can't be DDoS'd into running up your API costs.
Super interesting video. I’m wondering if you know about how to prompt properly to openai to generate the vectors. By this I mean if there are ways to improve the quality of the vectors to query so the answer can be more precise. Thanks
With embedding models there is no prompting; these are not chat models.
@@nedyalkokarabadzhakov5405 so basically the embedding needs to be created from the most accurate text that you can provide, right?
where can I get the notebook for this?
GPT fine-tuning and embeddings
Hi, I followed the steps you mentioned in the blog but am facing an issue while connecting and inserting vectors into the Postgres database.
Please find the error below:
texts = [d.page_content for d in documents]
^^^^^^^^^^^^^^
AttributeError: 'tuple' object has no attribute 'page_content'
hi, do you know what dimensions value should I use when creating vector column?
In this video, it should be 1536 dimensions. We used OpenAI's latest embedding model to create the embeddings, which has an output dimension of 1536.
platform.openai.com/docs/guides/embeddings/second-generation-models
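One practical consequence: a pgvector column is declared with a fixed dimension (e.g. `vector(1536)`), and inserting an embedding of a different length fails. A small, hypothetical sanity check before inserting (the helper name and the zero-filled stand-in embedding are illustrative, not from the video):

```python
# The column dimension must match the embedding model's output length.
EXPECTED_DIM = 1536  # output size of OpenAI's ada-002 embedding model

def check_embedding(embedding, expected_dim=EXPECTED_DIM):
    """Raise if the embedding's length doesn't match the declared column dimension."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"embedding has {len(embedding)} dims, column expects {expected_dim}"
        )
    return embedding

fake_embedding = [0.0] * 1536  # stand-in for a real OpenAI embedding
check_embedding(fake_embedding)  # passes; a 768-dim vector would raise
```

If you switch embedding models (e.g. to an S-BERT model with 768-dim output), the vector column has to be recreated with the new dimension.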
@@bugbytes3923 thank you
Hey, Can you also try to experiment with Langfuse and how it can be leveraged ?
I'll need to look into Langfuse. But possibly! I'm planning more GPT/vector/langchain videos.
-p 5432:5432
PostgreSQL and pgvector have the same port mapping - is that right?
Yes, that's right!
Will there be any way to use the postgresql db tables directly instead of txt files?
With LLMs - I'll release a video this week on Retrieval Augmented Generation, where we use the DB table with Langchain and use the results of a DB query as context to an LLM prompt.
Waiting!!@@bugbytes3923
How can I do this with .docx, .csv and .pptx files?
hello. great video, helped me a lot with exactly what I was looking for!
Keep up the good work.
I have a question. I followed your video and downloaded the Docker image, and I have my pgAdmin4, but when I try creating the extension, it says: Could not open extension control file "C:/Program Files/PostgreSQL/15/share/extension/vector.control": No such file or directory. Extension "vector" is not available
Do you maybe know what is going on?
Thank you in advance
Thank you!
Regarding your problem: did you add the port mapping in the Docker run command, i.e. -p 5432:5432?
I suspect that pgAdmin is trying to connect to Postgres running locally on your machine, rather than in the Docker container. Do you have Postgres running locally? You may need to stop it if it's using the same port as the Docker container.
Not sure though, but let me know if you get it fixed or if you're still stuck!
@@bugbytes3923 oh, thank you so much!
Postgres did run locally on my machine on the same port as the Docker container, so I had to stop those processes, and now it works!
can't wait for the django video with pgvector! keep up the good work
I am also facing it - can you please add the steps so I can solve this too?
Thank you in advance
@@ajaypalsingh6329 if you have both Docker and local Postgres in your pgAdmin, you should stop those processes within Task Manager. Go to Processes and end all processes relating to your Postgres. That is what worked for me, honestly.
I don't know if you have the same issue.
@@ajaypalsingh6329 windows or Mac?
Hi, I find this video very informative and easy to understand.
However, I am getting the below error
when downloading the pgvector image: Error response from daemon: pull access denied for arcane/pgvector, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
try this:
docker pull pgvector/pgvector:pg16
How do I close the pgvector connection after everything is done?
There's no mention of installing PostgreSQL first.
The installation is done via the Docker commands.
@@bugbytes3923 ers\Administrator> docker run --name pgvector5-demo -e POSTGRES_PASSWORD=test -p 5432:5432 ankane/pgvector
popen failure: Cannot allocate memory
initdb: error: program "postgres" is needed by initdb but was not found in the same directory as "/usr/lib/postgresql/15/bin/initdb"
Despite following the post steps several times, the error still appears. Maybe it's because I'm using Win10.
Hi, I'm getting the below error while running the CREATE EXTENSION vector query in the database. Can you please help?
ERROR: Could not open extension control file "C:/Program Files/PostgreSQL/16/share/extension/vector.control": No such file or directory. Extension "vector" is not available
ERROR: extension "vector" is not available
SQL state: 0A000
Detail: Could not open extension control file "C:/Program Files/PostgreSQL/16/share/extension/vector.control": No such file or directory.
Hint: The extension must first be installed on the system where PostgreSQL is running.
could you find a solution to this issue?
Kindly help me with the below error..
When I try to execute CREATE EXTENSION vector I'm getting the below error
ERROR: Could not open extension control file "/usr/share/postgresql/16/extension/vector.control": No such file or directory. Extension "vector" is not available
ERROR: extension "vector" is not available
SQL state: 0A000
Detail: Could not open extension control file "/usr/share/postgresql/16/extension/vector.control": No such file or directory.
Hint: The extension must first be installed on the system where PostgreSQL is running.
Note - both Postgres and pgvector running in docker
This worked for me: CREATE EXTENSION vector;
And I used this Docker image: docker pull pgvector/pgvector:pg16
Typo in the blogpost:
`CREATE EXTENSION vector;` instead of `CREATE EXTENSION pgvector;`
Super awesome! It would be great to see this integrated with django-ninja to build a chat-with-PDF app (but without using ChatGPT) - something similar to this: czcams.com/video/rIV1EseKwU4/video.html, which is essentially from the primordial privateGPT...