PostgreSQL as VectorDB - Beginner Tutorial

  • Added on 25 Aug 2024
  • Want to get started with freelancing? Let me help: www.datalumina...
    Need help with a project? Work with me: www.datalumina...
    🔗 Links in this video
    github.com/dav...
    github.com/pgv...
    dev.to/confide...
    👤 Connect with me on LinkedIn
    / daveebbelaar
    👋🏻 About Me
    Hey there, my name is @daveebbelaar and I work as a freelance Data Scientist / AI Engineer and run a company called Datalumina. You've stumbled upon my CZcams channel, where I give away all my secrets when it comes to working with data. If you want to learn more about what I do, then head over to www.datalumina...

Comments • 42

  • @gonzalea35 8 days ago

    Hey man, thank you very much. I have to build a POC with PGVector, and your video just nailed my gaps in understanding the basics. Also, the repository was quite useful. You are awesome. Keep up the good work.

  • @bin4ry_d3struct0r 8 months ago +4

    One of the things I learned in the past few months working with RAG-based LLMs is that it's definitely not one size fits all. The quality of inference depends on the embedding algorithm as well as the indexing and retrieval mechanism of the vector database.
    This was a great video!

  • @fabsync 2 months ago +2

    A new fan here! It will be great to see a video where you use streamlit or something else to create a search with pgvector (full text search)

  • @ConnorLeech 1 month ago +1

    in the video you are creating the data from text files, but it seems like a main advantage of having it on your postgres db is being able to use / query the data in your tables.
    i'd love to see how to build a full text search or something from data stored in regular postgres tables!

  • @tushaar9027 2 months ago +1

    Hi Dave, this is a great video, thanks for sharing the knowledge. I really liked the idea of using PostgreSQL. Can you please make a video on setting up Postgres on Azure?

  • @gr8tbigtreehugger 8 months ago +1

    Thanks for this! I was leaning towards pgvector and your video convinced me so!

  • @myhificloud 8 months ago +1

    Clean solution. This is helpful, thank you for this.

  • @abhishekchopda4100 7 months ago +1

    Great Video! Helped me in my work! Thanks :)

  • @jennymelia 7 months ago +1

    LOL Dave, I was googling whether I can somehow use Postgres instead of Pinecone and your video popped up 🤣🤣👍🏽👍🏽👍🏽 Love it!

    • @daveebbelaar 7 months ago +2

      Haha you're becoming a true engineer Jenny. Those are some pretty serious Google searches haha. Let me know if you need further help!

    • @jennymelia 7 months ago

      @@daveebbelaar for sure dude! 🤌🏽 trying to get in that coder level 😂😂😂

  • @touma4659 2 months ago +1

    thank you💖💖

  • @krunkey 3 months ago +1

    Thanks for the video. I'll be trying PGVector! Do you know of any good alternative to OpenAI embeddings that can be run locally?

  • @aimattant 13 days ago

    I am working with the pgvector extension on Postgres databases in my current AI Python project, querying the database with natural language processing (NLP). When I go to the HTML app and enter a search under the NLP query, it works if I mention just the database items, such as women's jackets, and then a limit: I get the results on screen and a CSV download. But if I add "I would like to get a list of women's jackets", I get an error. Is there a way around this? I would appreciate your help.

  • @erwinl7794 7 months ago +1

    What about an open source vector store like qdrant?

  • @henkhbit5748 7 months ago

    Thanks for showing pgvector. Weaviate is also free and can be run locally using Docker. I agree, I am all for open source.

  • @EmilioGagliardi 8 months ago +1

    This was super interesting. Do you have a video that explains your PGVector setup (do you install the database locally or do you have a cloud account)? I'd love to have a setup where I can view my document collections and embeddings in my editor like that. I use VSCode right now, so not sure ... good stuff!

    • @daveebbelaar 8 months ago

      I talk about this near the end of the video

  • @SigAiOC-ke3ss 8 months ago +1

    I didn't fully understand it from the video, but are you comparing times between using Pinecone on a remote host and Postgres run locally?

    • @daveebbelaar 8 months ago

      Not only processing time (because I know that's not a truly fair comparison), but also ease of use and data management.

    • @SigAiOC-ke3ss 8 months ago +3

      @@daveebbelaar I get that, but in a production environment it makes a big difference, especially when you think of use cases. I would be curious to see a comparison between a cloud-hosted Postgres and Pinecone, or between a locally hosted Postgres and something like Chroma.

  • @anand-st7mo 4 months ago +1

    Bro, did you do any indexing?

  • @MaliciousCode-gw5tq 3 months ago

    I have a follow-up question: if, let's say, one chapter of a book has a total word count of 3k, will it be able to store all 3k words?

  • @izzatirfan2794 1 month ago

    Great!! I enjoy watching your videos. I tried to run the code from your GitHub hands-on, but I am facing an error: ModuleNotFoundError: No module named 'pgvector_service'. Then I tried to pip install pgvector_service, but this occurred:
    ERROR: Could not find a version that satisfies the requirement pgvector_service (from versions: none)
    ERROR: No matching distribution found for pgvector_service
    Do you have any idea how to overcome this?

  • @Michael-jl7wn 6 months ago

    How would this work if you were using more structured data that needed to be stored in columns and rows?

  • @MichaelHoughton_ 8 months ago +1

    Could you put the vectors inside Firebase? That’d be epic

    • @3wcdev878 8 months ago

      Nope, Firebase has a limit; I tried it.

    • @MichaelHoughton_ 8 months ago

      @@3wcdev878 dang that’s unfortunate

  • @eyemazed 7 months ago

    the thing that bothers me about using postgres for RAG is that the vector search works fine, but its full text search capabilities are severely handicapped. it doesn't support partial or fuzzy matching, so you can't really do a nice reciprocal rank fusion between resources retrieved by multiple channels (vector + full text). i'm going to try ElasticSearch next, as i've previously worked with it and it's really good at full text search (TF/IDF, fuzzy search, partial search, stemming...), and the newer versions also support vector search. the downside is having to sync elastic with your main db all the time...

    • @awakenwithoutcoffee 14 days ago

      tbh a better solution is to work with metadata filtering instead of text search, as it is more globally context-aware and is faster. ElasticSearch seems cool, although I haven't looked into it much.
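
      For readers unfamiliar with the reciprocal rank fusion mentioned in this thread, here is a minimal sketch in plain Python. It is not taken from the video or its repository. The function name, the example document ids, and the constant k=60 (a commonly used default) are all assumptions for illustration. Each document receives a score of 1 / (k + rank) from every ranked list it appears in, and the scores are summed.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from two retrieval channels.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

      Documents returned by both channels (doc_a and doc_b here) rise above documents found by only one, which is the point of fusing vector and full text results.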

  • @say.xy_ 7 months ago

    Hi Dave, I’m also using pgvector, but the outputs are not really that good. Could you make a video on improving the performance of a RAG pipeline with LangChain and pgvector? Thanks.

  • @DanielWeikert 8 months ago

    How do you update the vector store (e.g. replace outdated data)?
    br

    • @gr8tbigtreehugger 8 months ago

      Just update the outdated data like you would in any db.

  • @3wcdev878 8 months ago

    But you tested it with a small dataset; most relational databases get slower as they grow.

  • @gilbertb99 8 months ago

    Pinecone is managed, isn't it? There are more reasons why enterprises would use and pay for it. For simple side projects, then yeah, pgvector locally makes sense.

  • @greendsnow 7 months ago

    pgvector is the WORST performing vector db according to all comparison charts.
    you need to tell people if you're sponsored by supabase, otherwise this is not ethical.

    • @daveebbelaar 7 months ago +3

      Can you share some more insights on this? And no, I am not sponsored or affiliated with Supabase.