OpenAI Embeddings and Vector Databases Crash Course

  • Added 27 May 2024
  • Embeddings and vectors are a great way of storing and retrieving information for use with AI services. OpenAI provides an embedding API for this, and Postman lets you make the requests with ease at www.postman.com/ (today's sponsor).
    In this video we will explore how to create a vector database by creating embeddings with the OpenAI API and then storing them in SingleStore.
    The first part of the video covers how to create an embedding using just API requests with Postman. Then we jump into SingleStore and store the embeddings in a new database made specifically for vectors.
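
The API request described above can be sketched in Python as well. This is a minimal sketch, assuming the `https://api.openai.com/v1/embeddings` endpoint and the `text-embedding-ada-002` model; the helper name `build_embedding_request` and the placeholder key are hypothetical. It only builds the request, since sending it needs a valid API key and available credit.

```python
import json
import urllib.request

OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"

def build_embedding_request(text, api_key, model="text-embedding-ada-002"):
    """Build (but do not send) the POST request that asks for an embedding."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        OPENAI_EMBEDDINGS_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

# The key below is a placeholder, not a real credential.
request = build_embedding_request("Hello World", api_key="sk-your-key-here")

# To send it and read the vector (requires network access and a valid key):
# response = json.load(urllib.request.urlopen(request))
# embedding = response["data"][0]["embedding"]
```

The same JSON body and headers are what a Postman request would carry.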
    00:00 - Introduction
    00:10 - What are Embeddings and Vectors
    02:14 - Setup OpenAI Embeddings
    03:11 - Setup Postman API Requests to create Embeddings
    03:55 - Create Embedding
    06:55 - Create PDF or Document Embedding
    07:43 - Vector Database - Setup with SingleStore
    08:20 - Vector Database - Create Database
    09:32 - Vector Database - Create Table
    10:41 - Vector Database - Insert Embedding Row
    13:25 - Vector Database - Search Embeddings
    15:18 - Embedding function with JavaScript and NodeJS
    18:08 - OpenAI and GPT Digital Book
    18:29 - Conclusion
    Postman: (today's Sponsor) for API Requests
    www.postman.com/
    OpenAI Embedding Documentation:
    platform.openai.com/docs/api-...
    SingleStore Vector Database
    www.singlestore.com/cloud-tri...
    ⭐ Teach Me OpenAI & GPT - Digital Book ⭐
    enhanceui.gumroad.com/l/teach...
    Learn Design for Developers!
    A book I've created to help you improve the look of your apps and websites.
    📘 Enhance UI: www.enhanceui.com/
    Feel free to follow me on:
    🐦 Twitter: intent/follow?scr...
    💬 Discord: / discord
    💸 Patreon: / adriantwarog
    Software & Discounts:
    🚾 Webflow: webflow.grsm.io/adrian
    🌿 Envato: 1.envato.market/yRZjz2
    🌿 Envato Elements: 1.envato.market/LP0OJZ
    🔴 Elementor: elementor.com/adrian/?ref=23140
    ✖ Editor X: www.editorx.com/adrian-twarog
    Computer Gear:
    ⬛ Monitor: amzn.to/3f9DOQI
    ⌨ Keyboard: amzn.to/3eA5UFD
    🐁 Mouse: amzn.to/3xVJO8l
    🎤 Mic: amzn.to/3hgCfms
    📱 Tablet: amzn.to/3ewt7sa
    💡 Lighting: amzn.to/3vOZeZY
    💡 Key Lighting: amzn.to/3f6qP2f
    Camera Equipment:
    📷 Camera: amzn.to/3uCv4J9
    📸 Primary Lens: amzn.to/3vT6wMm
    📸 Secondary Lens: amzn.to/3tyqWIX
    🎥 Secondary Camera: amzn.to/3o2zCGi
    🎙 Camera Mic: amzn.to/33tCz9l
    🎞 USB to HDMI: amzn.to/33yW9RE
  • Science & Technology

Comments • 166

  • @SuperITPRO 9 months ago +33

    My ADHD normally overrides my concentration. Your tutorial pace, live coding, and narrative made me complete my 1st OpenAI-coded app - thank you!

    • @zzej 9 months ago +1

      Same

    • @nicholastroyandersen9505 8 months ago +4

      Don't use ADHD as an excuse, it ain't no sickness, just personality. Take it and make it your best quality.

    • @nickscherrer1735 8 months ago

      @nicholastroyandersen9505 it's... not a personality, lmfao. It's a very clear set of learning disabilities centered around working memory, executive function, and tuning out.

    • @davidabellangarrido2056 8 months ago

      Same, and without knowing English.

    • @pauls064 3 months ago

      @nicholastroyandersen9505 it's literally a neurological condition that can be seen on scans and measured... ignorant comment.

  • @user-zy5nu7xp1d 9 months ago +15

    I have seen multiple tutorials, this is by far the best and most concise, great work man

  • @nickfleming3719 9 months ago +85

    That isn't a vector database. It's a relational database with vectors stored in a text column. In practice, you will have thousands of embeddings and performance will tank with this setup.

    • @trevorbaier7072 6 months ago +3

      What's a more ideal solution for storing vectors?

    • @brookster7772 6 months ago +5

      From my investigation, Redis is an excellent vector store to be used in both development and production especially when it’s a local Dockerized instance

    • @SussyBacca 6 months ago +4

      MongoDB Atlas is awesome for vectors. They have a new vector feature called knnBeta.

    • @ParthSaneHD 5 months ago +2

      Pinecone works too!

    • @amdenis 5 months ago

      You are correct, but you know that! Its indexing is not fast enough for many serious AI projects, and its single-threaded architecture does not scale. Under the hood there are many other non-vector legacy issues.

  • @LindsayHiebert 10 months ago +13

    Excellent overview! Very concise, clear and relevant! Great job! Thank you Adrian! 😊

  • @nickkondoori7550 9 months ago +16

    Incredible teaching skills. First time ever, I loved someone who can teach "ME" the way I always wanted. Thousand thumbs up Adrian!!

  • @krisograbek 10 months ago +8

    Adrian, your channel is a gem! I love the way you explain complex topics and the pace of your videos! Greetings from Poland!

  • @coinexponent1884 9 months ago +5

    Learn vector embeddings using first principles. Always engaging, and very rewarding for the learner. Thank you!

  • @photorealm 1 month ago

    That was the first video that actually gave me an understanding of how vector DBs kind of work. Thank you for sharing.

  • @phil97n 8 months ago +2

    Awesome thanks.
    Been studying calculus and linear algebra before I dive deep into AI. I will definitely be dealing with vector databases very soon and looking forward to it.

  • @TheOGDesigner 10 months ago +3

    Amazing tutorial! The way you explain is so easy and understandable!

  • @rasmuspiirtola4397 7 months ago +1

    Rarely comment, but damn, you did a perfect job - I am at 8:01, haven't watched the video but had to pause and comment - until 8:01, everything was perfect; how you explain concepts and utilize tools ensures that we understand the concept in practice with ease! Great job, continue making videos; you should do consulting if you don't already do so. It's easy money with little hours with your skills and knowledge!

  • @JeremyArtero 8 months ago

    This course is gold! Thanks! I have done similar steps on Astra DB and it was smooth.

  • @saik6730 7 months ago

    Best AI video ever. Made it easy to understand with 2 simple concepts. Thanks man!

  • @daygo619ca 9 months ago

    This tutorial was incredible - completely glued to it

  • @RajShekarsdreamzzz 9 months ago

    Very good session, Adrian... your way of teaching keeps people glued... Keep it up.

  • @aiadvantage 11 months ago +16

    Super high quality video right here. Good job Adrian

    • @AdrianTwarog 11 months ago +3

      Hey I've seen your stuff too, it's great, thanks for the nice words!

  • @LeonardoCarreraBaude 9 months ago +2

    I just bought 2 Udemy courses, and after 5 hours, none of them talk so well about this. I appreciate it, and I will buy your book. Thanks for your content.

  • @AmanBansil 3 months ago

    Absolutely LOVE this. You're so clear and concise.

  • @brookster7772 6 months ago

    Bare metal, removing all higher-level obstructions and going right down to the core. I love it. The best explanation of what embeddings are that I have seen. Great job.

  • @rkjellbe 9 months ago

    Finally, found a video with the appropriate detail. For me! 😊 Thank you!

  • @anrk97 11 months ago +3

    Love your thumbnails. Keeps getting better with each video 👍

    • @AdrianTwarog 11 months ago +2

      Thanks, I try to make them as clear about what the video represents as possible!

  • @kfliden 1 month ago

    Wow, thanks I'm finally starting to get embeddings!

  • @dipayanroy964 1 month ago

    I wish everyone could present like you, simply super. Looking forward to more like this.

  • @adamduvick 9 months ago +6

    Let me see if I understand what's going on here:
    1) you have data you want to search semantically
    2) you create a vector database capable of storing data and answering semantic search queries
    3) you use OpenAI to process your data & convert it to vectors which can be stored in your database
    4) you store the data along with the OpenAI-generated vectors
    5) now you can search the data
    Is that all it is? I thought you were then going to leverage this database to give ChatGPT "long term memory" ( 0:20 ). What you've shown seems nice, but I don't really see the point, since most people/companies who have enough data to need this kind of querying would not be able to give it away to OpenAI to process.
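
The five steps above can be sketched end to end in a few lines. This is a toy illustration only: `toy_embed` is a hypothetical stand-in that counts letters, so it captures spelling overlap rather than meaning, and a plain Python list stands in for the database table. A real setup would call an embedding API in its place.

```python
import math

def toy_embed(text):
    """Stand-in for a real embedding API: letter counts scaled to unit length.

    This captures spelling overlap only, not meaning.
    """
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    length = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / length for c in counts]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Steps 1-4: keep each text next to its vector (the list plays the table).
texts = ["Hello World", "Vectors and embeddings are easy"]
table = [(text, toy_embed(text)) for text in texts]

def search(query, rows, limit=5):
    """Step 5: embed the query, score every row, return the best matches."""
    query_vector = toy_embed(query)
    scored = [(dot(query_vector, vector), text) for text, vector in rows]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:limit]

results = search("hello earth", table)
```

Here `results` is a list of `(score, text)` pairs, best match first.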

  • @abijithpradeep7478 9 months ago +2

    For those who already had an OpenAI account and are facing an error while posting the HTTP request: it's because your free credit has expired. You will have to add a payment method or create a new account to get free credits again, and then everything will work fine according to this tutorial.

  • @ZaidKhanPathan 7 months ago

    Wow! Easy, clear and to the point.

  • @sunnysk43 6 months ago

    Absolutely amazing! Thank you so much for your work!

  • @satanrasool1802 10 months ago

    Love it.. it was far simpler than I thought..

  • @ravindrasingh2411 3 months ago

    Adrian, this is beautifully explained. Absolutely loved it :)

  • @andrey20111988 4 hours ago

    You can also use "Test" in Postman, which can help you write a script that builds a string from the request input and the response data. Automate it (if you need to)!

  • @robertcormia7970 5 months ago

    Well done, succinct, and excellent explanations of complex topics.

  • @nadershalabi6241 2 days ago

    Thank you! Great walk through

  • @cmdrls212 7 days ago

    This is great. I had to learn this in a crunch and I grok it now.

  • @CodexCommunity 10 months ago

    This is the best video on openai embeddings I have ever seen, I am also a bit biased!

  • @chrislannon 3 months ago

    Nice work! Thanks so much for this awesome demo.

  • @codinginflow 7 months ago

    This was a great overview Adrian!

  • @sany2k8 10 months ago

    Great content 👍👍👍, waiting for more OpenAI, AI related content

  • @karthikg752 9 months ago

    The voice recording and explanation are really clear - surprising how tone and voice play a major role in understanding. I was watching another video which was equally good, but somehow the slang and recording made it a bit difficult to understand. Thanks

  • @Glow0110 7 months ago +6

    Would be great to see a follow-up video of practical applications using this.

    • @atursams5501 6 months ago

      The practical applications are varied:
      sentiment analysis
      term search
      Classification

  • @demetriusmds 9 months ago

    Excellent. Thank you. Helped a lot.

  • @karsonkalt7607 8 months ago

    Fantastic tutorial and explanation!!

  • @curtisblake261 5 months ago +1

    I like this video and I don't mind all the upselling. My only complaint is that if I pause the video for too long, it automatically sends me to another video in the series, which makes it hard to get back to where I was. You might assume it is user error, but it isn't. The automatic transferal and loss of context happens constantly with this CZcams video, and I've never had the problem with any other CZcams tutorial. I'm fine with the monetizing and upselling since it helps reward the content creator, I just wish it wouldn't keep making me lose my place in the tutorial.

  • @adavis912 4 months ago

    Great tutorial!!! I will be buying your book.

  • @alexsalgado 10 months ago

    Excellent content, what changes for audio search?

  • @nrusimha11 10 months ago

    Crisp and to the point, thank you. Can I ask how you made the slides like the one at 0:52?

  • @pajisounds 11 months ago +4

    Nice video, it would have been nice with a demonstration at the end or intro, keep up the good work.

    • @AdrianTwarog 11 months ago

      Oh good suggestion, I’ll do that next time!!

  • @rafaelmartinsdecastro7641 10 months ago

    This is great stuff, thanks.

  • @user-uw6ld2dp9z 6 months ago

    Perfect explanation!

  • @FahadKiani1 9 months ago +4

    Will you create a second part of this video where PDFs are uploaded and then analyzed?

  • @user-qg5bo9bd5x 5 months ago

    Great explanation! Thanks!!

  • @SarthakAgarwal-sm1gp 2 months ago

    Awesome Content! Thanks

  • @PeterAdiSaputro 10 months ago

    In the past, I learned Support Vector Machines for doing classification. At that time, I struggled to learn the concept, although I was finally able to implement it in a program using code written by another party. The introduction of this video suddenly revived the memory and helped me better understand the concept of SVMs that I learned years ago.
    Is Postman completely free, and can it be used without any restrictions or limitations? Is SingleStore also completely free without any restrictions or limitations?

  • @omangramoswaane2211 10 months ago

    Nice video. I love your work.

  • @mohammadbarzegari8737 6 months ago

    Perfect learning ❤🎉 master of learning ❤❤❤❤

  • @Art-kz6zf 7 months ago +1

    How efficient is the vector search if you need to go through all of the records every time you search? Shouldn't there be some dedicated field type for embeddings other than blob?

  • @Kevin509wisdom 9 months ago

    fantastic job!

  • @matickovac 1 month ago

    Great work presenting this!
    Do you happen to know how similar or different this is from what Elasticsearch does when performing full-text search?

  • @joostschuur 10 months ago

    How would I go about weighting the results by other metadata? Say I have a bunch of videos, and I'm searching the title/description, but want to give some amount of preference to newer videos too.
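
One possible way to fold recency into the ranking is to blend the similarity score with a decay term after the vector search returns its candidates. This is a sketch of one approach, not something shown in the video: the function names, the 180-day half-life, the 20% recency share, and the sample scores are all illustrative assumptions to tune.

```python
def recency_weight(age_days, half_life_days=180.0):
    """Exponential decay: 1.0 for a brand-new item, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)

def blended_score(similarity, age_days, recency_share=0.2):
    """Mix semantic similarity with recency; recency_share is a tunable guess."""
    recency = recency_weight(age_days)
    return (1.0 - recency_share) * similarity + recency_share * recency

# (title, similarity from the vector search, age in days): made-up values.
candidates = [
    ("Older video", 0.84, 900),
    ("Newer video", 0.82, 30),
]

ranked = sorted(
    candidates,
    key=lambda item: blended_score(item[1], item[2]),
    reverse=True,
)
```

With these sample values the newer video overtakes the slightly more similar older one; lowering `recency_share` shifts the balance back toward pure similarity.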

  • @atursams5501 6 months ago

    Great work! How do you make these nice presentations with the fancy arrows?

  • @defaultdefault812 8 months ago +4

    Well done, Langchain already exists...

  • @pablochacon7641 10 months ago

    Very interesting video, but what are the prerequisites to understand & actually implement this?

  • @edoson01 10 months ago +1

    Blows my mind you've spent 18m talking about the How and 30sec about the why and what.

  • @joergmayer3741 10 months ago

    Great video. THX

  • @corejava5730 9 months ago

    Very well explained, thanks Adrian!! I have a staffing firm and a database of more than a million resumes. I'm planning to create a resume search application for my recruiters. Do you think I should be using a combination of embeddings and a vector database for this use case?

  • @shpop1 8 months ago

    Great video! Precise.

  • @sivakumarkalaiselvan6831 6 months ago

    Hi Bro,
    What extension did you use in VS Code for the code suggestions?

  • @Ricocase 9 months ago

    Cool course. How does one connect it to a basic website?

  • @xspydazx 1 month ago

    Yes, but how do you save a vector store, i.e. export it to JSON for upload or for fine-tuning into the main LM?

  • @parkerrex 8 months ago

    Great video !!! :)

  • @user-vm2er4qj9w 8 months ago

    Thanks for this video, it is really helpful.

  • @BryanChance 5 months ago

    Does the chunk size have an effect on the quality or accuracy of the search results? Let's say I split a document into single words AND into 200-word chunks. The resulting vectors are stored in a vector DB.
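
One way to explore that question empirically is to produce several chunkings of the same document, embed each, and compare search results. A minimal word-based chunker might look like the sketch below; the function name `chunk_words` and the optional `overlap` parameter are illustrative choices, not something taken from the video.

```python
def chunk_words(text, chunk_size=200, overlap=0):
    """Split text into chunks of at most chunk_size words.

    overlap repeats that many words from the end of one chunk at the
    start of the next, which can help when a sentence straddles a boundary.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), step)
    ]

# A synthetic 450-word document for demonstration.
document = " ".join(f"word{i}" for i in range(450))

chunks_of_200 = chunk_words(document, chunk_size=200)
single_words = chunk_words(document, chunk_size=1)
```

Each chunk would then be embedded and stored as its own row, so the chunk size decides how much context each vector has to summarize.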

  • @ewhite_dipi 8 months ago

    what are the prerequisites to understand the content in this video? And where can I learn them?

  • @zibitappert 10 months ago

    would it be possible to use this for an AI NPC for training purposes in XR space for example?

  • @GenZManhood 10 months ago +2

    I get this message when I run the API. Do you need to pay OpenAI for it to work? Thanks! "error": {"message": "You exceeded your current quota, please check your plan and billing details.",

  • @psyduck4763 6 months ago

    Hey man, what are those fonts you've used in this video?

  • @MannyBernabe 3 months ago

    excellent. thx!

  • @KlrStng 10 months ago

    Does your book explain what kind of database SingleStore is using, and what their proprietary function "json_array_pack" is actually doing? I would like to try this out but don't want to use a service like SingleStore, as I prefer to just use an AWS RDS database instead (or anything that I can have full control over, or even host myself), but that function is not a standard SQL function and I don't understand exactly what is going on in the background here. It looks like the SingleStore function is doing some kind of encrypting of the JSON array data, but it's unclear in their documentation.
    On a different note, is converting text to vector data more efficient and more accurate for searching than something like a simple SQL "like" query, or full-text searching? How about compared to something like SphinxSearch or AWS Elasticsearch?
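
As far as SingleStore's documentation describes it, `JSON_ARRAY_PACK` is a packing step rather than encryption: it turns a JSON array of numbers into a compact binary blob of 32-bit floats. A rough Python equivalent is sketched below. The little-endian byte order is an assumption to verify against the documentation, and the helper names are hypothetical.

```python
import json
import struct

def pack_json_array(json_text):
    """Pack a JSON array of numbers into consecutive 32-bit floats.

    Assumes little-endian layout; this mirrors the idea, not the exact
    bytes any particular database produces.
    """
    values = json.loads(json_text)
    return struct.pack(f"<{len(values)}f", *values)

def unpack_to_list(blob):
    """Reverse the packing: every 4 bytes become one float."""
    count = len(blob) // 4
    return list(struct.unpack(f"<{count}f", blob))

blob = pack_json_array("[0.5, -1.0, 2.0]")
restored = unpack_to_list(blob)
```

Because the transformation is reversible without any key, the same idea can be reproduced on a self-hosted database by storing the packed bytes in a binary column.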

  • @balajidasari9114 3 months ago

    This is amazing

  • @contactbhasker7483 9 months ago

    Is dot_product a function offered by this database for vector searching, ranking, etc.?
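
Whatever a given database calls it, the arithmetic behind a dot-product score is small enough to sketch in plain Python. The function names below are illustrative. The example also shows why the dot product works as a similarity score when vectors have length 1, which OpenAI states is the case for its embeddings: it then equals the cosine similarity.

```python
import math

def dot_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    length_a = math.sqrt(dot_product(a, a))
    length_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (length_a * length_b)

# Two vectors that both have length 1.
a = [0.6, 0.8]
b = [1.0, 0.0]

score = dot_product(a, b)
cosine = cosine_similarity(a, b)
```

For unit-length vectors `score` and `cosine` agree up to floating-point rounding, so a database only needs the cheaper dot product to rank rows.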

  • @madhavanand756 10 months ago

    Excellence & Awesome

  • @user-wf2ok2tq1z 8 months ago

    I'm a little confused. If I create embeddings, which I'm assuming is essentially training the OpenAI model on a specific topic for my company, would it be able to answer questions only on the specific topic it was trained for?

  • @mahanteshg609 3 months ago

    Loved it

  • @KaptainLuis 4 months ago

    thank you!

  • @FalahgsGate 10 months ago

    excellent❤ work

  • @noubgaemer1044 4 months ago

    Thanks for the tutorial. Can we use our own LLM, like private GPT or Text-generation Web UI, instead of OpenAI?

  • @chrismalingshu 9 months ago +3

    [Question] When the input is "hello earth", "Hello World" scored 0.89, while "OpenAI Vectors and Embeddings are Easy!" scored 0.74, which is quite close to the top-ranked text. But the first and second returned texts are very different, so I expected the second text to score 0.5 or below.
    Could you please share your thoughts on this, Adrian?
    Thank you!

    • @daffertube 7 months ago

      You would need to ask someone who built the transformers at openai.

  • @gman2036 3 months ago

    Loved this tutorial Adrian, very straightforward, and it worked the first time, unlike some others I've tried. Now for my question. I'm seeing this in February 2024. I did not know ChatGPT, Bard and those other AI apps until they hit the common pool that I must swim in. I take it that vectorizing documents has been going on for a while outside of the math world. I knew of vectors back from college in linear algebra. If this is the case, what I'm trying to do will not be new. I'm trying to vectorize my documents in order to practice doing this kind of work. So, are there IT companies out there doing this type of work already, and can you name a few? How far have they gotten? Has someone already done the Library of Congress, for instance?

  • @veeresh5341 3 months ago

    Thank you @Adrian

  • @diamond2869 4 months ago

    thank you so much

  • @MRGCProductions20996 3 months ago

    Isn't calculating the modulus of the subtraction of the vectors a more accurate way to find similarities?
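
For vectors of length 1 the two measures are tied together by the identity |a - b|^2 = 2 - 2(a · b), so ranking by smallest distance and ranking by largest dot product give the same order. The sketch below checks this numerically on made-up vectors; it assumes the stored embeddings are unit length, and all names and values are illustrative.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Length (modulus) of the difference vector a - b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(v):
    """Scale a vector to length 1."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

query = normalize([1.0, 2.0, 3.0])
rows = [
    normalize([1.0, 2.0, 2.5]),
    normalize([3.0, 1.0, 0.5]),
    normalize([-1.0, 0.0, 1.0]),
]

# Row indexes ordered from best to worst match under each measure.
by_dot = sorted(range(len(rows)), key=lambda i: dot(query, rows[i]), reverse=True)
by_distance = sorted(range(len(rows)), key=lambda i: euclidean_distance(query, rows[i]))
```

The two orderings coincide here. They can differ once vectors are not normalized, because the dot product then also rewards sheer vector length.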

  • @cliffordmwale5782 6 months ago

    This is very useful. Could you also do embeddings of CSV files? I have files amounting to up to 5 million rows.
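
For files that large, one common pattern is to stream the CSV and group rows into batches, since the OpenAI embeddings endpoint accepts a list of strings as `input`. The sketch below only does the batching; the column name, the batch size of 100, and the helper name `batched_rows` are illustrative assumptions, and the actual API call and database insert are left out.

```python
import csv
import io

def batched_rows(csv_file, text_column, batch_size=100):
    """Yield lists of up to batch_size text values, reading the file lazily."""
    batch = []
    for row in csv.DictReader(csv_file):
        batch.append(row[text_column])
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch

# An in-memory file stands in for a real CSV on disk.
sample = io.StringIO("id,description\n1,red shoes\n2,blue hat\n3,green scarf\n")
batches = list(batched_rows(sample, "description", batch_size=2))
```

Each yielded batch could be sent as one embeddings request, which keeps memory use flat no matter how many rows the file has.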

  • @akshatkant1423 1 month ago

    I am looking to generate a fairly lengthy JSON of about 25k tokens. None of the LLMs currently support that many output tokens. Do you think it would be possible to get embeddings in the response and later convert them to JSON, so that generating 25k tokens becomes feasible, since embeddings would take up fewer tokens?

  • @DrAIScience 4 months ago

    Is there any way to obtain embeddings of gpts from images?

  • @meirgoldenberg5638 9 months ago +1

    How in the world did it get a 0.74 score (which is pretty high on a scale of 0 to 1!) for the similarity of "Hello Earth" and "OpenAI vectors and embeddings are easy"? Is there anything in common between the two?

  • @fkxfkx 11 months ago +1

    Bought the book. It ended on page 54; is there anything after 54 to 58?
    The last example was OpenAI fine-tuning.
    It leaves the fine-tune up on the OpenAI site.
    How long will it be available there?
    Can it be brought down locally and used in the future as a local model in combination with a cloud model?

    • @AdrianTwarog 10 months ago

      I’ll double check, and any updates will automatically be enabled on Gumroad!

    • @Ricocase 9 months ago

      @AdrianTwarog How do you automate text importation with SQL? Must one enter each text blob manually?

  • @Adam9174X 10 months ago

    I loved your video about the ChatGPT Clone! Could you please make a new video to further improve the clone? For example, you could demonstrate how to make the website width responsive so it works on all devices. Additionally, it would be great if you could show how to add some new features, such as a cool typewriter effect for the ChatGPT responses. Another useful addition would be to highlight the code generated by ChatGPT using a React syntax highlighter. This can be achieved by enclosing the code between the tags ``` and ```. I actually made a ChatGPT Clone and published it. I added new features to my web app using your videos. Thank you very much!

  • @pazhani008 2 months ago

    How does SingleStore know about the embeddings returned from OpenAI and search them correctly in its vector DB?

  • @KnowledgeKingVideos 5 months ago

    The best!

  • @pranavkm4513 11 months ago +1

    Wow, great video sir. Helped a lot. May I know what extension is being used at 16:40?

  • @Kennethlumor 10 months ago

    Sir, could you please recreate the Bootstrap navigation bar with a customized off-canvas menu, a nice header logo, and a home link?