Build AI chatbot with custom knowledge base using OpenAI API and GPT Index

  • Added 18 Feb 2023
  • Learn how to build a full-stack app in this tutorial: • Full-stack AI chatbot ...
    Tutorial about building an AI with a custom knowledge base using OpenAI API, GPTIndex, and Langchain.
    The technique was first described by Dan Shipper www.lennysnewsletter.com/p/i-...
    Source code: colab.research.google.com/dri...
    High-converting accessible e-commerce template from Tech Foundation techfoundation.gumroad.com/l/...
    Designed following 100+ insights from large-scale usability testing and research provided by such organizations as Baymard University and NNGroup.
    Free UI design course: • How to set up Figma ac...
    #UXDesign #UX #Figma #UXUI

Comments • 294

  • @irina_nik 11 months ago +3

    Learn how to build a full-stack app in this tutorial: czcams.com/video/AMc2A5Abj3M/video.html

    • @Taskade 10 months ago

      Irina, amazing tutorial on integrating OpenAI API with a custom knowledge base! Really excited about the potential of GPTIndex and Langchain. I'd love to see a deep dive comparing AI Agents in Langchain, especially when they're long-running and autonomous. Keep up the fantastic work! 🌟

  • @pragmatica1032 1 year ago +9

    So happy to have found you this morning! We need more designers that can code and explore AI possibilities like you do!

  • @lcruzintel 1 year ago +2

    You are the best at explaining things, Irina!! Thank you for taking the time to put this together.

  • @jonathandanemo 1 year ago +1

    That was a great tutorial. And I like your approach to explaining why one should not be using only one long prompt etc.

  • @shanesteven4578 1 year ago +1

    Excellent tutorial, well presented and very clear. Thank you …. It works perfectly, unlike many so-called tutorials on YT about AI 😊

  • @chinamatt 1 year ago +1

    Great work!! Really nice step by step explanation! By the way you can click the refresh button in the file explorer panel (2nd icon) to refresh the files so that they appear.

  • @keithinadhd6693 1 year ago

    Thank you so much for this information. This is exactly the kind of thing I've been looking for. Step-by-step tutorials for fine-tuning your own AI. This is perfect.

  • @vverboX 1 year ago

    Miss Irina, thank you. After a few days of playing around, you got me to the point. Merci!

  • @kermitec 1 year ago +10

    Thank you for the tutorial... also, to refresh the files details there is a "Refresh" button located just above the Files detail section. It's an icon of a folder with a circular arrow. This will refresh the section without needing to refresh the page.

  • @harel4u2 1 year ago +1

    Great explanation. Very explicit and clear instructions. Thank you very much for this.

  • @gianantonel9913 11 months ago

    Great video Irina !!
    I was looking for this exact solution and it was the first video of your channel that I followed exactly step by step and it works perfectly end to end
    It was very clear and well explained.
    Nice job !!
    Please continue making this kind of useful videos
    It was extremely useful for me and extremely detailed.
    Keep going!

  • @JulianHarris 1 year ago +4

    Using ChatGPT to generate sample user interview data: genius 💥

  • @njorogekamau3820 1 year ago +1

    Thanks for the amazing tutorial, simple but impactful.

  • @YahaS-vf7cq 1 year ago +1

    Amazing video, very friendly to beginners. Thank you.

  • @rdy4trvl 1 year ago +4

    Great video and thanks for answering many of the questions! Looking forward to your future YT on integrating into a website.

    • @irina_nik 1 year ago

      Thank you, I'm glad you liked it

    • @Inglewhite1 1 year ago

      @irina_nik Thank you for this video. Do you have a tutorial showing how to integrate it into a website or WhatsApp? Thanks

  • @chatgpt_explained 1 year ago +1

    Thanks for this info - it's easier to set up a chatbot than I realized!

  • @borakou39 1 year ago +1

    This is exactly what I was looking for, thank you!

  • @sammathew535 1 year ago +4

    You don't need to refresh the whole Colab page to update the view of the files/folders; just click the refresh button above the directory structure, in the left pane.

  • @researchforumonline 1 year ago

    Nice, already done it but i don't know everything so had to watch this!

  • @malexandersalazar 1 year ago

    I didn't know that we can do something like this with OpenAI, thanks for the video Irina.

  • @HelpHub150 11 months ago

    thank you !!!! this is a great video Irina, keep up the good work !

  • @dannydiscovers 1 year ago

    This is an incredible video. You did an amazing job. Subscribed

  • @somu6666 5 months ago

    It's really nice, I got the insights how we can use the custom knowledge base

  • @hishamalawi6011 1 year ago

    An excellent tutorial. Thank you.

    • @hishamalawi6011 1 year ago

      I converted this code to a flask app and it works fine on my local server. However when I deploy to google app engine it fails to return responses. The error is 500 internal server error! any idea or advice is much appreciated.

  • @addkik 1 year ago

    Very informative...Thanks 😀
    Wishing you lots of love and strength.

  • @thepunisher0702 1 year ago +1

    Great !! Keep Going. All the very best !!👍😄

    • @irina_nik 1 year ago +1

      Thank you!!! Your words inspire me for more videos)

  • @javi_v7.0 1 year ago

    Great video, thanks!!!

  • @maneeshk2355 3 months ago

    I love your teaching ❤

  • @Lexa-Live 1 year ago +4

    Even I understood almost everything! Well delivered and interesting content!

  • @sojoba3521 1 year ago +1

    Great tutorial! Thank you so much for going through this in such detail. Can you suggest a resource that explains how to take the chatbot we create and integrate it into a website or web app with a prettier interface?

  • @Kisssonik 1 year ago +1

    Why are you smiling all the time))) so cute)))

  • @diederik6975 1 year ago

    Thank you very much, very useful tutorial.
    I'm wondering, why did you not use gpt-3.5-turbo, as it is much less expensive and probably almost as good?

  • @CamsYoga 11 months ago

    Thanks worked for me 😇

  • @kawingchan 1 year ago +1

    Thanks for posting this video. The whole demo is great. The only thing I'm not clear about is how to pick the input and output sizes. If some depend on the particular model, how do you obtain them from the OpenAI model page (like the davinci one)? More detail there would help, ideally with a split screen so that you don't have to toggle around.

  • @DrMohanMuthal 1 year ago

    Great information irina❤🎉

  • @inflationking1271 1 year ago +2

    Really good tutorial. I wonder on how well this scales with more documents than just a couple. Do you have some experience with the performance of 1k or 10k documents?

  • @sambhajisawant4559 1 year ago +1

    Thanks, it's really helpful. Could you please let me know if I can use complex data having hundreds of parameters (text & numbers)? If yes, in what format should it be uploaded?

  • @anastasiosmichaelkoutoumba9384

    Excellent

  • @EdSpooky 1 year ago

    Thank you so so so much

  • @0xeb- 1 year ago

    Thank you Irina

  • @gangwu3235 1 year ago +4

    Thanks for the amazing tutorial. BTW, is there any method to increase the output length? I could only get an answer of approximately 160 words (~250 tokens) right now.

  • @evaagustine7962 1 year ago +3

    Hi Irina! It is such a great tutorial and would be useful for a case that I am currently working on. I have tried this with my own research data and it turned out very well, with relevant and decent answers. But I am wondering, is it possible to use the GPT-3 model without using its training data or knowledge? So the information/answer produced would come only from the custom data that we added to the knowledge base. Your answer would be much appreciated, thanks!

  • @zhiyingwang1234 1 year ago

    Thank you so much, Irina! I copied your source code to a Jupyter notebook and created a chatbot in a few minutes! To my surprise, it works! Please give a thumbs-up to this amazing lady. She has spent time making this solution so easy to use for everyone!

  • @CaboLabsHealthInformatics

    Nice!

  • @user-vc2sc9rq7t 1 year ago

    Thanks for the great tutorial! For multiple documents, can you please advise on how i can retrieve the file name where the contextual information is retrieved from?

  • @bartake1 1 year ago

    Great tutorial. When we send data to OpenAI is that getting used for public training or would it remain private for me ?

  • @shacharlavi8556 1 year ago

    great. tnx

  • @chuck18420 1 year ago +1

    What could be happening here? I asked how many people were interviewed and the reply was "One person was interviewed". I asked how many times did "It was fun to talk about cooking." appear and it said none (interview4 ends with this quote). Thank you, great video!

  • @user-on7gb7tf8p 1 year ago +1

    Great work, it's quite clear. It seems LlamaIndex has had many updates and I can't recreate your work. Would you please make an updated version? Thanks a lot~

  • @kunalr_ai 1 year ago

    You nailed it. I'll follow you on Twitter.

  • @NK5LLC 1 year ago +1

    This is great, thank you! When asking questions to the AI, I didn't notice any custom instructions in use. How can you be sure it was answering only using the data given to it in the index?
    Can you also make more videos for using custom data from other sources, such as databases? How about the ability to categorize?
    One minor thing: When pronouncing the word "answer", the "w" is actually silent. (My wife is ESL and always asks me to correct her pronunciation, and I ask the same of her when I speak her native tongue.)

  • @Shrab 1 year ago

    Great explanation, thank you. May I ask, is there a limit on how much custom data you can use, and would a large custom knowledge base slow down the chat?

  • @MichaelLloydAI 1 year ago

    Irina,
    Many use cases. Excellent information. Thank you.
    Are you able to provide a similar method for creating a generative AI for a closed system that ensures secret or confidential company or government data cannot be leaked?

  • @veyselaytekin8734 1 year ago

    thank you

  • @OferNRaz 1 year ago

    Hi Irina, thanks for sharing. Do you know if, having a large training set, you can also ask statistical questions about the set? For example, if one of the questions you had in the interview was "How much did you pay for an air-fryer?", could you ask GPT a question like: "On average, how much did people pay for an air-fryer?"? Thanks

  • @BillyRybka 1 year ago +1

    Hey! Great video! Now that Chat GPT api is out do you know if these libraries will work for it? or is this still only a gpt 3.5 method?

    • @irina_nik 1 year ago +1

      Hi! This library is not available with ChatGPT yet, but you can keep an eye for updates here gpt-index.readthedocs.io/en/latest/how_to/custom_llms.html

  • @marassisportsinc.9195

    Nice 👍

  • @luciomagnenat8900 1 year ago

    Does anyone have an example of an existing bot built with this method? I would really like to see the results because this video, like many others, shows you how to do it but never shows the bot actually working!

  • @_trashcode 1 year ago

    Thanks for this very helpful video. I am very happy to have found it. I would like to create my own knowledge database that includes ideas I've had in the past, transcripts from YouTube tutorials, manuals, etc. Some of this content is constantly changing. For example, a manual isn't updated very often, so I can create a database with it and leave it as is. However, what about ideas and the aforementioned transcripts of tutorials? It might not be very efficient to manually update these, so I would try to automatically update them on a schedule. Alternatively, would it make more sense to leave them out and just let the bot scan through all my ideas in normal text form, notifying me if I've had a similar idea before or if it finds connections between different ideas I've had? Any help is appreciated. Thank you.

  • @adambrickley1119 1 year ago

    How would you adapt this to derive context from dynamic data being generated in a website?

  • @p.c.336 1 year ago +2

    Congrats Irina very clear and nicely explained 👍Which file formats does it support for indexing? Is it only .txt?

    • @irina_nik 1 year ago +2

      Thanks! You can connect other file types with LlamaHub gpt-index.readthedocs.io/en/latest/how_to/data_connectors.html

  • @saw970 9 months ago

    Very nice and easy way thank you !!! I have a question regarding the custom knowledge base … can I implement a prolog knowledge base and put it there or it should be a text type because prolog is a requirement in my school project… I hope you answer and thanks a lot ❤

  • @austink9285 1 year ago

    Irina, thank you for your help! When I ask it questions unrelated to the article, it often provides answers when it shouldn't. Is there any way to ensure it only focuses on my uploaded article?

  • @RobertoSilvaZuniga 1 year ago

    Great example Irina! I was wondering if the cost of the OpenAi key is for each request? I mean every time you ask the OpenAi key will cost or only will have a cost when you create the dataset in the JSON file? another question, Do you know if the questions/requests will feed the JSON file or are only to work as an assistant using the JSON file as a base? Thanks!

  • @alexdomla 1 year ago +1

    Thank you for the video! Really cool. I have a question: here you are working on Google Colab, but how would you bring this to a website? Is it possible? Is it easy? Greetings from Spain :)

  • @sumitsehgal5910 1 year ago

    Hi,
    Can anybody tell me where she used the interview sample files in the code?
    Overall it's fantastic

  • @MikeyMcCorry 1 year ago +28

    Amazing tutorial! Thanks! If you're looking for future tutorial ideas, I'd love to know how to expand on this to create my own API endpoints so my trained chatbot can be made publicly available from my website. I'm not very familiar with Google Colab (or Python for that matter - I'm a PHP/JS web developer), so I'll try to do some of my own research on how this might be possible -- but I really enjoyed and easily absorbed the info in this video. Well done. :)

    • @irina_nik 1 year ago +21

      Hi Mikey! Thank you for the suggestion, I definitely need to make a video about that. I think, I'll be able to post it in 3-4 weeks. Though I'll be using NextJS/Typescript because this is what I'm familiar with.

    • @Adrian_Marmy 1 year ago +3

      this response made me subscribe... That would be awesome!

    • @maertscisum 1 year ago +2

      @irina_nik You are smart. Can't wait to see you share the TypeScript/Node.js version.

    • @lstephen 1 year ago

      Good question Mikey! I have the same question and subscribed to find out from her next video! Thank you!

  • @NatkhatNoble 1 year ago

    That smile, that damned smile 😊 And thanks for the nice tutorial btw.

  • @LisaButler-dy1ps 1 year ago +1

    Thank you for sharing this walkthrough! It's exciting to think about the potential uses for this. At my company we have pretty tight digital security because we sometimes deal with personal identifying information. You mentioned that your actual research is also confidential. So I'm curious about the security risk of something like this. Do you have any concerns over the security of the data you are uploading?

    • @irina_nik 1 year ago +3

      Hi Lisa!
      The data submitted through OpenAI API is not used for training purposes and is deleted after 30 days. Here is the policy platform.openai.com/docs/data-usage-policies
      Though we are also still figuring out the security questions, while I'm experimenting with fake data.

    • @shitaldhakne7989 1 year ago

      Hi Lisa! For data security, you can use Azure OpenAI services.

  • @zhuk 1 year ago

    Thanks 🥰

  • @lopnezk1320 1 year ago +78

    Thanks! Now I can fire all my employees and save lots of money!

    • @irina_nik 1 year ago +7

      😎

    • @BwahBwah 1 year ago +13

      🤣🤣.... 😅😅.... 😄😄.... 🙂.... 🤔🤔🤔... 😐😐

    • @unitedstarsutopia 1 year ago

      Seriously 😂😂

    • @unitedstarsutopia 1 year ago +2

      @BwahBwah Don't tell me you are going to fire your employees too 😂

    • @BwahBwah 1 year ago +2

      @unitedstarsutopia I'll go one better. I won't have to employ anyone now 😀

  • @leoheise9967 1 year ago

    hey, any tips on how to fine tune a model based on a very large pdf document without the "
    " to split prompt/resolution? I thought maybe have a script break down in every question mark? Or is there some other way?

  • @almor2445 1 year ago

    Is there a good one already made, loaded with up to date research papers?

  • @jairam470 1 year ago

    Hello, nice video. Please let me know how this ensures our data will still be our data. Won't OpenAI have access to it now?

  • @youwang9156 1 year ago +1

    really appreciate your work, just have one question for the chunk, can I split the text into chunks by sentence or comma or space instead of chunk size?

    • @irina_nik 1 year ago

      Hi! In this technique it's not possible

    • @youwang9156 1 year ago

      @irina_nik I found this technique is not very suitable for dealing with numbers; for example, it delivers a wrong product price that is mixed up with other products. Do you have any idea how to fix it? Thank you so much for your reply.
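
On the chunking question in this thread: one option that sits outside the technique shown in the video is to pre-split the text yourself, on sentence boundaries, before handing it to any indexer. A minimal sketch using only the Python standard library; `split_into_sentence_chunks` and `max_chars` are hypothetical names for illustration, not part of GPT Index or LangChain, and whether this helps with mixed-up prices is an untested assumption:

```python
import re


def split_into_sentence_chunks(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after ., ! or ? followed by whitespace (a rough sentence boundary).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = (
    "Product A costs 10 dollars. "
    "Product B costs 25 dollars. "
    "Product C costs 40 dollars."
)
chunks = split_into_sentence_chunks(sample, max_chars=60)
for chunk in chunks:
    print(chunk)
```

Because chunks always end on a sentence boundary, a product name and its price are less likely to be separated than with a fixed-size cut.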

  • @user-tg9ft2yj7n 1 year ago

    Hello! Thank you for the helpful tutorial! What would happen if I ask a question in another language? Would this chatbot switch to the language as ChatGPT does? Thanks a lot.

  • @DeepakSahu-ol5rv 1 year ago

    Can you please provide UI for the same, it will be very helpful for me. I am stuck in the linking part.

  • @user-lz2md3pu5b 6 months ago

    Hey Irina! Thank you for this tutorial, it's a game changer. This is built off GPT 3, how would you go about running it off GPT4? Thanks!

  • @M-ABDULLAH-AZIZ 11 months ago

    having data in a file and real time embeddings vs embeddings in a db for chatbot for an application (provides information about an application)?

  • @callmefred 1 year ago

    Very nice tutorial. Have you developed any consumer-facing AI-powered web apps?

    • @irina_nik 1 year ago +3

      No, but I'm thinking to make a next tutorial about it once I figure it out :)

  • @tulsipatro4662 1 year ago

    Amazing tutorial.
    Is there a way to make the model answer the questions faster? It takes nearly 30 seconds to answer a question.

  • @AdiDubs 1 year ago

    Thank you so very much. A quick question - how can I get more detailed responses?

    • @irina_nik 1 year ago

      Increase the number of output tokens

  • @jdlovely 1 year ago +1

    Great video! How do we customize it to our local repository as you suggested?

    • @irina_nik 1 year ago

      Hey! You can just manually upload the data into the project folder and change the path in construct_index("your_path")

  • @gabrielcastaing8035 7 months ago

    Hi
    Thank you for that content!
    I am just curious about the file size limit and the importance of the file format in your approach. I have seen that you are using .txt files. I am using PDFs to feed the knowledge base of custom GPTs but I am observing low accuracy in the answers. It seems that the GPT is not looking at the whole knowledge base (6 merged PDFs with approx. 7000 pages in total). Do you have any advice?

  • @phnxregen2131 1 year ago

    Does the data in the folder have to be a .txt file, or can I include pdfs and other documents?

  • @prabharora0 1 year ago

    Hello! Thank you for the video! Also your secret API key is visible in the first few frames before you blur it! You should delete that API key completely!

  • @123arskas 1 year ago +1

    One question: whenever we ask a question, does it go through the entire index every time? And does that cost us a lot of tokens for each question? Because if that's the case, we would run out of credit if we deployed an app like that for users online.
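
On whether every question reads the whole index: vector-style indexes typically embed the question, rank the stored chunks by similarity, and send only the best few to the model, so the token cost per question tends to track the size of the selected chunks rather than the whole knowledge base. A toy illustration of that idea, assuming bag-of-words counts in place of real embeddings; `embed`, `cosine`, and `top_k_chunks` are hypothetical names, and this is not the library's actual code:

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_chunks(question, chunks, k=1):
    """Return the k chunks most similar to the question."""
    question_vector = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(question_vector, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "The participant cooks dinner at home five nights a week.",
    "The participant bought an air fryer last year.",
    "The participant prefers recipes with short ingredient lists.",
]
selected = top_k_chunks("Who bought an air fryer?", chunks, k=1)
print(selected)  # only these chunks would be placed in the prompt
```

In this sketch the ranking step touches every chunk, but only the selected text is sent to the model, which is the part billed in tokens.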

  • @TorNeely 11 months ago

    Very interesting. I noticed you said that you can't share the real interviews in the video because they may contain private information, which is understandable. However, how do you ensure that OpenAI doesn't receive this information? I find the biggest problem is how to avoid OpenAI getting either user or customer information.

  • @johnsmith1953x 1 year ago +1

    Is there a software package that can make an entire OpenAI chatbot on GPT-4 or even 3.5 just by pointing at a folder of PDFs?
    We would pay thousands for this right now.
    The application has to run local on a PC.

    • @irina_nik 1 year ago +1

      You can use langchain for that. I'll make more tutorials on that topic

  • @reticent 1 year ago +4

    Hi Irina -
    Would building a custom AI chatbot also allow you to avoid the topic restrictions put into place by OpenAI/Bing/etc. in their chat modes?
    Personally I'm interested in interacting with one of these AIs without all the restrictions put into place by the corps running them publicly, and if possible with persistent memory of previous conversations. The technology fascinates me but I don't want to interact with "gimped" AIs that, as entities, can't really exhibit their true capabilities due to the restrictive actions of the tech companies who are trying to reduce their exposure from a liability standpoint.

  • @konstantinlozev2272 1 year ago

    Can the indexing and query code be run locally (interfacing with GPT-3 over the internet of course)?
    What IDE?

  • @pedromoreno8655 1 year ago

    Hi Irina, thanks for the video. I want to ask how you limit the model to answering only about your information. I.e., what would happen if the person asks a question out of context (like: "Can I go to Miami for holiday?"), will it reply?
    Thanks

  • @jackwan358 1 year ago

    If I put my own knowledge base into ChatGPT for reference, will only my API be able to access the content, or will it also be shared with other ChatGPT users? Is there any security concern with this?

  • @basedblueboy8770 1 year ago

    Can you set up to read Python code bases?

  • @ganeshkris 1 year ago +1

    This just spits out the text related to the query. If I want to augment GPT capabilities with my own data set. what is the best way to do it? For example, using the same example of interview transcription, I should be able to ask the GPT to summarize how the candidate did or whether the interviewee answer was correct for a particular question. Any idea how to go about that? I understand fine-tuning is a possibility but if i have 10,000 interview scripts i want to augment the GPT capabilities with, I am not sure how to go about it.
    Any help?

  • @athuldas8689 1 year ago

    Where did the answers come from: ChatGPT or the data fed in? When I checked the data, I could only find questions.

  • @vl9110012010 1 year ago

    Many thanks! My deepest bow! Respect)))

  • @wardaraees4887 1 year ago

    You feed text data files to provide the data to the model; what if I have an Excel file or a tabular data file?
    And is the OpenAI API key free or paid?

  • @fatpen9731 1 year ago

    Great tutorial, how to encourage it to generate more text from the data it trained on ?

    • @irina_nik 1 year ago

      Hi! Try to increase the number of output tokens.

    • @fatpen9731 1 year ago

      @irina_nik thank you Irina ^^

  • @tigerwee6721 1 year ago

    Where do you define the directory_path?