Learn to Embed Data with ChatGPT to make a Recommendation Bot

  • Published 14. 06. 2024
  • Beginner-friendly, step-by-step walkthrough of how to embed your own data and make a working recommender bot. I explain each step, how embeddings differ from fine-tuning and general prompting, and how you can embed your own data and use this powerful tool! This example makes movie recommendations from 3,000 movies: just plug in your favorite movie and it will recommend a similar title based on the plot and movie title. (A minimal sketch of the embedding step is included below.)
    Colab Notebook:
    colab.research.google.com/dri...
    Excel Data:
    docs.google.com/spreadsheets/...
    Other Videos:
    AI Phone Scheduler - • Build Your Own AI Rece...
    AI for Automating Cold Calling - • Using ChatGPT to Autom...
    GPT that makes Calls - • Custom GPT with Bland ...
    Fine-Tune Llama-2 - • The Secret to Fine-Tun...
    Fine-Tune ChatGPT 3.5 - • Easily Fine Tune ChatG...
    Build an AI Texting Bot - • Fine-tuning ChatGPT to...
    Fine-Tune ChatGPT 3 - • Transform ChatGPT into...
    Embed Data with ChatGPT - • Learn to Embed Data wi...
    Build a chatbot - • Create your own AI cha...
    How I got GPT4 access - • Here's how I got appro...
    API keys - • How to Access OpenAI A...
    Chapters:
    0:00 - Intro
    0:52 - What is Embedding Data?
    3:16 - Set Up Excel Data
    3:56 - Setup Code in Colab
    7:45 - Embedding Function
    10:04 - Embedding Complete
    10:41 - View Embedding Data
    11:52 - Setup Search/Recommend Function
    14:45 - Gradio UI
    15:37 - Success! Let's Test It
    #embedding #openai #chatgpt #fine-tune #gpt #gpt4 #api #recommend #recommenderbot #tutorial #beginner #ai #artificialintelligence #embeddata #gpt-4 #python #learnai #program #LLM
  • Science & Technology
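
  • Embedding sketch (referenced in the description above): a minimal, hypothetical version of the embedding step, not the notebook's exact code. The spreadsheet filename and the Description column name are assumptions; the Movie and Embedding columns match the code shown in the comments below.
    import pandas as pd
    import openai
    from openai.embeddings_utils import get_embedding

    openai.api_key = "sk-..."  # your OpenAI API key

    # Load the movie spreadsheet (one row per movie).
    df = pd.read_excel("movies.xlsx")  # columns: Movie, Description

    # Embed each plot description once; each row gets a 1536-dim vector.
    df["Embedding"] = df["Description"].apply(
        lambda text: get_embedding(text, engine="text-embedding-ada-002")
    )

    # Cache the vectors so you don't pay to re-embed on every run.
    df.to_csv("movies_embedded.csv", index=False)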

Comments • 30

  • @davidrose9577 • 1 year ago +4

    Great info, great walkthrough, and execution. I appreciate that you write the code in a large font, as it's frequently difficult to see the code on screen. Thank you for the extra effort.

  • @redbaron3555 • 1 year ago +1

    Awesome videos! Thank you!

  • @caiyu538 • 11 months ago +1

    Great demo

  • @HamzaRashid • 11 months ago +1

    Thanks for this video and the fine-tuning video. Can you help me understand: are these embeddings/fine-tunings done at the API-key level? Or at the model level, so they can be used with new API keys under the same account?

    • @tech-at-work • 11 months ago

      You can use the embedded/fine-tuned models with multiple API keys from the same account!

  • @alireaziat3842 • 9 months ago

    This is great, thanks. I need to understand this: is the movie description the information that has been vectorized here? I mean, can the system measure similarity on full-length descriptions?

    • @tech-at-work • 9 months ago

      Correct, it’s assigning the vectors to the entire description!
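
      A quick sketch of what that looks like (the two plot descriptions below are made up for illustration; get_embedding and cosine_similarity come from openai.embeddings_utils):
      from openai.embeddings_utils import get_embedding, cosine_similarity

      # Each full description becomes one 1536-dimensional vector, so
      # similarity is measured over the whole text, not individual words.
      desc_a = "A retired assassin is pulled back in for one last job."
      desc_b = "An ex-hitman comes out of retirement to settle a score."

      vec_a = get_embedding(desc_a, engine="text-embedding-ada-002")
      vec_b = get_embedding(desc_b, engine="text-embedding-ada-002")

      print(cosine_similarity(vec_a, vec_b))  # near 1.0 for similar plots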

  • @dearlove88 • 1 year ago

    So, my understanding, which I'm happy to have corrected, is that fine-tuning doesn't actually 'add data' to an LLM unless you retrain the whole model. Vectorizing/embedding is pretty much the only financially viable option to insert data and get OpenAI to answer questions about that data.

    • @tech-at-work • 1 year ago +1

      You’re mostly correct; fine-tuning directly adjusts the existing model’s parameters but does not “add new data”. Embeddings allow the model to understand new data (context & sentiment) as an input, while still working with an existing model. Embeddings are much cheaper but less nuanced, whereas fine-tuning is more accurate but requires better-formatted data and is more expensive to perform.
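
      For illustration, the two workflows look roughly like this in the openai Python library used in the video (the input text and training file ID are hypothetical):
      import openai

      # Embedding: represent new data as vectors an existing model can compare.
      resp = openai.Embedding.create(
          input="A young wizard attends a school of magic.",
          engine="text-embedding-ada-002",
      )
      vector = resp["data"][0]["embedding"]  # 1536-dim list of floats

      # Fine-tuning: adjust an existing model's weights using an uploaded
      # JSONL file of formatted examples (file ID below is a placeholder).
      job = openai.FineTuningJob.create(
          training_file="file-abc123",
          model="gpt-3.5-turbo",
      )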

  • @yizhouqian5899 • 8 months ago +1

    Is it possible to use embedding to build a lexicon that can be used to classify words/phrases/sentences into particular categories even if a word/phrase/sentence is not in the dictionary? Thank you so much for your tutorial. It is the most straightforward and organized video I have ever encountered.

    • @tech-at-work • 8 months ago +1

      It can, as long as you have sufficient training data using the new words, along with the associated classifications.

    • @yizhouqian5899 • 8 months ago

      @@tech-at-work Thanks for the feedback! Say I want to classify college football experiences: how much annotated training data would you consider sufficient? I am working on a project but having a hard time gauging the effort. Thanks again!

    • @tech-at-work • 8 months ago +1

      It will depend on the size of the lexicon you’re building and the different word combinations. You want examples and context for each word and its common nearby words, so a few examples per unique word (and word combination) should be sufficient.

    • @yizhouqian5899 • 8 months ago

      @@tech-at-work Thank you, sir. I am still a little confused about fine-tuning vs. embedding. For college football experience classification, there are certain experiences that should be classified in one category and not another (e.g., "food is not good" should go to the concession-quality category, whereas "beverages are expensive" belongs to the concession-pricing category). What I encountered was that the default GPT-3.5 did not classify as anticipated (e.g., it could not differentiate concession quality from concession pricing). In this case, should I use embedding or fine-tuning to improve the quality of the output? Thank you again!

  • @bryancaro8625 • 9 months ago

    Great video.
    I'm getting a RetryError[] from the OpenAI embedding call; I'm trying to embed 5,000 rows, but it works with fewer. Do you know why?

  • @emiliostavrou • 7 months ago +1

    Do you think it would be possible to connect this to a live spreadsheet?

    • @tech-at-work • 7 months ago +1

      You could have a live spreadsheet in Google Colab, but you'd need to re-run the embedding code each time to actually use it.
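
      One possible setup, assuming the sheet is published to the web as CSV (the sheet ID below is a placeholder):
      import pandas as pd

      # "Publish to web" CSV export of the Google Sheet (placeholder ID).
      SHEET_CSV_URL = "https://docs.google.com/spreadsheets/d/<SHEET_ID>/export?format=csv"

      # Each run pulls the latest rows...
      df = pd.read_csv(SHEET_CSV_URL)

      # ...but new or edited rows still need the embedding step from the
      # video to be re-run before the recommender can use them.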

  • @nesun3 • 1 year ago +1

    How do I add the closest top 5 recommendations?

    • @tech-at-work • 1 year ago +1

      You need to change n=6 and add a colon after the [1] to show all remaining rows, then for Gradio you need to output the lists as strings (sequences of characters). To get the Top 5, adjust your last two sections of code to this, and it should work:
      def search_movies(df, movie_title, n=6):
          # Embed the query title, score every row, and keep the top n;
          # the closest match is the query movie itself, so skip row 0.
          embedding = get_embedding(movie_title, engine='text-embedding-ada-002')
          df['similarities'] = df.Embedding.apply(lambda x: cosine_similarity(x, embedding))
          res = df.sort_values('similarities', ascending=False).head(n)
          return res.iloc[1:]['Movie'].tolist(), res.iloc[1:]['similarities'].tolist()

      def gradio_wrapper(movie_title):
          # Gradio textboxes take strings, so join each list with newlines.
          top_movies, similarity_scores = search_movies(df, movie_title)
          top_movies_str = '\n'.join(map(str, top_movies))
          similarity_scores_str = '\n'.join(map(str, similarity_scores))
          return top_movies_str, similarity_scores_str

      iface = gr.Interface(
          gradio_wrapper,
          inputs="text",
          outputs=[gr.outputs.Textbox(label="Top Movies"),
                   gr.outputs.Textbox(label="Similarity Scores")],
          interpretation="default",
      )
      iface.launch(share=True)

  • @caiyu538 • 11 months ago +1

    Does OpenAI directly tokenize for us in your code?

    • @tech-at-work • 11 months ago

      You can add code to tokenize, but I didn’t include it in this example
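
      If you want to count tokens yourself, here is a small sketch using the tiktoken library (text-embedding-ada-002 uses the cl100k_base encoding; the sample text is made up):
      import tiktoken

      # Counting tokens up front helps you stay under the embedding
      # model's input limit (~8k tokens for text-embedding-ada-002).
      enc = tiktoken.get_encoding("cl100k_base")

      text = "A young wizard attends a school of magic."
      print(len(enc.encode(text)))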

  • @sergun4703 • 8 months ago

    Hi, I have tried to run your code but I'm facing an error at the pip install step:
    pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
    lida 0.0.10 requires kaleido, which is not installed.
    tensorflow 2.13.0 requires typing-extensions<4.6.0,>=3.6.6, but you have typing-extensions 4.8.0 which is incompatible.
    Do you have any idea how to fix that?

  • @user-kt6uk4tm6l • 7 months ago

    I wasn't able to download the Excel file 🥺

    • @tech-at-work • 7 months ago +1

      I updated the link, let me know if you have any issues now!

    • @user-kt6uk4tm6l • 7 months ago +1

      @@tech-at-work Thanks so much, it works.

  • @hanabimock5193 • 1 year ago

    Private data on ChatGPT 🚩🚩🚩🚩

    • @tech-at-work • 1 year ago

      Fair point, make sure you're comfortable/allowed to share the data with OpenAI