Meta Llama 3 Fine tuning, RAG, and Prompt Engineering for Drug Discovery

  • Published Apr 24, 2024
  • Large language models such as Meta's newly released Llama 3 have demonstrated state-of-the-art performance on standard benchmarks and in real-world scenarios. (1) To further improve domain-specific generative AI answers, three techniques are applied to Llama 3: fine-tuning, prompt engineering, and retrieval augmented generation (RAG).
    For enhanced usability, Llama 3 text generations may need additional modification, additional context, or a specialized vocabulary. Fine-tuning is the process of further training the original pre-trained Llama 3 on domain-specific dataset(s). Prompt engineering does not involve retraining Llama 3; it is the process of "designing and refining the input given to a model to guide and influence the kind of output you want." RAG "combines prompt engineering with context retrieval from external data sources to improve the performance and relevance of LLMs." (2)
    The seminar details how to use drug discovery-related datasets with the three LLM techniques above. The cover image depicts cancer drug candidate RTx-152 and its protein and DNA interactions, from separate research: Fried, W., et al. Nature Communications, April 05, 2024. (A)
    1) Meta AI: ai.meta.com/blog/meta-llama-3/
    2) Deci AI: deci.ai/blog/fine-tuning-peft...
    A) Nature Communications: www.nature.com/articles/s4146...
    -CEO Kevin Kawchak
  • Science & Technology
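The prompt-engineering idea described above (shaping the model's input rather than retraining it) can be sketched in a few lines. The template wording, the helper function, and the example question are illustrative assumptions, not the seminar's actual notebook code:

```python
# Hypothetical sketch: prompt engineering for a drug-discovery question.
# No model weights are changed; we only design the input text.

def build_prompt(question: str, context: str = "") -> str:
    """Assemble a domain-scoped prompt for an LLM such as Llama 3."""
    system = (
        "You are a medicinal-chemistry assistant. "
        "Answer concisely and cite mechanisms where known."
    )
    parts = [system]
    if context:  # in a RAG setup, retrieved documents would fill this slot
        parts.append(f"Context:\n{context}")
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)

prompt = build_prompt(
    "What class of kinase does imatinib inhibit?",
    context="Imatinib (Gleevec) inhibits the BCR-ABL tyrosine kinase.",
)
```

The same template can be reused unchanged across models; only the context slot varies between plain prompting and RAG.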

Comments • 10

  • @madmen1986 • 1 month ago • +1

    Please keep posting these technical videos. So many people are tired of beginner level knowledge. Whoever serves the community with relevant, advanced knowledge will prosper.

  • @meelanc1203 • 26 days ago

    Thank you for sharing the video. As you mentioned, it would be helpful to have the links to the associated Jupyter notebooks. Could you please provide those in the video description?

  • @simonmasters3295 • 1 month ago • +1

    So what Kevin Kawchak is saying...(smile)... is that it doesn't matter much which LLM you use. All the LLM does is provide control of the conversation. What counts is the phase in which a bias is developed towards the current, text-based information that the developer or user provides to the LLM interface.
    You cannot be assured that your data, or even your domain of interest, was used to train the LLM. Effectively, Retrieval Augmented Generation (RAG) patches small, medium, or large amounts of structured or unstructured data into the AI environment, and the AI answers accordingly using a vector database.
    This raises the question of how we test accuracy, and that depends on whether the output is rigorously re-evaluated after the RAG process. I feel the need for a workflow...
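The retrieval step this comment describes can be illustrated with a toy, self-contained sketch: simple bag-of-words vectors stand in for a real vector database, and the best-matching document is patched into the prompt. The document texts and query are illustrative assumptions:

```python
import math
from collections import Counter

# Toy RAG retrieval: embed documents and query as word-count vectors,
# rank by cosine similarity, and inject the top match into the prompt.
# A production system would use learned embeddings and a vector database.

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = [
    "tamoxifen modulates the estrogen receptor in breast cancer",
    "imatinib inhibits the bcr-abl tyrosine kinase in cml",
    "vemurafenib targets the braf v600e mutation in melanoma",
]

def retrieve(query: str) -> str:
    """Return the stored document most similar to the query."""
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

best = retrieve("which kinase does imatinib inhibit in cml")
augmented_prompt = f"Context: {best}\n\nQuestion: which kinase does imatinib inhibit?"
```

Re-evaluating the model's answer against the retrieved context, as the comment suggests, is exactly where an accuracy-testing workflow would sit.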

  • @bamh1re318 • 1 month ago

    Some great drug discoveries were based on small experiments, e.g., tamoxifen, Gleevec, crizotinib, and vemurafenib. Is massive "fine-tuning" of an LLM necessary, or counter-productive versus specific/narrow training?

    • @simonmasters3295 • 1 month ago

      What do *you* think?

    • @bamh1re318 • 1 month ago

      @simonmasters3295 In the case of Gleevec, it was discovered in RTK-cell models; its effect on bcr-abl was an "also-found." Its potential was revealed in a single assay with a CML patient's bone marrow culture. The other three compounds are in similar situations. Med-chem or pharmacology was not the bottleneck; instead, our biases or visions were.

    • @bamh1re318 • 23 days ago

      @simonmasters3295 Target-/process-driven AI models, which grow as discovery/development progresses, could be easier to put into practice.

  • @generationgap416 • 1 month ago

    This guy can make abc and 123 complex.

  • @KumR • 22 days ago

    10