Deepseek Coder vs CodeLlama vs Claude vs OpenAI

  • Published 12. 06. 2024
  • Trelis.com - for inference, fine-tuning and function-calling scripts.
    * Deepseek Inference *
    One-click template: runpod.io/gsc?template=51tpe9...
    TheBloke AWQ: huggingface.co/TheBloke/deeps...
    * Deepseek 1.3B, 6.7B and 33B function-calling models *
    huggingface.co/Trelis/deepsee...
    * Inference Guide *
    trelis.com/enterprise-server-...
    * Fine-tuning Scripts *
    trelis.com/advanced-fine-tuni...
    Full Repo Includes:
    - LLM Comparison Notebook from this video
    - Supervised fine-tuning
    - Unsupervised fine-tuning
    - Quantization scripts
    - Function calling / Structured response fine-tuning
    - Embeddings notebook
    OR buy only the LLM Comparison Notebook here: buy.stripe.com/5kAcNy8G12Hxg7...
    Chapters:
    0:00 Deepseek coder
    0:24 Agenda
    1:03 Model sizes and license
    2:02 Prompt format
    3:36 Inference on Runpod
    5:05 Performance vs CodeLlama, OpenAI, Claude
    7:04 Returning a sequence in reverse
    10:40 Passkey retrieval
    16:23 Website generation
    24:20 Function calling
    25:22 Resources
  • Science & Technology

Comments • 12

  • @enzocalzone5298 · 4 months ago

    Awesome! Thanks for the template

  • @vishalgoklani · 7 months ago +2

    I enjoyed the video, thanks for sharing. I'm curious: do you think the NF4 quantization tripped up some of your results? How do the results differ when using AWQ? What about running the full model in bfloat16?

    • @TrelisResearch · 7 months ago +1

      NF4 probably has some effect, although I used it for both models, so the relative effect is perhaps similar and the comparison is OK.
      AWQ typically performs a bit worse than NF4, but better than GPTQ. I think I discuss that a little in the AWQ video.
      Running in bfloat16 is definitely best; you can check out this paper for relative performance: arxiv.org/pdf/2305.14314.pdf
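      For reference, NF4 quantization in 🤗 Transformers is configured via bitsandbytes. A minimal sketch (the model id is illustrative, not necessarily the one used in the video; requires the `bitsandbytes` package and a GPU):

      ```python
      # Sketch: loading a causal LM with NF4 4-bit quantization via bitsandbytes.
      # Model id below is an example, not confirmed from the video.
      import torch
      from transformers import AutoModelForCausalLM, BitsAndBytesConfig

      nf4_config = BitsAndBytesConfig(
          load_in_4bit=True,
          bnb_4bit_quant_type="nf4",              # NF4 data type from the QLoRA paper
          bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bfloat16
          bnb_4bit_use_double_quant=True,         # quantize the quantization constants
      )

      model = AutoModelForCausalLM.from_pretrained(
          "deepseek-ai/deepseek-coder-6.7b-instruct",
          quantization_config=nf4_config,
          device_map="auto",
      )
      ```

      Dropping `quantization_config` and passing `torch_dtype=torch.bfloat16` instead gives the full-precision baseline the reply recommends.
      
      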

  • @romanweilguny3415 · 7 months ago +1

    Interesting insights, thank you! At the moment it seems those big models are still very weak at certain tasks that look rather simple. I tried extracting info from table-like content in the prompt with GPT-4 and it mostly failed, even when the prompt wasn't long. This disappointed me, as I really like GPT-4's performance on many other tasks.

  • @othmanaljbory3649 · 7 months ago +1

    Can you help me solve a two-level programming model with constraints?

  • @MW-ez1mw · 3 months ago

    Hi @Trelis Research, thank you for the great video. Would you consider making a video on how to fine-tune Copilot-style coding LLMs? Thanks!

    • @TrelisResearch · 3 months ago

      Could you give more of an example of a specific fine-tune that would be helpful?

  • @DreamingConcepts · 6 months ago +1

    9:54: GPT-3.5 actually failed here; it skipped the "n".

    • @TrelisResearch · 6 months ago

      Oops, you're right. My brain is clearly a bad tool for grading.
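
      Programmatic checking avoids exactly this kind of manual-grading slip. A minimal sketch for the sequence-reversal test (the faulty output is fabricated for illustration):

      ```python
      def grade_reversal(original: str, model_output: str) -> bool:
          """Check that model_output is original reversed, character for character."""
          return model_output == original[::-1]

      original = "quantization"
      correct = original[::-1]                  # "noitazitnauq"
      dropped_n = correct.replace("n", "", 1)   # simulate a model skipping an "n"

      print(grade_reversal(original, correct))    # True
      print(grade_reversal(original, dropped_n))  # False
      ```

      An exact string comparison catches a single dropped character that is easy to miss when eyeballing two long reversed strings.
      
      
      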