Merge LLMs to Make Best Performing AI Model

Maya Akim

39,016 views

Added 1 June 2024
This video is about mergekit and how to choose and blend models. It's non-technical, but links to technical papers are included. You need to know how to navigate the terminal, but no programming is required.
🤖 Join my Discord community: / discord
📰 My tutorials on Medium: / mayaakim
🐦 My twitter profile: / maya_akim
To rent a GPU from Massed Compute (mergekit preinstalled) follow the link ⤵️
bit.ly/maya-akim
Code for 50% discount: MayaAkim
All links:
mergekit:
github.com/arcee-ai/mergekit
Open LLM Leaderboard
huggingface.co/spaces/Hugging...
my huggingface profile (with model configs you can copy):
huggingface.co/mayacinka
git installation:
gitforwindows.org/
lfs installation:
docs.github.com/en/repositori...
supported architecture for mergekit:
github.com/arcee-ai/mergekit/...
best blog about mergekit:
/ merge-large-language-m...
other really good blog about mergekit:
/ merge-large-language-m...
Charles Goddard’s blog: (author of mergekit)
goddard.blog/about/
Mona lisa with Mohawk
www.designboom.com/technology...
What is YAML:
www.techtarget.com/searchitop...
What is Data Contamination:
bdtechtalks.com/2023/07/17/ll...
Goodhart's law:
www.cna.org/reports/2022/09/g...
LazyMergekit:
colab.research.google.com/dri...
Auto evaluation: (requires runpod profile)
colab.research.google.com/dri...
configuration with 14 models merged:
huggingface.co/EmbeddedLLM/Mi...
MoE instructions:
github.com/arcee-ai/mergekit/...
higher density - better results
github.com/arcee-ai/mergekit/...
Model family tree: (visualization)
colab.research.google.com/dri...
huggingface.co/spaces/mlabonn...
cost of training mistral:
www.ft.com/content/387eeeab-1...
Leaderboard is disgusting:
/ open_llm_leaderboard_i...
Merging models with different architectures:
arxiv.org/pdf/2401.10491.pdf
merging models different arch:
github.com/18907305772/FuseLLM
Blending is all you need:
arxiv.org/pdf/2401.02994.pdf
Model soups
arxiv.org/pdf/2203.05482.pdf
Ties-merging research paper:
arxiv.org/pdf/2306.01708.pdf
Dare merge research paper:
arxiv.org/pdf/2311.03099.pdf
Task arithmetic:
arxiv.org/pdf/2212.04089.pdf
Benchmarks
ARC benchmarks
deepgram.com/learn/arc-llm-be...
arxiv.org/pdf/1803.05457.pdf
HellaSwag
arxiv.org/pdf/1905.07830.pdf
MMLU
arxiv.org/pdf/2009.03300.pdf
TruthfulQA
arxiv.org/abs/2109.07958
WinoGrande
arxiv.org/pdf/1907.10641.pdf
GSM8K
arxiv.org/pdf/2110.14168.pdf
overfitting problem Ann Lotz:
arstechnica.com/tech-policy/2...
Benchmarks are a problem screenshots:
analyticsindiamag.com/the-pro...
/ llm_benchmarks_are_bro...
/ llm_benchmarks_are_bul...
Attributions:
commons.wikimedia.org/wiki/Fi...
Timecodes:
0:00 - 1:47 - blending intro
1:48 - 3:36 - promise of blending
3:37 - 4:22 - blending steps and requirements
4:23 - 5:05 - all you need is hardware
5:06 - 5:30 - mergekit installation
5:31 - 9:23 - merge methods
10:48 - 13:31 - configurations and yaml
13:32 - 14:38 - how to run merge
14:39 - 14:42 - upload merged model
14:43 - 16:27 - best merge method
16:28 - 20:16 - benchmark problems, overfitting and contamination
#mergekit #llm #localmodels
Science & Technology

Comments • 100

@maya-akim 2 months ago ⁺⁵⁸
i hope you find the video useful and don't forget to show (and brag about) your blended models!
@TheEarlVix 2 months ago ⁺¹
Thank you. Found this from your post on X.
@chrisBruner 2 months ago ⁺¹
Your videos are always very good and cutting edge.
@qiqqaqwerty1713 2 months ago ⁺²
Thanks for the very informative video.
Cheers from "Down Under"!
@user-ds2sc9tn4x 2 months ago ⁺¹
videos are so great! i will be modest to learn as much as i can!
@Ludecan 2 months ago ⁺¹⁰
Awesome video!! Been following your series on building AI agents and they're very good! Thanks for sharing!
@Santiino 2 months ago ⁺¹⁶
It's mind-blowing to me how good your videos are yet you are still so unknown. Keep it up!
@blim420 2 months ago ⁺³
Excellent walk through, thanks !
@Rob_Steele 2 months ago ⁺³
Great video Maya! Keep em coming! 😎
@SebastianKreutzberger 2 months ago ⁺³
Fantastic video, so well prepared, fool-proof explained, and a really cutting-edge topic. Best AI YouTuber out there - thank you 🙏
@sergiofigueiredo1987 2 months ago ⁺²
This is destined to evolve into a meticulously curated, go-to channel of human reliability for years to come. Thank you very much for the exceptional quality you provide!
@ivandukic 2 months ago ⁺¹
Wow, what an incredible explanation of merge methods. Thank you.
@mysticaltech 1 month ago
Maya, you are good at this stuff. you are averaging my internal mind vectors to make AI easy. Keep doing so!
@mayorc 2 months ago ⁺¹
Great video Maya. Keep it up ❕❕❕
@Nifty-Stuff 1 month ago ⁺²
Blending LLMs is a fascinating idea. The idea left me wondering: Why hasn't anybody developed a system/app that takes the APIs from the top LLMs, creates agents for each, and then has these agents all work together to brainstorm, debate, review, and solve problems? I often get 4 different answers from 4 LLMs, so why not have them all set up as agents "in one room" working together to come up with the "best" solution. I can't find anybody that's tried this... why not? Wouldn't having the "top minds" (LLMs) working together produce better results?
@bamh1re318 9 days ago
they could become worse, or give you 4 different answers, or could not stop talking around themselves
@minae1423 2 months ago
well articulated and educational video, thank you Maya!🙏🏼
@seanhynes9516 1 month ago ⁺¹
Awesome, thanks for the great video. Very well explained, great diagrams! :)
@overcuriousity 2 months ago
interesting, easy to follow, well researched and critically scrutinized the results. like your content!
@ajay--yadav 2 months ago
lot of information about so many topics presented nicely.
@robinmordasiewicz 2 months ago
wow, most sophisticated YouTuber ever. New favorite channel.
@GetzAI 2 months ago ⁺³
Thanks Maya!
@rein436 2 months ago ⁺¹
Very insightful 👍
@scienceandmind3065 2 months ago ⁺²
Great video and exactly what I need at the moment. Having a lot of specialized models for science, translation, coding, finance etc but no good way of combining them.
@maya-akim 2 months ago
best of luck! and share with us your results if you want :)
@gerykis 2 months ago
Very good explanation. I'm looking for a similarly easy-to-understand video on how to fine-tune a model locally.
@MiguelLopez-mu1ss 2 months ago
Thank you for the insights
@jeremybristol4374 2 months ago ⁺⁴
Love the props with storytelling! Great instructional video!
@doomstertech8305 2 months ago
great video, loved the explanation of all the technical stuff. Would love to know your process on how you read and understand these topics in-depth?
@ulrichbeutenmuller8101 2 months ago
thanks, great video!
@johnefan 2 months ago
Great Video👏🏻
@synchro-dentally1965 2 months ago
Excellent video! The development outlook seems open to so many possibilities. I'm curious if anyone will find advantages in networks built via diffusion (similar to image generation) or if there will be more real time dynamics implemented as the model responds to a query.
@qiqqaqwerty1713 2 months ago ⁺³
🎯 Key takeaways for quick navigation. This summary is no substitute for watching the complete video for a more in-depth understanding:
Main Ideas:
- 🌍 Model blending is an innovative approach to surpass the performance of high-cost models with limited resources.
- 🤖 Non-experts can effectively blend models, demonstrating the technique's accessibility.
- 💡 The blend allows for specialized functionality, combining models tuned for diverse tasks into a powerhouse model.
- 🛠 The merging process involves selecting compatible models, defining parameters, and executing the blend with basic command line knowledge.
- 🔄 Various blending methods like task vector arithmetic and SLERP offer unique advantages for custom model creation.
- 📚 Proper selection and preparation of models are crucial, with a focus on architecture compatibility and avoiding common pitfalls.
- 🏆 Blended models can achieve top rankings on leaderboards, though their position may fluctuate.
- 🤔 The effectiveness of benchmarks in evaluating model intelligence is questioned, highlighting the issue of data contamination.
Takeaways:
00:00 *🤖 Introduction to Model Blending*
- Introduction to the concepts of model blending, showcasing the power of combining models to overcome resource limitations and improve performance.
- Highlights two models, Mixol and Ramonda, emphasizing the potential of model blending even with limited resources.
01:24 *📘 Basics of Model Blending*
- Detailed explanation of model blending, its significance, and the methodology behind efficient blending.
- Discusses the blending process, the importance of model selection, and the steps involved in creating a blended model.
02:05 *💡 The Promise of Blending*
- Explores the potential of blending models to create top-performing LLMs without the need for extensive resources.
- Focus on the accessibility of fine-tuning and blending for personalized model development.
03:33 *🛠️ How to Blend Models*
- Provides a practical guide on blending models using MergeKit, including setup and execution steps.
- Emphasizes the ease of blending models with basic knowledge and the right tools, offering an approachable method for enthusiasts and professionals alike.
05:33 *🧪 Detailed Blending Methods*
- Deep dive into various blending techniques such as task vector arithmetic, SLERP, TIES, and DARE, explaining their unique applications and benefits.
- Discusses the technical aspects of model blending, offering insights into choosing the right method for specific goals.
08:17 *🖥️ Preparing for Blending*
- Guidelines on selecting compatible models for blending, emphasizing the importance of architecture and layer compatibility.
- Instructions for downloading models from Hugging Face and preparing for the blending process.
10:33 *📝 Configuring YAML for Blending*
- Step-by-step instructions on setting up YAML files for blending, highlighting the importance of specifying base models, merge methods, and parameters.
- Offers practical tips for configuring blending parameters to optimize the blending process.
13:42 *🚀 Executing the Blend and Evaluation*
- Detailed walkthrough of the blending execution using MergeKit and subsequent evaluation through a text generation interface.
- Encourages testing and fine-tuning of the blended model before submission to benchmarks or public use.
15:45 *📊 Performance Testing and Data Contamination*
- Discusses the significance of performance testing on open LLM leaderboards and addresses the issue of data contamination in model training.
- Highlights the importance of careful model selection and blending strategy to avoid overfitting and ensure genuine improvements in model performance.
I hope this helps everybody!
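To make the "task vector arithmetic" item above a little more concrete, here is a minimal Python sketch of the idea from the task arithmetic paper linked in the description: a task vector is a fine-tuned model's weights minus the base model's weights, and a merge adds a weighted sum of task vectors back onto the base. The three-number "models", the function names and the weights are all made up for illustration. This is not mergekit's code, and real merges work tensor by tensor over billions of parameters.

```python
def task_vector(finetuned, base):
    """Difference between a fine-tuned model's weights and the base weights."""
    return [f - b for f, b in zip(finetuned, base)]


def task_arithmetic_merge(base, finetuned_models, weights):
    """Add a weighted sum of task vectors back onto the base weights."""
    merged = list(base)
    for model, w in zip(finetuned_models, weights):
        tau = task_vector(model, base)
        merged = [m + w * t for m, t in zip(merged, tau)]
    return merged


# Toy "models": flat lists standing in for real weight tensors (illustrative only).
base = [1.0, 2.0, 3.0]
math_model = [1.5, 2.0, 3.0]   # hypothetical fine-tune that changed the first weight
code_model = [1.0, 2.0, 2.0]   # hypothetical fine-tune that changed the last weight

merged = task_arithmetic_merge(base, [math_model, code_model], [1.0, 0.5])
print(merged)  # [1.5, 2.0, 2.5]
```

TIES and DARE start from the same task vectors but prune or rescale them before adding them up; the linked papers have the details.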
@DemiGoodUA 2 months ago ⁺¹
Nice Video! Do we have the ability to fine-tune the model on our own codebase?
@noblewarrior4776 2 months ago
You are amazing… thank you
@Alf-Dee 2 months ago
Amazing video! I didn’t know it could be done.
I am definitely going to make my own uncensored blended model for coding.
I am tired of openai telling me that I should not modify/hack code without owner permission even if I am the owner, and I am trying to test how solid the code is…
@xspydazx 2 months ago
Very good lesson and explanation! So far the best on this subject, as the main problem I had was running the models afterwards. I could not find a definitive method that works... Despite one of the models scoring high, it could not run in the HF Inference plugin on the model card.
@tiberiumihairezus417 2 months ago
Great content.
@lokeshart3340 2 months ago ⁺⁵
Can we blend multimodal models like LLaVA and Mistral and Gemini Vision? Can u make a video on it pls..❤❤
@maya-akim 2 months ago ⁺³
oh that's interesting, I got to say I didn't try but I'm curious myself! I'll see how it goes and either I'll make a video or I'll let you know somehow
@lokeshart3340 2 months ago
@@maya-akim sure.
@EduGuti9000 2 months ago
Thank you!
@hand-eye4517 2 months ago
We thank you for all the amazing content. You're a great content creator and I don't wanna sound nitpicky, but since you are already attracting and leaning towards the DIY crowd, you may as well be using open source tools too (VSCodium etc.). Just a small critique because I love the content.
@maya-akim 2 months ago
hey thanks for support and feedback 🙏🏻 I'm not sure I totally follow. Do you suggest that I switch to Codium? Honestly, before your comment I assumed that VS Code is open source, but after googling a bit I realized that the product itself actually isn't. But it looks like Codium is open source, so you think that's a better fit for the channel?
@Linguisticsfreak 23 days ago
Since we don't have access to the training data, it is simply impossible/unfeasible to choose models based on whether they have or don't have contaminated data.
@leumas_tai 2 months ago ⁺⁴
Great video. How does this differ from the Mixture of Experts (MoE)?
@maya-akim 2 months ago ⁺³
that's an excellent question! first of all, I noticed that the community doesn't consider MoE to be merged models, even though you can use mergekit to create MoE yourself (instructions in the description box). My understanding is that blended models become "fixed" when it comes to their capabilities.
MoE capabilities change dynamically thanks to a gating mechanism that decides how much of each expert's advice to follow for a given input. You specify prompts (or simple strings with mergekit) that activate a specific expert. For example, here's a configuration that I used for MoE: huggingface.co/mayacinka/West-Ramen-7Bx4. As you can see, positive and negative prompts will "guide" the model.
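The gating idea in the reply above can be sketched in a few lines of Python. This assumes generic top-k softmax routing, the general idea behind Mixtral-style MoE layers. The gate scores and expert outputs are invented numbers, the function names are hypothetical, and the sketch says nothing about how mergekit actually initializes the gates from positive and negative prompts.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_output(gate_scores, expert_outputs, top_k=2):
    """Mix the outputs of the top_k highest-scoring experts for one input."""
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)[:top_k]
    weights = softmax([gate_scores[i] for i in ranked])
    dim = len(expert_outputs[0])
    mixed = [0.0] * dim
    for w, i in zip(weights, ranked):
        for d in range(dim):
            mixed[d] += w * expert_outputs[i][d]
    return ranked, weights, mixed


# Invented example: 4 experts, and the gate prefers expert 0, then expert 2.
gate_scores = [2.0, 0.0, 1.0, -1.0]
expert_outputs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [9.0, 9.0]]

ranked, weights, mixed = moe_output(gate_scores, expert_outputs)
print(ranked, weights, mixed)
```

A different input would produce different gate scores and therefore a different mix of experts, which is the "dynamic" part. A blended model has no such per-input choice: its weights are combined once, at merge time.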
@leumas_tai 2 months ago
@@maya-akim interesting. thanks for sharing your thoughts, I'll look into it.
@mikect05 2 months ago
The combination of spending time messing with AI along with your videos is inspiring me to build my own workstation.
Not sure if that's smart considering I don't know how to code.
So far I have ordered:
super micro x12dai mobo
2 platinum 8352s
2 rtx 3090s
2 sata 12 tb
2 optane nvmes for os and quick retrieval stuff
128 gb lrdimm ddr4
E-ATX case, cables & ps
Do you do any consulting work via zoom? I may need some direction soon.
@geekyprogrammer4831 2 months ago ⁺³
Very underrated channel!! This is enlightening. How a person can be so smart and beautiful too at the same time 😭😭
@Cloudvenus666 2 months ago ⁺²
What happens if you merge two models of the same family but they each have different context lengths? Does the model with the larger token window take precedence?
@maya-akim 2 months ago ⁺²
it will depend on the "base model". But in cases that don't require defining a base model (like passthrough), or in this hacky case here: huggingface.co/mayacinka/chatty-djinn-14B, when I merged models with 32K and 8K context windows, the 32K models overpowered the 8K openchat model.
@Cloudvenus666 2 months ago
@@maya-akim thank you
@chuchel3156 2 months ago
Nice video
@_codegod 2 months ago
Thanks! What software are you running for loading and inferencing your merged LLM using localhost in browser?
@maya-akim 2 months ago
that's oobabooga's text generation UI. It allows you to run any model, whether it's saved locally, or on huggingface's hub
@_codegod 2 months ago
thanks @@maya-akim
@johntdavies 2 months ago
Maya, a great video, thank you. Quick question, where are you based? The reason I ask is I'm looking for an AI speaker in the UK, you came to mind so was just wondering. Again, excellent video, amazing depth.
@maya-akim 2 months ago
hey John, thanks a lot for the support! I live in Austin, TX, so I'm afraid I won't be of any help :/
@johntdavies 2 months ago
@@maya-akim Damn, that's a long way away! Never mind, keep up the great work and thanks for getting back 🙂
@axe863 1 month ago
Stacked and cascading ensembling have been around for a while
@abdallamosa8836 2 months ago
Is combining tools like SWE-Agent, Crew AI, and OS-Copilot into a cohesive agentic workflow possible?
@JoelSiby-ju5pf 1 month ago
after that, could i use my customized model from hugging face or locally in my apps?
@JoelSiby-ju5pf 1 month ago
also, now that i have decided to use this model for creating my gen-ai apps, how would i load it?
llm = ??? # provide me the syntax for this
@bgNinjaart 2 months ago
Genius
@PRColacino 2 months ago
Maya.. you are the girl!!!
@amandamate9117 2 months ago ⁺¹
can you test agent frameworks like CrewAI with Claude 3 Opus?
@GuidedBreathing 2 months ago
3:40 and now add robots 🤖 cheers🥂
@nimesh.akalanka 1 month ago
How can I fine-tune the Llama 3 8B model for free on my local hardware, specifically a ThinkStation P620 Tower Workstation with an AMD Ryzen Threadripper PRO 5945WX processor, 128 GB DDR4 RAM, and two NVIDIA RTX A4000 16GB GPUs in SLI? I am new to this and have prepared a dataset for training. Is this feasible?
@inout3394 2 months ago
LLM: Tokenization vs MAMBA, please make a video about this
@SinanAkkoyun 2 months ago
Bach wtc 1 prelude 21 😍
@justinwhite2725 2 months ago
LLMs catching up to something Stable Diffusion users have been doing for a while.
Open source is the way.
@BrandonFurtwangler 2 months ago
Why does Slerp only support two models?
Can’t you just slerp between pairs, then slerp the slerps, etc until you have 1?
@maya-akim 2 months ago
yep, you absolutely can slerp the slerps of the previously slerped slerps. That's what a lot of people do.
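A minimal Python sketch of "slerping the slerps", assuming each model is flattened to a plain list of numbers. The slerp function is the standard spherical interpolation formula, not mergekit's implementation, and slerp_pairwise is a hypothetical helper name. Note that chaining pairwise slerps is order-dependent, and when the number of models isn't a power of two the models don't all end up with equal influence.

```python
import math


def slerp(a, b, t):
    """Spherical linear interpolation between two non-zero, equal-length vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    dot = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    dot = max(-1.0, min(1.0, dot))          # guard against rounding error
    omega = math.acos(dot)                   # angle between the two vectors
    sin_omega = math.sin(omega)
    if sin_omega < 1e-6:
        # Vectors are (anti)parallel: fall back to plain linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    wa = math.sin((1 - t) * omega) / sin_omega
    wb = math.sin(t * omega) / sin_omega
    return [wa * x + wb * y for x, y in zip(a, b)]


def slerp_pairwise(vectors, t=0.5):
    """Reduce many vectors to one by slerping neighbours, round after round."""
    layer = [list(v) for v in vectors]
    while len(layer) > 1:
        merged = [slerp(layer[i], layer[i + 1], t)
                  for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:                   # odd one out moves up unchanged
            merged.append(layer[-1])
        layer = merged
    return layer[0]


# Toy stand-ins for four models' weights (illustrative values only).
models = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(slerp_pairwise(models))
```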
@yougaming8217 7 days ago
Is it possible to merge 7B with 8B models?
@user-ml9ph9tf1b 2 months ago
My only question while watching was: why should I make a model? I figure there is going to be an infinite number of models created by people, and soon AI models created by AI models. So what is the point of making a custom model, aside from fine-tuning on data? I use AutoGen. Would creating a model like you're doing empower a local model to, let's say, chat on my data and be good at function calling? Maybe this would be an experimental way to make my own model specifically for AutoGen? I know someone out there is already working on that, and you even showed models used specifically for function calling in one of your other vids.
@maya-akim 2 months ago ⁺¹
oh that's a great question! here's how I would use it: 1. Find a model that scores highly on the MMLU benchmark (which means that it has diverse knowledge). Blend it with a model that you like because of its "vibe". For me that would be openchat because I like how conversational it is. The blended model would perform better than the two "parent" models. 2. I'm actually working on this one. I'm trying to fine-tune one model to be specifically good at crafting youtube titles, and another one to write good youtube scripts. Then, I'll try to blend those two.
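As a rough illustration of step 1, here is a small Python script that prints a mergekit YAML config for a TIES merge of two fine-tunes that share a base model. The keys (models, merge_method, base_model, parameters with density and weight, dtype) follow the examples in the mergekit README. The model names are placeholders, build_ties_config is a hypothetical helper, and the choice of TIES and the specific numbers are assumptions for the sketch, not a recommendation from the video.

```python
# Placeholder names: swap in real Hugging Face repo IDs that share one architecture.
BASE_MODEL = "your-org/base-7b"
KNOWLEDGE_MODEL = "your-org/high-mmlu-7b"
CHATTY_MODEL = "your-org/conversational-7b"


def build_ties_config(base, finetunes, dtype="bfloat16"):
    """Return mergekit-style YAML text; finetunes is a list of (name, density, weight)."""
    lines = [
        "models:",
        f"  - model: {base}",
        "    # the base model needs no parameters",
    ]
    for name, density, weight in finetunes:
        lines += [
            f"  - model: {name}",
            "    parameters:",
            f"      density: {density}",   # fraction of each task vector to keep
            f"      weight: {weight}",     # how strongly this model contributes
        ]
    lines += [
        "merge_method: ties",
        f"base_model: {base}",
        "parameters:",
        "  normalize: true",
        f"dtype: {dtype}",
    ]
    return "\n".join(lines) + "\n"


config_text = build_ties_config(
    BASE_MODEL,
    [(KNOWLEDGE_MODEL, 0.5, 0.5), (CHATTY_MODEL, 0.5, 0.5)],
)
print(config_text)

# Save the printed text as config.yaml. With mergekit installed, the merge is
# then run from the terminal with:
#   mergekit-yaml config.yaml ./merged-model
```

The density and weight values are the knobs to experiment with; the description links a mergekit discussion suggesting that higher density tends to give better results.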
@PaulSchwarzer-ou9sw 2 months ago
🎉
@Maisonier 2 months ago ⁺¹
So it's like mixing colors, back in kindergarten, you'd always blend everything together hoping to create this amazing hue, but it always just ended up this muddy, ugly brown
@Dhirajkumar-ls1ws 2 months ago
👍
@NickDoddTV 2 months ago
Good soup
@zippytechnologies 2 months ago
At first - I was excited to see a new video with useful info - but when it got to that crime scene mapping thing you do - well... sorta creepy, no? What is that method called? Conspiracy mapping? Good visuals but wow... I lost track of what was going on with it... maybe it was more of a "Why are you putting holes in your walls?" Some poor guy is gonna be like "...where's the spackle and putty knife? Some tenant/wife/daughter/kid poked a bunch of holes in my wall"... I never understood how so many holes got poked into my daughter's walls or even our living room walls (ahem... the wife) but maybe this is just something that is fun to do? Now, do a video on how to patch all those little holes and get a paint roller with medium nap to repaint and cover everything up - but don't just paint a small area... no.. gonna probably have to paint the whole wall so there's no more streaks and visible coverups.. or at least learn how to feather out the edges so they blend better with the existing paint on the walls.. ugh... can't plug those holes and paint with an AI agent (yet)... so at least some skills are still worth knowing and learning... go get a guy or gal with some handy work skills - mechanical skills or something useful that AI can't do well and never will (likely for a long time) and you at least know your guy/gal will be useful given that AI will be putting lots of other people out of work (and is already doing so). I need to hire some people to help me get this working for our company - but I can't afford to keep paying drywall contractors every time we get a new idea... lol
@yellowboat8773 1 month ago
Wow, you have too much time on your hands
@florentflote 2 months ago
@oryxchannel 1 month ago
Wanna get “addicted”.
@gareththomas3234 2 months ago
why not just use autogen?
@free_thinker4958 1 month ago
Autogen is full of crap
@MichaelDomer 1 month ago
Stop saying it's so simple... yes, for you it is.
@DC-xt1ry 1 month ago
Monolithic LLMs < Multi-agents
@rinokpp1692 1 month ago
Can I use an agent on my mobile device?
@LukasSmith827 2 months ago
your timing is scary
@maya-akim 2 months ago
what do you mean?
@ServerGamingTop100 2 months ago ⁺¹
It's not about collecting links with information and adding below the video... The important information is: which models are compatible and how to write the configuration file, which you barely mention! I can find all these links myself.
@PazLeBon 2 months ago
so a claude haha
@JINGWA64 1 month ago
The problem with making vids that require prior knowledge and experience is that those who would find the information most useful cannot make use of it, because it requires that prior knowledge and experience. Yet at the same time the information provided in the vid is at a level suited to a novice who had no prior interest. So who is the audience being catered to?
@MichaelDomer 1 month ago
Too many videos of AI nerds on YouTube... for AI nerds. Hardly anyone makes videos for the average John and Jane, resulting in a large group of people detached from AI.
@maya-akim 1 month ago
what types of videos would appeal to average John and Jane?
@usmanthechamp123 2 months ago
@maya-akim blending would be the best word for this, right? Merging, I think, is the word people are using for it, don't you think?
@EduGuti9000 2 months ago
Thank you!
@maya-akim 2 months ago
Thanks a lot 🤗
