Quantize LLMs with AWQ: Faster and Smaller Llama 3
- Added 25 Apr 2024
- Explore how to make LLMs faster and more compact with my latest tutorial on Activation Aware Quantization (AWQ)! In this video, I demonstrate how to apply AWQ to quantize Llama 3, achieving a model that's not only quicker but also smaller than its non-quantized counterpart. Dive into the details of the process and see the benefits in real-time. If you found this video helpful, don't forget to like, comment, and subscribe for more insightful content like this!
Join this channel to get access to perks:
/ @aianytime
To further support the channel, you can contribute via the following methods:
Bitcoin Address: 32zhmo5T9jvu8gJDGW3LTuKBM1KPMHoCsW
UPI: sonu1000raw@ybl
GitHub: github.com/AIAnytime/Quantize...
Activation Aware Quantization Research paper: arxiv.org/pdf/2306.00978
Quantized Model on HF here: huggingface.co/skuma307/Llama...
#llama3 #genai #ai - Science & Technology
Fantastic video! I will watch the other videos. Definitely a very talented tutor here!
Cool!
Great video! Thank you for sharing.
Does quantizing a model make it less accurate? How many parameters will the quantized model have? If it is still 13B, how does quantizing make it faster?
New sub from the USA. May I suggest in-depth guide(s) on ontologies, knowledge graphs, and query analysis? Many thanks for the great info.
Can we collab on a project? Also CUDA vs Triton, and inference evaluations.
How do you turn research into code?
Can we work together?
Thanks!
How to quantize multimodal LLMs?
Doubt: in this case, we download the entire model first and then quantize it. Is there any way to quantize a model on the fly during loading? Since I'm GPU poor, I might not be able to run the entire model, and hence can't quantize. Please suggest something...
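One option worth knowing (a sketch, not the method from the video): Transformers can quantize on the fly at load time via bitsandbytes. Note that the full-precision weights are still downloaded; each shard is quantized as it loads, so it saves GPU memory, not bandwidth. The model id below is illustrative:

```python
# Sketch: on-the-fly 4-bit quantization at load time with bitsandbytes.
# Requires `transformers`, `bitsandbytes`, and a CUDA GPU.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # quantize weights to 4-bit as they load
    bnb_4bit_quant_type="nf4",               # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.float16,    # compute in fp16
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",   # illustrative model id
    quantization_config=bnb_config,
    device_map="auto",                       # spread layers across available devices
)
```

Unlike AWQ, this skips calibration entirely, so quality can be slightly worse, but nothing has to be pre-quantized or re-downloaded.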
AWQ has lower throughput than the unquantized model when serving with vLLM. Do you know of any quantization methods that can also increase throughput?
+1
This is only true in inappropriate scenarios: when you don't have flash attention compiled, or when you are using an old GPU like the Colab T4.
Try to avoid the pre-made Docker images and make sure all your hardware is enabled to its best ability. Always use the latest Python 3.11.x and the latest CUDA developer toolkit 12.x.
Don't use the generic CUDA GPU drivers; use drivers that you build yourself or that are meant for your operating system.
Then this stupid argument that unquantized vLLM is faster is no longer true. Not many people want to take the time to learn about and properly prepare their inference systems. AWQ is meant to save memory; Exl2 is more about fitting a finetune into your available VRAM with its variable bpw and hb.
Is the "Fine Tuning of LLMs" playlist enough for finetuning any Llama model?
Yes!
@@AIAnytime Thanks for creating this type of playlist☺
1.58-bit seems so promising, but I understand it has to be part of the original training; you can't quantize post-training. Have you heard of anyone actually training models with this?
Noice