Florence 2 - The Best Small VLM Out There?

Sam Witteveen

10,772 views

Published 1 July 2024
There is a new VLM on the scene, and it comes with a dataset of 5 billion labels. The new model can handle a variety of classic computer vision tasks, such as bounding boxes and segmentation, along with newer LLM-style tasks such as captioning.
Paper: arxiv.org/pdf/2311.06242
HF Spaces Demo: huggingface.co/spaces/gokaygo...
Colab : drp.li/fGyMm
🕵️ Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: drp.li/dIMes
👨‍💻Github:
github.com/samwit/langchain-t... (updated)
github.com/samwit/llm-tutorials
⏱️Time Stamps:
00:00 Intro
00:13 Florence-2 Paper
02:19 Florence-2 Architecture
03:20 Florence-2 Detailed Image Captioning
03:41 Florence-2 Visual Grounding
04:09 Florence-2 Dense Region Caption
04:24 Florence-2 Open Vocab Detection
06:01 Hugging Face Spaces Demo
10:41 Colab Florence-2 Large Sample Usage
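
The detection-style tasks listed above (visual grounding, dense region captioning, open-vocabulary detection) return their boxes as text. Below is a minimal sketch of decoding such text, assuming each box arrives as a label followed by four `<loc_N>` tokens (x0, y0, x1, y1) quantized into 1000 bins. The function name `parse_boxes`, the bin-center conversion, and the sample string are illustrative assumptions, not the official post-processing; in practice the model's own processor is the safer route.

```python
import re

# Assumption: each box is a label followed by four <loc_N> tokens
# (x0, y0, x1, y1), where N is an integer bin index in [0, 999].
NUM_BINS = 1000
BOX_PATTERN = re.compile(r"([^<>]+)((?:<loc_\d+>){4})")
LOC_PATTERN = re.compile(r"<loc_(\d+)>")


def parse_boxes(text, width, height, num_bins=NUM_BINS):
    """Return a list of (label, (x0, y0, x1, y1)) in pixel coordinates."""
    results = []
    for label, locs in BOX_PATTERN.findall(text):
        bins = [int(n) for n in LOC_PATTERN.findall(locs)]
        sizes = (width, height, width, height)
        # Assumption: map each bin index to the center of its bin.
        box = tuple((b + 0.5) * s / num_bins for b, s in zip(bins, sizes))
        results.append((label.strip(), box))
    return results


if __name__ == "__main__":
    # Hypothetical raw output for a 640x480 image.
    sample = "car<loc_100><loc_200><loc_500><loc_800>"
    for label, box in parse_boxes(sample, width=640, height=480):
        print(label, box)
```

Scaling by the image width and height is what makes the same token string usable at any resolution, which is why the image size has to be passed in alongside the text.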
Science & Technology

Comments • 36

@parkerspitzer 6 days ago ⁺⁸
Thanks for your work on sharing this information. Much easier to watch your content than keep my ear to the ground all day trying to keep up. Much appreciated, sir.
@danielmz99 6 days ago ⁺⁹
Thanks for the great content. A video going through the fine-tuning process on this one would be amazing. I am not sure how this could scale to a video implementation (probably passing a frame each time).
@coolmcdude 5 days ago
I would also love a video/notebook for a Florence-2 fine-tune
@IsxaaqAcademy 6 days ago ⁺²
It's also good at OCR for handwritten documents
@jeremybristol4374 5 days ago
I'm enthusiastic about these smaller models. Thanks for covering this!
@mukkeshmckenzie7386 6 days ago ⁺⁵
A VQA tutorial would be nice!
@IanScrivener 6 days ago ⁺¹
Thanks Sam!!
Please keep up the great work...
@micbab-vg2mu 6 days ago
Thank you - it looks interesting:)
@aa-xn5hc 5 days ago
Great. Yes, a fine-tune would be very interesting.
@unclecode 4 days ago
This is what people should call "small", anything below 1B! Thanks for your video. By the way, I played around with the quantized version, the result is unbelievably good! I shared a post on Twitter and mentioned you and shared the Colab. Take a look at it. I tried 8 bits and 4 bits. It's odd how 4 bits is almost the same as the base model!
@samwitteveenai 4 days ago ⁺¹
I saw your tweet and retweeted it, very cool stuff. I will check it out. Just been knee-deep in Gemma stuff for the last few days.
@unclecode 3 days ago
@samwitteveenai Thanks, and yes, it's Gemma2's turn. Waiting for your CZcams notification about the Gemma video!
@jefframpe5075 5 days ago
Thanks, Sam! I always appreciate your videos.
I would love your take on how Florence-2 compares with Apple's 4M-21.
@ALEXPREMIUMGAME 6 days ago
awesome, thanks
@GiovaniFerreiraS 3 days ago
I'd love to see a fine-tuning video, especially if it's not question answering, just so it's a different use case from the documentation. Maybe with a quick intro on the scenarios where fine-tuning would be especially helpful.
@samwitteveenai 3 days ago
Noted!
@marcoscipioni132 4 hours ago
Yes, I'm trying to use it for table extraction from scanned PDFs, with little success so far. Would love to see how you implement that.
@ranu9376 1 day ago
I've tried this model, and it is great at describing images. I've also tried DocVQA, but it gives only one-word answers and doesn't get even the simplest questions right. I had hoped to do some classification and compare it with other models.
@SaiManojPrakhya-mp4oe 5 days ago
It would be great if you could show a fine-tuning example!
@sohitshivhare1541 6 days ago ⁺¹
Thanks for the information, this is great.
Can I fine-tune it on a specific set of images, as in few-shot learning? Could you put out a tutorial for that? It would be great.
@ShravanKumar147 4 days ago
What would you pick for fine-tuning?
Any specific application ideas?
@ariramkilowan8051 5 days ago ⁺¹
I think fine-tuning for OCR would be a good demo. OCR in the real world on images of documents is much harder than OCR on electronic documents, so it would be cool to see how a small model like this does as an alternative to Claude/GPT-4.
@MH-ke2wi 4 days ago ⁺¹
I tried OCR and OCR with region on images converted (not scanned) from PDF pages. Nothing fancy, standard text with some titles, sections, lists... It is absolutely unusable. When it detected something, it usually got it right, but it could only see around 25% of the text.
@ariramkilowan8051 4 days ago
@MH-ke2wi Yeah, I've also been struggling to get decent results with OCR
@tonyrungeetech 6 days ago
Hi Sam. Thank you for the videos. I've been playing around with some of the smaller vision models and trying to implement batched inference, with little success. If you wanted to run multiple VQA-style questions against the same image quickly, how would you go about it? Is batching even the right direction to be looking in?
@srk5702 6 days ago ⁺¹
We request that you do fine-tuning on object detection, because all LLMs are only useful for generating text output. Thanks in advance
@JustEmbraceTheChallenge 5 days ago
Please do fine-tuning for object detection
@AbhishekKotecha 6 days ago
Hi Sam, thanks for the video. How do you think it compares with Phi3-V? My take is that this is more raw and better for fine-tuning. Do you think so too?
@Walczyk 6 days ago
this is completely better and more advanced than phi 3 v crap image detection
@mshonle 6 days ago ⁺²
I wonder how much performance is affected when something so distilled then gets quantized?
Also, it seems amazing that it can handle segmentation for an unspecified set size! With Phi3 Vision you would need to provide a token to represent, say, each giraffe you want to identify.
@samwitteveenai 6 days ago ⁺³
Quantization is a good question! I would expect it to suffer more than a big model. Might give it a test tomorrow.
@SinanAkkoyun 6 days ago
Where is the dataset? I couldn't find the release
@toadlguy 6 days ago ⁺²
Would be interested in how much memory is required to run these models. They seem pretty small even unquantized. Maybe I will try it later on my 8GB M1 Mini. One thing I am curious about: at 3:38, the description for the image is wrong in ways that seem odd. The title is described as being on top with the "20 Years of ..." underneath, and Ron's tie is described as red and his hair as blonde. I wonder if this is just the vagaries of the model (placement data would be strange) or over-reliance on training data. Or a straight-up mistake in 'creating' the paper (which would probably be the most disturbing😉).
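
On the memory and quantization questions raised in the comments, a rough back-of-the-envelope sketch may help. It assumes the parameter counts listed on the Florence-2 model cards (0.23B for base, 0.77B for large) and counts only the weights; activations, the image input, and framework overhead are not included, so real usage will be higher. The helper name `weight_memory_gb` is made up for illustration.

```python
# Parameter counts as listed on the Florence-2 model cards (assumption:
# these cover the full model, vision encoder included).
PARAMS = {"Florence-2-base": 0.23e9, "Florence-2-large": 0.77e9}


def weight_memory_gb(num_params, bits_per_param):
    """Memory needed for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for name, num_params in PARAMS.items():
        for bits in (32, 16, 8, 4):
            gb = weight_memory_gb(num_params, bits)
            print(f"{name} @ {bits}-bit: {gb:.2f} GB")
```

By this estimate even the large model's weights stay under about 3.1 GB at 32-bit precision, which is consistent with the observation that these models are small even unquantized.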
