This Isn't Just A Chatbot (OpenAI Should Be Scared...)

Theo - t3․gg

123,319 views

Published on 19 Feb 2024
I heard nvidia was doing some chat bot stuff, but Chat With RTX ended up being much more interesting than I expected. Retrieval-augmented generation (RAG) is a fascinating new technique and I'm curious how we see it adopted over time. Compared to ChatGPT and ollama, this is very different.
"insert statement about Tensorflow for SEO reasons here"
Sources:
www.nvidia.com/en-us/ai-on-rt...
github.com/NVIDIA/trt-llm-rag...
github.com/ollama/ollama
Check out my Twitch, Twitter, Discord, and more at t3.gg
S/O Ph4se0n3 for the awesome edit 🙏
Science & Technology

Comments • 251

@yowwn8614 3 months ago ⁺⁴⁹³
That start with Linus made my heart drop.
@H-Root 3 months ago ⁺²²
Same point 😂
When I saw him I thought I clicked the wrong notification for a split second
@berkeozkir 3 months ago ⁺¹
Same! Wasn't expecting that.
@danielfernandes1010 3 months ago ⁺²
Wow somehow I missed it lol
@MelroyvandenBerg 3 months ago
hahhaha.. there we go Linus.
@blacktear5197 3 months ago
Hahahaha, when will you all realize that open source is one of the biggest scams multinationals have pulled to get you to work in exchange for a T-shirt
@MrSofazocker 3 months ago ⁺²⁴¹
ONLY addition, it HAS to support markdown.
Imagine, just setting this to your Obsidian vault's folder path and boom, you can chat with your second brain 🤯
@Sanchuniathon384 3 months ago ⁺³⁴
I need this yesterday
@personinousapraham3082 3 months ago ⁺¹¹
The way these models work means it almost definitely already does to some extent, since there is a decent amount of markdown in the training data for the models used for both generation and document embedding (making the search part work). At that point it's mostly just a matter of prompt tuning
@z_0968 3 months ago ⁺⁹
What if I told you that already exists?
- There is an Ollama plugin to chat with your notes. Although this is more single note specific, it's free.
- There is also a plugin called Smart Connections, but you need an OpenAI API key for that. This one is note global: it will create embeddings (vector files) from your notes. And then you can chat with all your notes.
@cabaucom376 3 months ago ⁺⁷
So I’m working on this… anyone who sees this comment, what are some must-haves for a local-first AI knowledge vault?
@carvierdotdev 3 months ago ⁺¹
❤ oh man that's actually very good thinking..
@hugazo 3 months ago ⁺¹⁷⁷
Finally one use for my 4080 that doesn't involve crying trying to play Cities: Skylines 2
@RT-. 3 months ago ⁺²
Wut? It lags on that game?
@Rock48100 3 months ago ⁺¹⁵
@RT-. Yeah that game is beyond a mess
@hugazo 3 months ago ⁺¹
@RT-. The game is broken
@GetUrFunnyUp 3 months ago ⁺³
Isn't that a CPU-based game
@oo--7714 3 months ago ⁺¹
@GetUrFunnyUp yes
@MrLenell16 3 months ago ⁺¹⁴⁰
It's not training the model, just doing RAG. Retrieval is basically querying for relevant docs based on semantic similarity, like doing a SQL query with vectors in the WHERE clause
@poipoi300 3 months ago ⁺¹²
Yep thanks for that, was about to comment something similar. Words are coordinates, yo.
@Imp0ssibleBG 3 months ago ⁺²
Is it done by searching for the nearest/closest embedding?
@LookRainy 3 months ago ⁺¹
@Imp0ssibleBG pretty much. But there are other approaches to doing RAG
@hunterkauffman9400 3 months ago ⁺⁴
simplest technique is the cosine similarity between the query and each document chunk.
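A minimal sketch of the cosine-similarity lookup described in this thread. The `embed` function here is a toy bag-of-words stand-in (an assumption for illustration); a real setup would use a learned embedding model, and the function names are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a learned embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query, chunks, k=2):
    """Score every chunk against the query and keep the k best."""
    q = embed(query)
    scored = [(cosine_similarity(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

chunks = [
    "gradio is a python library for building web uis",
    "cosine similarity compares the angle between two vectors",
    "the rtx 4080 is a graphics card",
]
results = top_k("how does cosine similarity compare vectors", chunks, k=1)
```

Scoring every chunk like this is a brute-force scan; vector databases exist largely to avoid that scan on large collections.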
@joreilly 3 months ago ⁺¹
But is this the best example of 'kind of' training your own model? There's a race right now for people to train models on private or proprietary data. RAG seems to be the most practical solution so far even though it's not perfect. Or am I wrong about this?
@YomiTosh 3 months ago ⁺⁸⁴
Hey Theo, just wanted to point out a few inconsistencies. RAG doesn’t train a model, it indexes the text files in a vector database and uses word similarity to look up relevant text. So the model, such as Llama 2 or Mistral, is unchanged, but it is able to add context and make the retrieved text more conversational.
There are loads of great AI/RAG projects other than Ollama out in the git seas too. Many not quite as simple or easy to use though.
Thanks for all the great videos. Already subscribed ;)
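To illustrate the point that RAG leaves the model unchanged, here is a rough sketch in which retrieval only changes the prompt text. The word-overlap `score`, the tiny `index`, and the prompt wording are all assumptions for illustration, not how Chat with RTX is known to work.

```python
import re

def words(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def score(query, chunk):
    # Hypothetical relevance score: number of shared words.
    return len(words(query) & words(chunk))

def retrieve(query, index, k=2):
    """Return the k chunks that share the most words with the query."""
    return sorted(index, key=lambda c: score(query, c), reverse=True)[:k]

def build_prompt(query, context_chunks):
    """The model's weights are never touched; only this text changes."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
    )

index = [
    "Chat with RTX runs a local model on an NVIDIA GPU.",
    "Gradio is a Python library for ML web UIs.",
    "Ollama runs open models from the command line.",
]
question = "What is Gradio?"
prompt = build_prompt(question, retrieve(question, index, k=1))
```

The resulting `prompt` string is what would be sent to whichever model is in use.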
@cryptogenik 3 months ago ⁺³
Came here to say this, and also - your bias is showing :P
@rendezone 3 months ago ⁺⁴
Shamelessly mentioning here the product/startup I work for: Qdrant is an open source vector database which excels at RAG setups. I created the JS SDK :P which offers fully-typed REST and gRPC clients
@lukeweston1234 3 months ago
@rendezone What does it offer that PGVector for instance does not?
@martinkrueger937 3 months ago
@YomiTosh by any chance do you know which RAG system/framework gives the best performance?
@rusyaidimusa2309 3 months ago ⁺³³
I would consider giving a shoutout to the llama.cpp project that serves as the backend engine to many of the open source programs like Ollama, and the many many talented engineers who brought support to so many different system configurations.
The open source scene has been on fire since Llama dropped and running models locally has never been easier.
@tuckerbeauchamp8192 3 months ago ⁺⁴
Oh man, was not ready for that intro. I love LTT and your channel, that was a great little combination
@medalikhaled 3 months ago ⁺¹³
They are not directly using Svelte; they are using an OSS project called Gradio for the UI, which uses Svelte
@DaniDipp 3 months ago ⁺³⁵
This is huge for my wiki. I can just give it a directory of markdown files. 🤯
@hugazo 3 months ago ⁺⁵
Better search in docs. I would add my frameworks' and libraries' documentation as well
@ofadiman 3 months ago ⁺⁹
Finally 🎉 I couldn't wait any longer for ray tracing support in my chat bot GUI
@SenorRobinHood 3 months ago ⁺⁶
Would this work on the codebase for a library? For example inputting a freshly downloaded WordPress directory and then also digesting the WordPress developer docs to make it your private Q&A tutor for the platform you're trying to learn?
@Al-Storm 3 months ago ⁺¹
Yes
@Sindoku 3 months ago
I’ll definitely be checking this out this weekend when I don’t have to work. This looks bad ass!
@adam_k99 3 months ago ⁺²
Good stuff! Could you make a video on how well it performs as a coding assistant?
@GeorgeG-is6ov 3 months ago ⁺¹
If you don't want to use it because it's so large, get Ollama and you can run it from your command prompt. I recommend watching a tutorial on it, and there are models as small as 1.8 GB (for example, Phi-2, which is small yet very powerful)
@aloufin 3 months ago
Yes, your explanation of RAG was very nice and easy to understand
@Petyr25 3 months ago ⁺²
Feedback: Superb video, more AI stuff from you would be great. Especially open source stuff with our own data.
@DNA912 3 months ago ⁺³
While watching this I started to realise how much use I would get out of this at work. The project I'm in has huge documentation, but everything is just a brain dump, and it has happened many times that we've found something "new" in the docs that we had completely missed before. Imma work on making an AI on that dataset asap. I love experimenting with AIs locally, it's so fun and it feels so much cooler and better than the cloud ones
@user-pc8vn6ym7r 3 months ago ⁺⁴
I have to admit, that is the MOST creative L&S I've ever seen on here. And I normally swear at the screen in response.
Maybe.
@brett_rose 3 months ago
I did something similar with Pinecone. I parsed a huge chunk of wiki data into a Pinecone DB. I then would use one model prompt which would return multiple pieces of data based on the prompt. That model would then decide which pieces of outside information were the most related to the prompt. It would then send the original prompt along with the external data to a new model prompt which then would provide the response to the user.
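The two-step flow in the comment above (one prompt picks the most related pieces, a second prompt answers with them) might be sketched like this. The `select` and `answer` callables are hypothetical stand-ins for the two model calls; the stubs below only make the sketch runnable without any model or Pinecone account.

```python
from typing import Callable, List

def two_stage_answer(
    question: str,
    candidates: List[str],
    select: Callable[[str, List[str]], List[str]],
    answer: Callable[[str], str],
) -> str:
    """Stage 1 filters the candidates, stage 2 answers using the survivors."""
    relevant = select(question, candidates)
    prompt = "Context:\n" + "\n".join(relevant) + f"\n\nQuestion: {question}"
    return answer(prompt)

def keyword_select(question: str, candidates: List[str]) -> List[str]:
    # Stub for the first model prompt: keep candidates sharing any word.
    question_words = set(question.lower().split())
    return [c for c in candidates if question_words & set(c.lower().split())]

def echo_answer(prompt: str) -> str:
    # Stub for the second model prompt: return the prompt it was given.
    return prompt

candidates = [
    "paris is the capital of france",
    "gradio uses svelte internally",
]
reply = two_stage_answer(
    "which city is the capital of france", candidates, keyword_select, echo_answer
)
```

Passing the model calls in as functions keeps the pipeline shape separate from any particular provider.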
@Falkov 3 months ago
Good stuff..liked and subbed.
@Al-Storm 3 months ago ⁺¹
WSL2 works surprisingly well. I've been using it on one of my machines for SD, llama, and mixtral.
@niteshbaskaran2262 3 months ago
If all of the Python docs were fed to an LLM, would you query that LLM or still refer to the original docs?
@MightyDantheman 3 months ago ⁺¹
This is pretty cool, though I'm still waiting for the day that I can use at least GPT-4 level AI locally *(and ideally either for free or a one-time payment for one single version).* Sadly, I doubt this will ever happen outside of open-source projects, which tend to not be as good due to less funding and resources. But I still appreciate any effort put towards that future.
@arnaudlelong2342 3 months ago ⁺¹
You and Prime need to get with this soon
@unowenwasholo 3 months ago ⁺¹
This is like ControlNet for LLMs. Dope.
@SkyyySi 3 months ago ⁺¹
The app isn't made with Svelte, but with Gradio. Gradio is a Python library for creating web UIs for ML applications. Gradio, however, uses Svelte and Tailwind internally.
@sarjannarwan6896 3 months ago ⁺¹
RAGs are cool, they can use vector databases to map to data.
@arianj2863 3 months ago
Could you make a small RAG project :-)?
Or do you know a channel that is like the Theo of open source LLMs?
@Tymon0000 3 months ago ⁺⁴
This will be huge when the AI is capable of parsing a whole project and multiple docs.
@pencilcheck 3 months ago ⁺¹
ahhh, imagine parsing and generating tests that make sense based on prompts ::OOOO
@RisingPhoenix96 3 months ago ⁺³
2:19 The YouTube algorithm recommends me your videos frequently. Is there any real benefit I get from subscribing if I'm going to watch your videos and see all your community posts anyway?
@Gocunt 3 months ago
what if i don't subscribe to anyone because i just don't want to be subscribed to a bunch of random channels? ai doesn't understand
@creatortray 3 months ago
I love it! I find this stuff fascinating
@entropywilldestroyusall1323 3 months ago
Great vid, Adam.
@sadshed4585 3 months ago ⁺¹
What is the VRAM requirement for RAG, or their version of it?
@jzeltman 3 months ago
Would love to see more AI content. Great look into this new release from NVIDIA
@nothingtoseehere5760 3 months ago ⁺²
NOT DEEP ENOUGH! MOAR PLZ!
@E-Juice 3 months ago ⁺²
I make 2 points here: 1. questioning the accuracy of this system, 2. why Windows.
1. What's interesting about downloading YouTube video transcripts and using those files at 7:40 is that NVIDIA's setup is MOST LIKELY using their own ASR (Automatic Speech Recognition) model, either Canary or Parakeet, which I've tested and found to be good but still not as accurate as OpenAI's Whisper ASR model. So without knowing what specific model is used to transcribe the YouTube videos, we don't know how exact those transcriptions are, and that affects how well this RAG can answer questions using that data. I would recommend using Whisper-Large-v3 and manually transcribing the YouTube videos, or just uploading actual documents and notes and testing them rather than transcribing YouTube videos.
2. You don't recommend using WSL but you didn't elaborate. What is the best alternative? Installing Linux locally or using a cloud workstation? Don't say Mac because they don't come with an NVIDIA GPU
@TrimutiusToo 3 months ago ⁺³
Deeper, go even deeper!!!
@eointolster 3 months ago
What’s the advantage over privateer that’s been out for months where you can choose your own model and it is tiny in comparison?
@aloufin 3 months ago
I keep thinking about this video.... RAG is showing up on my timeline on twitter everywhere... I would have had to spend HOURS trying to understand it... let alone realise I could download NVIDIA's demo and run it on my GPU..... Your videos are amazing to understand huge swathes of new AI tech easily... not to mention actually show working tech demos.
@tasmto 3 months ago
Ok... this is really really cool!
@chaks2432 2 months ago ⁺¹
This kind of stuff would be a lifesaver if it manages to work as an AI powered chatbot for documentation for proprietary frameworks and stuff. I'm working at a startup and we're building our own framework from scratch, so having RTX Chat work as an AI documentation assistant would be great
@pixma140 3 months ago ⁺³
Nice thing what Nvidia did there :) Do you want to share the two YouTube playlists in the comments or description maybe? :D
@hohohotreipatlajele2044 3 months ago
I've tried it, but it's a bit strange and slow, and I couldn't find how to start it again after shutting it down
@bugged1212 3 months ago
Already been using this, I have also wired up ollama to serve multiple requests and I run a business off it now.
@Fire.Blast. 3 months ago
1:36 "as you can see, it's pretty fast" yes, instant even it would seem
@riftsassassin8954 3 months ago
Definitely on team AI deep dive!
@user-tk5ir1hg7l 3 months ago
50% faster inference with NVIDIA GPUs on TensorRT is no joke, I hope they expand this and let you fine-tune and add models
@exapsy 3 months ago
00:03 and i already saw linus dropping not just something, a graphics card.
Instant like! xD
@schtormm 3 months ago
8:17 for some reason the Dutch public news broadcaster also uses Svelte sometimes lmao
@blenderpanzi 3 months ago ⁺³
Point it to a playlist of Jonathan Blow videos and then tell it JavaScript is the best language and ask when will Jai have LSP support? Can an LLM get an aneurysm?
@chriss3154 3 months ago
A comparison against privateGPT and/or localGPT would've been awesome
@lancemarchetti8673 3 months ago
Crazy... it's hard to keep up. And now there's Groq.. which is ridiculously fast.
@TheGoodMorty 3 months ago
Ollama just released the Windows preview
@juanmacias5922 3 months ago
Damn, that's impressive.
@red9090 3 months ago ⁺¹
Theo just dropped a 3 min subscribe pitch.
@azeek 3 months ago
Brooo what an opening ❤❤😂
@banalMinuta 3 months ago
I would definitely appreciate more AI content as somebody just getting into web development.
It seems pretty apparent that AI is going to unleash a new category of tools, one whose mastery will most likely be paramount to one's success
@sozno4222 3 months ago
The models you can run on Chat with RTX are a bit inadequate right now. But it shows promise
@christianremboldt1557 3 months ago ⁺¹
Never thought an AI would convince me to subscribe to someone
@setasan 3 months ago
That would be great for my hundred-page OneNote files.
@zyxwvutsrqponmlkh 3 months ago
I wonder how large a single text can be for this to work. Can I throw whole books at it? What about my state's entire law code?
@PRIMARYATIAS 3 months ago
Can be good for learning, Could point it to some programming books I have so I can "chat" with them 😂
@MobCat_ 3 months ago
I wanna point that folder at my current project im working on.
or a massive archive of Python code lol...
@kyleleblancvlogs3820 3 months ago
Finally when someone says "when did i say that !! Huh"
I can go: in this video, on this date, at this timestamp. Checkmate
@banalMinuta 3 months ago
Why would you not recommend running ollama on WSL right now?
@jackg_ 3 months ago
Here's hoping more and more AI things move to have local options. Sure not everyone can run these locally, I am typing this on an iMac from 2015... BUT it is super promising.
@ThePawel36 3 months ago
I wonder if you could train your language model to play a game on your behalf, such as Cyberpunk, for example of course. It seems feasible, as some local language models are equipped with vision capabilities. It would be fascinating to witness the first YouTuber attempting this.
@hairy7653 3 months ago ⁺¹
the YouTube option isn't showing up on my rtxchat
@omanimedia 1 month ago
Same issue, bro. I've also searched the whole internet for this but couldn't find even one useful tutorial 🥲 If you figured it out, then please let me know.
@kennypitts4829 1 month ago
Lawyer: AI, find precedent to get my client off the hook for drunk in public.
AI: Beep, bop... Bort - Say it was diabetes related.
@Readraid_ 3 months ago ⁺¹
'Nvidia just dropped' linus clip is CRAZY
@FarishKashefinejad 3 months ago ⁺¹
Theo Tech Tips
@MultiMojo 3 months ago
Keep in mind that Chat with RTX bundles a 7B parameter model, which will consume GPU memory during use. Inference is going to be painfully slow if you're running a weaker GPU. Responses from this model aren't going to be on par with GPT-4/Claude. If you're looking to chat with your own documents, paying for an OpenAI API key with a LangChain RAG implementation is the more efficient way to go.
@dubya85 2 months ago
It will not work on anything less than an 8 GB 30- or 40-series RTX card. 12 GB minimum for the larger AI model
@cintron3d 2 months ago
Fine fine I'll subscribe.
@anime.x_ror 3 months ago
i missed nvidia exe for nvidia chat zip. could someone share it with me? )
@__greg__ 3 months ago
Nice
@minimal2224 3 months ago
Woah woah woah it’s only fast because of your laptop hardware lol M1 - M4 chip?
@bhaskaruprety230 3 months ago
Please make a video on SLM
@jaylenjames364 3 months ago
I can get a little more specific on a topic with Chat with RTX compared to ChatGPT.
@Hunger53 3 months ago ⁺³
The main downside of this program is it only parses one file at a time, even if you have multiple files with data. Kinda meh if you need to do comparisons or use one file as a context to process the second.
@user-oo2wb8tf7i 3 months ago
You need CrewAI
@vitorwindberg4212 3 months ago ⁺²
I agree. It would be awesome if, for example, in the "What is Theo's favorite library?" question, the model could use data from all the different videos at once and assume it's React - instead of relying on a single video that it deemed the most important for that question.
@sauer.voussoir 3 months ago
Nice pointing it out, I didn't even realize it did it that way. Hopefully it gets updated along the way.
@FamilyManMoving 3 months ago
This is a simple demonstration cobbled together from open source. It's not meant to be an actual system. What you want is already available; it just requires a little more work on your end, and models that can handle the actual data.
Context length is a real issue with a lot of open source models. There is only so much RAG can do if a model limits context to 2048 tokens, for instance. I've had models start hallucinating when they get close to the limit. The good news is those hallucinations are so off the topic that it's obvious when they occur.
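One way to picture the context-length constraint mentioned above is a budget check before the prompt is built. This is only a sketch: counting whitespace-separated words as tokens is a loud simplification (real tokenizers differ), and the function names and the 512-token answer reserve are assumptions.

```python
def approx_tokens(text):
    # Assumption: one whitespace-separated word is roughly one token.
    return len(text.split())

def fit_to_budget(ranked_chunks, question, context_limit=2048, reserve_for_answer=512):
    """Keep the best-ranked chunks that fit alongside the question and answer."""
    budget = context_limit - reserve_for_answer - approx_tokens(question)
    kept = []
    for chunk in ranked_chunks:
        cost = approx_tokens(chunk)
        if cost > budget:
            break  # stop at the first chunk that no longer fits
        kept.append(chunk)
        budget -= cost
    return kept

ranked = ["a b c", "d e f g", "h i"]
kept = fit_to_budget(ranked, "q r", context_limit=10, reserve_for_answer=2)
```

With a 10-token limit, 2 tokens reserved and a 2-token question, only 6 tokens remain for context, so the second chunk is dropped.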
@dubya85 2 months ago
It can look at everything in a folder at once
@scottiedoesno 3 months ago ⁺¹
MOAR AI. The solution to having a job with AI is knowing how to use it
@johnbarros1 3 months ago
Hey I hit subscribe! Gimme more AI!
@TheD3adlysin 3 months ago ⁺¹
This is a good video. Just wish you could run this on Linux....with an AMD card.....heh
@d4rkg 3 months ago ⁺⁴
I never expected to see Freddie Mercury talking about AI
@nathanfife2890 3 months ago
I'm curious how good it is at code
@alireza-bonab 3 months ago
👏💚
@andrewdunbar828 3 months ago
Ask me about a question and I'll tell you about an answer.
@jazilzaim 3 months ago
This is going to give another moat to Windows devices over Mac and Linux devices
@amodo80 3 months ago
you can do RAG with Ollama when you run ollama-webui
@seanmartinflix 2 months ago
Okay this is something I'm trying to learn. For the record I'm not a programmer, just an enthusiast trying to learn stuff, kind of a dummy compared to you all probably. But what do you guys think of Pinocchio? I've gotten Llama to run through it, but it doesn't always run, and there aren't very many good tutorials on Pinocchio. I would love it to be covered on this channel, just what you think of it, and other insights if possible. Anything I can find on it is very basic and only gets you so far. Weird comment, I know. Like the channel, always some great insights.
@undefined6512 3 months ago
What's Ollama's last name?
@Endelin 3 months ago
PrivateGPT is doing something similar to this.
@jacobgoldenart 3 months ago
I'm confused here. There are a ton of vector databases that you can install and run on a Mac. No external GPU needed. Like ChromaDB, or FAISS. Then just use something like LlamaIndex or LangChain to chunk your documents and create embeddings using something like OpenAI's ada-002. Then insert them into Chroma and start doing RAG on your documents. You certainly don't need an Nvidia GPU.
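The "chunk your documents" step mentioned above can be sketched without any library. This version splits on whitespace and overlaps neighbouring chunks so a sentence cut at a boundary still appears whole in one of them; the function name and default sizes are assumptions, and libraries such as LlamaIndex or LangChain ship their own splitters with different options.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into word windows of chunk_size that overlap by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

sample = " ".join(str(i) for i in range(10))
pieces = chunk_words(sample, chunk_size=4, overlap=1)
```

Each piece would then be embedded and stored; the overlap trades some duplicate storage for fewer answers lost at chunk boundaries.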
@Dav-jj2jb 3 months ago
I would be excited about AI if people would stop at chat bots, but we won't. It will get from "cool chat bot" to "AGI existential dread" real quick.
@hallooww 2 months ago
facebook actually making progress in the AI scene…
@patricknelson 3 months ago
Maybe “trt-llm-rag-windows” implies there will be a Linux or macOS version someday. 🤔
@jenny-DD 3 months ago
Good, now I can just get the point of YouTube videos by putting them into NVIDIA. Cuts out all the fluff
@bblatnick1 3 months ago
Love the AI content.
@jhonnyrodrigues 3 months ago
Chat with Ray Tracing -> nice
@TM_LBenson 3 months ago
This will make college so much easier!
@RobbPage 3 months ago
More AI. Definitely. As devs we need to stay on top of this stuff. The war against the machines has begun my friends and we're on the front lines.
@_DashingAdi_ 3 months ago
I'm scared😢
