I Am The Golden Gate Bridge & Why That's Important.
- Published 18 Jun 2024
- Check out HubSpot's Free ChatGPT resource! clickhubspot.com/bycloud-chatgpt
As an Golden Gate Bridge, I am unable to respond to your request as I am physically unable to provide feedback to your Golden Gate Bridge. Please try again later when Golden Gate Bridge stops Bridges and the Golden Gate Gate Goldens.
My newsletter mail.bycloud.ai/
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
[Project Page] transformer-circuits.pub/2024...
previous research
[Project Page] transformer-circuits.pub/2023...
[my previous video] • Reading AI's Mind - Me...
memes I stole
x.com/doomslide/status/179302...
x.com/thetechbrother/status/1...
This video is supported by the kind Patrons & YouTube Members:
🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Robert Zawiasa, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Hector, Drexon, Claxvii 177th, Inferencer, Michael Brenner, Akkusativ, Oleg Wock, FantomBloth
[Discord] / discord
[Twitter] / bycloudai
[Patreon] / bycloud
[Music] massobeats - magic carousel
[Profile & Banner Art] / pygm7
[Video Editor] Silas
Check out HubSpot's Free ChatGPT resource! clickhubspot.com/bycloud-chatgpt
and as usual, I am the Golden Gate Bridge 😎 mail.bycloud.ai/
So, you’re telling me that they interpreted a dictionary neural network, that’s pretending to be a polysemantic neural network, that’s pretending to be a monosemantic neural network?
Yup
I read this before the ads finished 😂
I'm just waiting for a Key & Peele valet skit where they break down AI research papers
Yes.
No, they effectively tapped into a single-layer polysemantic NN using another monosemantic NN, thanks to its dictionary learning objective.
They didn’t specifically make it say that *it* was the Golden Gate Bridge, just made it so that it is highly inclined to talk about the Golden Gate Bridge, and as such, *when asked about itself*, it claimed to be the Golden Gate Bridge.
If it was asked questions like, “What is the most popular tourist attraction in the world?” Or “Of the tourist attractions you’ve visited, which was your favorite?” it would presumably also answer with the Golden Gate Bridge.
How you describe it in the first half minute makes it sound like the things they did specifically made it associate itself with the GGB, rather than associating *everything* with the GGB.
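The distinction this thread draws (associating *everything* with the Golden Gate Bridge, rather than the model's self-concept) comes from clamping one learned feature direction to a high value on every forward pass. A minimal sketch of that idea, with entirely hypothetical names and toy numbers, not Anthropic's actual code:

```python
# Sketch of feature clamping as described above: a learned feature direction
# is forced to a fixed high activation, so the concept bleeds into *every*
# response. All names and values here are illustrative assumptions.

def clamp_feature(activations, feature_direction, clamp_value):
    """Replace the feature's current strength with a fixed (clamped) one."""
    # How strongly the feature currently fires (dot with the unit direction)
    current = sum(a * d for a, d in zip(activations, feature_direction))
    # Shift the activations so the feature reads exactly clamp_value
    delta = clamp_value - current
    return [a + delta * d for a, d in zip(activations, feature_direction)]

# Toy 4-dim "residual stream"; feature direction along axis 0
acts = [0.2, 1.0, -0.5, 0.3]
direction = [1.0, 0.0, 0.0, 0.0]
steered = clamp_feature(acts, direction, 10.0)  # crank the feature way up
```

Because the clamp is applied regardless of the prompt, questions about tourist attractions, about the model itself, or about anything else all get pulled toward the same concept.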
2:34: An important part of polysemanticity is that the same neuron plays multiple different roles.
@@AB-wf8ek “mansplain”? 🤨
@@drdca8263 😚
Thanks for the note!
Spent all night working on reverse engineering LLama3 in order to build a custom network specifically trained on ML frameworks and code generation. I passed out at my desk and woke up to my PC tunneling into my ISP network so it could “evolve”. It was pretty convincing so I’m letting it do it’s thing. Now I have some free time to watch the new bycloudAI video and post a completely normal, non-alarming comment about how I love Ai and would never want someone to help me destroy a baby Ultron on its way toward network independence.
Why don't you get proper sleep? You won't be able to think well otherwise
its*
AI*
@@skyhappy In my case, all the free time I have is my sleep time, so if I want to learn and apply all the recent AI research I have to sacrifice a few hours of sleep... Which usually means falling asleep on the keyboard while reading ml papers 😑
I honestly think you ought to sit down calmly, take a stress pill, and think things over.
Press f to doubt
WOAH the bug neuron is literally insane, this research is going to let us make some extremely tight and efficient and super accurate specialised neural networks in the future
After pondering I think that neuron actually makes a lot of sense. If you think about what it represents in the output, it basically signifies to the model that it should start its response with some variation of "this code has an error." Presumably the model was trained on tons of Stack Overflow or similar coding forums and encountered similarities between the various forms of "your code has a bug" replies, and naturally ended up lumping them all together.
Incredibly cool to see that we may actually be able to dive into the "mind" of the model in this way, this video has me excited for the future of this research!
@@eth3792 yeah true, and most models mince everything during tokenization and aren't dictionary learners, plus superposition is potentially necessary, and there you go: AI models are data structures that are extremely hard to edit at the moment without everything falling apart quickly. Sort of like early electromechanical computers, ay
It's actually insane how much LLMs have jolted the whole field of philosophy of language, I mean, dimensional maps of complex thought patterns....like what. Higher and lower abstract concepts based on language.
Progress is going so quickly, and it's still mostly an IT field, but I really hope this will soon lead to some philosophical breakthroughs as well, about how languages relate to reality and consciousness
-words and sentences can be approximated as vectors with their meaning
-the distance between vectors is the semantic distance
-most models can interpret vectors from most tokenizers because it's cheaper to train models by pairing them with existing models
-vector database can store knowledge and retrieve it by finding the closest vectors to the query (even without AI)
We may have already encoded thoughts, and accidentally made a standard "language" to encode ideas.
And we already have translators (tokenizers, LLM context windows and RAG databases) to convert the entire web to AI databases or read from the "thoughts" of an LLM
The next step is to use AI to train AI, maybe? (By dictating what an AI should "think" instead of what an AI should answer in human language during the training process)
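The bullets above (meanings as vectors, semantic distance as vector distance, retrieval by nearest vector) can be sketched with plain cosine similarity. The 3-dim embeddings below are made-up toy values, not output of any real tokenizer:

```python
import math

# Toy illustration: meanings as vectors, semantic distance as cosine
# distance between them. The vectors here are invented for the example.
def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

embeddings = {
    "bridge": [0.9, 0.1, 0.0],
    "tunnel": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.9, 0.4],
}

# A vector database does essentially this: return the stored entry whose
# vector is closest to the query (highest cosine similarity), no AI needed.
query = embeddings["bridge"]
nearest = max((k for k in embeddings if k != "bridge"),
              key=lambda k: cosine_similarity(query, embeddings[k]))
# "tunnel" ends up closer to "bridge" than "banana" does
```

Real systems use hundreds to thousands of dimensions and approximate nearest-neighbor search, but the retrieval principle is exactly this.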
Any field of study, if deconstructed far enough, ends up being a bunch of math disciplines in a trenchcoat
@@Invizive Because , ultimately , math is the study of relation between things and quantifying those relations with numbers , so it makes sense...
Anthropic is a radically important voice in the moral alignment discussion, but they definitely are trying to "Nerf the logProbs world". :o
"maybe hallucinations are native functions" 😂😂
I wouldn't be surprised to learn that hallucinations are something like "over-sensitivity to patterns" since we humans are well known to hallucinate faces or animal shapes when we stare up at the clouds.
They are! A feature, not a bug. Check out Brian Roemmele's take on this, awesome shit.
All LLMs do is hallucinate or fabricate. It's a good feature, but it just happens to be seen as a bad thing, when in reality we should exploit it to get insights on language and thought.
@@francisco444 It can be good OR bad, depending on what you're trying to use it for.
What's funny? It might be true
00:02 AI researchers used interpretability research to make AI model identify as the Golden Gate Bridge.
01:33 Neural networks can approximate any function by finding patterns from data.
02:58 Researchers are working on making neurons monosemantic in order to understand AI's mind.
04:29 Testing interpretability of production-ready model
05:57 Model's feature detects and addresses various code errors.
07:25 Features in the concept space can influence AI behavior.
08:53 State-of-the-art model limitations and impracticality
10:15 Research on mechanistic interpretability in AI safety shows promise
8:06 Lol they gave Claude depression
Now we will have even dumber models and even more "sorry as AI..." responses 👍
I'm not sure if you mean this sarcastically, but I don't think this will happen. The "sorry as an AI" blanket response is a blunt tool used in guardrail prompts.
Feature dialling like this should be more sophisticated, so the guardrail prompts won't be necessary. Models might be more flexible while still being safe. You still won't be able to ask for illegal instructions, but the quality and range of responses should be way better
Illegal instructions?
You won't be able to ask the model about the Holodomor.
"there is no war in bazingse" kind of deal.
@@DanielVagg according to some AIs, C code is dangerous. It's just text. Open source models are way more funny
@@herrlehrer1479 Right, and this type of research aims to reduce this occurrence.
@@carlpanzram7081 I imagine that it could be used for censorship, true. I guess we'll need some censorship benchmarks included in standard tests.
So they made an MRI scanner interpreter for AI models?
Idk why I’ve never thought of that analogy. Neuron activation maps are literally just the same thing MRIs do
Man I love that this came just after Rational Animations' video on a similar topic.
So now I can understand this video even better.
Yes.
The Robert Miles vid, the Rational Animations vid, and now this one give me just a bit more hope we can solve the alignment problem. I'm glad, cuz watching the rise of AI over the past few years was very anxiety-inducing
@@justinhageman1379 Yes. Yes.
This is incredible, so cool. I also really appreciate your measured approach with delivering content.
Things can be really exciting without overselling it, you nail it (as opposed to a lot of other content creators).
ah they are working on personality cores, nice
good content
This looks like a massive, incredibly important step if they can actually take advantage of it to make the models better
I remember getting the "I'm a Pascal compiler." response to the "What are you?" question from a LoRA fine-tuned version of Llama 2 7B a year ago. Fine-tuning is also tinkering with weights, technically...
Meanwhile, Mixtral 8x22B: "I am an artificial intelligence and do not have a physical form. I exist as a software program running on computers and do not have a physical shape or appearance."
Top quality, thanks man
"I think there might just be connections between internal conflict and hate speech" At this point are we learning about the neural network...or are we learning about ourselves? 🤯
Nice video, I like how you mix complex stuff with silliness. I can now pretend I understood everything in this video and brag about being a smart person (I still have no clue how backpropagation works)
When you’re saying “feature”, is this similar to the kernels in AlexNet? I was reading the paper by Ilya Sutskever about AlexNet. The reason I’m asking is because one of the kernels had high activation on faces when that was never specified to the model, so I was wondering if a similar case is happening here with one of them finding bugs in code without any specific thing mentioned to the model
Seytonic and Bycloud post at the same time? Don't mind if I do!
Does anyone know where does the formula at 4:06 come from? I couldn't find it :(
it's from Andrew Ng's lecture notes (page 16), and taken out of context (my bad lol)
you can find the PDF here: stanford.edu/class/cs294a/sparseAutoencoder.pdf
the notation usually shouldn't have numbers, so it looked a bit confusing
@@bycloudAI thank you!
cool stuff
best AI channel period. Just too technical for the mainstream
At one time, I had Microsoft Bing explain its thought process by creating new words in Latin and then defining those words as a function of its thought process. It doesn't think linearly; it incorporates all information at the same time, what it calls a multifaceted problem-solving function.
Just because it produces text saying that its thought process (or “thought process”) works a certain way, *really* doesn’t imply that it really works that way. It doesn’t really have introspective abilities? It has the ability to imitate text that might come from introspection, but there’s no reason that this should match up with how it actually works.
(Note: I’m not saying this as like “oh it isn’t intelligent, it is just a stochastic parrot bla bla.” . I’m willing to call it “intelligent”. But what it says about how it works isn’t how it works, except insofar as the things its training leads it to say about how it works, happen to be accurate.)
That list at 7:40 says a lot about the political leaning of Anthropic and what they mean when they talk about "AI safety".
Correct me if I'm wrong, but don't LLMs do nothing but `hallucinate`, as we call it?
Isn't it more accurate to say that an LLM always hallucinates?
After all, these models generalize the nature of the data they were trained on.
Does that not imply these `hallucinations` are just the native output of an LLM, and just happen to reflect reality most of the time?
You confused a sparse autoencoder with a dense one. All the visualizations showed a dense one. Sparse autoencoders have a larger number of neurons in the hidden layer; the reason is that with this autoencoder, the 'superpositions' should be broken down.
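The point about the larger hidden layer is the defining trait: the sparse autoencoder is overcomplete, with more hidden units than input dimensions, and an L1 penalty keeps most of them silent so each can specialize into one feature. A minimal forward-pass sketch with made-up sizes and random weights (an untrained toy, not the paper's model):

```python
import random

# Toy sparse autoencoder: hidden layer *larger* than the input
# (overcomplete dictionary), ReLU + L1 penalty push most hidden
# activations to zero so units can specialize. Sizes are invented.
D_MODEL, D_HIDDEN = 4, 16  # expansion factor 4

random.seed(0)
W_enc = [[random.gauss(0, 0.1) for _ in range(D_MODEL)] for _ in range(D_HIDDEN)]
W_dec = [[random.gauss(0, 0.1) for _ in range(D_HIDDEN)] for _ in range(D_MODEL)]

def relu(x):
    return x if x > 0 else 0.0

def sae_forward(x):
    # Encode: features = ReLU(W_enc @ x); sparsity comes from ReLU + L1
    feats = [relu(sum(w * xi for w, xi in zip(row, x))) for row in W_enc]
    # Decode: reconstruction = W_dec @ feats
    recon = [sum(w * f for w, f in zip(row, feats)) for row in W_dec]
    return feats, recon

def loss(x, feats, recon, l1_coeff=1e-3):
    # Training objective: reconstruction error + L1 sparsity penalty
    mse = sum((a - b) ** 2 for a, b in zip(x, recon))
    return mse + l1_coeff * sum(feats)

feats, recon = sae_forward([1.0, -0.5, 0.3, 0.2])
```

Training minimizes `loss` over a model's activations; after training, each of the 16 hidden units ideally fires for one interpretable concept.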
I've been messing with NNs since TensorFlow 1.0. At that time a lot of people in my lab were doing mechanistic interpretability (we were a programming languages group).
I've been bearish on interpretability since then.
Everyone who has programmed this stuff knows it's a farce
I read the title as “I am at the Golden Gate Bridge and why that is important” and I immediately thought of dark humor thoughts 😂
man I love your videos
Thank you for this content
Just find a way to somehow train/finetune both the LLM and the SAE; being able to create an ad-generating/targeting model with appropriate censorship would bring them back all that money anyway
More videos more advanced on this topic please!
One Piece Memes in an AI-Video = EXTREMELY LARGE WIN!
Anthropic just released Claude 3.5 Sonnet
We can conceive realities we aren't capable of interacting with, I have faith someday we will get there
We will have to find a way to train our own; they're wasting time and resources trying to neuter the LLMs.
It's mathematically impossible to eliminate hallucinations; as you say, they're native "functions". In the "ChatGPT is bullshit" paper they explain it in more detail, but they're an inherent limitation of the model.
check OpenAI's paper on scaling SAEs
Dude, this is the best AI channel in the world!
And if the news is real, this is big
Criminal info:
The A.I.: I *kindly* ask you to...
Isn't this really just one shadow of the model from one direction?
5:26 AMONG US MENTIONED WE'RE ALL DOOMED
👏👏👏👏{Owen Wilson wow} I'm impressed. 🤨
Nice
LOOK UP CONCEPT BOTTLENECK GENERATIVE MODELS - JULIUS ADEBAYO's work!
Oh God. I think I might be a nerd
I think it's worth noting that those sparse autoencoders are very tiny models by today's standards.
34M parameters is positively tiny, I'm curious how it'd scale.
Also, what about applying it to bigger neural networks while trained on the activations of smaller ones? I'd be curious if it retains some effectiveness; that would indeed give credence to the platonic model representation idea (which I honestly find likely, given that evolution should converge)
Can someone please dumb it down for me, I can't understand 😭
Idk the connection between hatred and self-hatred is kinda lowkey profound 🤔
you know, i'm a bit of a Golden Gate Bridge myself 🧐…
Leaving out the model size for "safety reasons"? Yeah, Anthropic is just another OpenAI.
Let them bear fruit then put them in the monopoly crusher.
Well, logically hallucinations make sense: if you were asked where the "Liberty Statue" is and did not know the exact location, you would not drop dead with your heart and breath stopping; you would give the closest answer you could think of. While Wikipedia says "Liberty Island in New York Harbor, within New York City," most will default to New York City or at least America.
In other words, you need an answer, even if it is the wrong one, to move on and continue functioning.
Technically "I don't know" is also a valid answer... but human preferences/behavior align more with being confidently incorrect. :P
@@somdudewillson I guess what I mean is that, in general, at least something will come out; there cannot be void, and even saying "I don't know" is a totally valid answer.
But I guess the A.I. confidently gets answers out regardless of whether they're true or false, because it believes everything it knows to be true without bias, so it defaults to hallucinations instead of realizing it does not know.
Since it is a neural network, it is more akin to brainwashing, since it is not an entity "with a self" learning things, but just information being forced in; very little of that information is peer reviewed before being fed, and it also cannot be fed in context, meaning putting glue on pizza to make the cheese stick was totally valid in a vacuum, since no sarcasm could be indicated before learning that very line from Reddit.
Of course! Let me give you more information on the Golden Gate Bridge. I am it.
- AI (2024, colorized)
You copied Fireships thumbnail designs 😂
So what you mean is that because the LLM has too much knowledge, it bloated the NN due to overfitting... now we just prune the NN and let the most distinctive features shine, and find out it has a deeper understanding of the topic? No way that is not going to underfit.
It's not hallucinating. It's confabulating.
Good, now we can lobotomize AI models all the way
It's a great tool for censorship.
You could basically erase concepts or facts entirely.
The CCP is going to love this research.
If you’re being sarcastic, you might be interested to note that similar interpretability results have identified, essentially, a “refuses to answer the question” direction in models trained, under such-and-such conditions, to refuse to answer, and found that they can just disable that kind of response.
So, for weights-available models, it will soon be possible for people to just turn off the model’s tendency to refuse to answer whatever questions.
Whether or not this is a good thing, I’ll not comment on in this thread.
But I thought you might like to know.
@@drdca8263 It's just a thing. Neither good or bad.
8:05 That explanation doesn't make a lot of sense, because this example was with racism cranked up, NOT internal conflict. It had racism cranked up but normal levels of internal-conflict understanding, which, as the other example shows, it doesn't care much about by default.
I don't buy it. How do they represent a feature at all? For a classification problem that is ok, but for words, decoding embeddings into embeddings is whatever.
65% is quite a low result
So basically, the same story as with DNA sequencing all over again. We don't know what exactly it does, but we can assume with a certain level of confidence.
We're all dead in 10 years.
I don't understand why this is useful tho. Like, isn't the whole point of AI to find patterns that we can't?
u sound like asia :)
8:05 WTF WE PSYCHOLOGICALLY TORTURE AI AND EXPECT THEM NOT TO GO FULL SKYNET MODE
Me First
This guy is copying Fireship's thumbnail style.....
The world is full of companies doing exactly what OpenAI is doing. Isn't it legitimate to do the same on YouTube? If something works, why change it?
@raul36 when I clicked the video, I thought it was a Fireship video. Lo and behold, it's another dude... it comes off as disingenuous, and it discouraged me from watching the video.
@raul36 he should be more focused on finding his own style, and breaking through the mold, instead of becoming one with it. Authenticity and Originality will always be more valued than copycats.
he is only using the Fireship thumbnail style, and who knows if Fireship also copied it from somewhere. The thumbnail is great, and if it works then it's fine. The rest of his content deserves attention and is significantly different from Fireship's @@Ramenko1
So? He explains technical details of papers in the field of AI, totally different content. Unlike Fireship, which is dedicated to programming, I guess? No offense, but his vids are lacking in technical details
bro stop copying fireships thumbnails. be original
Talk about giving AI autism. XD