I Am The Golden Gate Bridge & Why That's Important.
- Published 18 Jun 2024
- Check out HubSpot's Free ChatGPT resource! clickhubspot.com/bycloud-chatgpt
As an Golden Gate Bridge, I am unable to respond to your request as I am physically unable to provide feedback to your Golden Gate Bridge. Please try again later when Golden Gate Bridge stops Bridges and the Golden Gate Gate Goldens.
My newsletter mail.bycloud.ai/
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
[Project Page] transformer-circuits.pub/2024...
previous research
[Project Page] transformer-circuits.pub/2023...
[my previous video] • Reading AI's Mind - Me...
memes I stole
x.com/doomslide/status/179302...
x.com/thetechbrother/status/1...
This video is supported by the kind Patrons & YouTube Members:
🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Robert Zawiasa, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Hector, Drexon, Claxvii 177th, Inferencer, Michael Brenner, Akkusativ, Oleg Wock, FantomBloth
[Discord] / discord
[Twitter] / bycloudai
[Patreon] / bycloud
[Music] massobeats - magic carousel
[Profile & Banner Art] / pygm7
[Video Editor] Silas
Check out HubSpot's Free ChatGPT resource! clickhubspot.com/bycloud-chatgpt
and as usual, I am the Golden Gate Bridge 😎 mail.bycloud.ai/
So, you’re telling me that they interpreted a dictionary neural network, that’s pretending to be a polysemantic neural network, that’s pretending to be a monosemantic neural network?
Yup
I read this before the ads finished 😂
I'm just waiting for a Key & Peele valet skit where they break down AI research papers
Yes.
No, they effectively tapped into a single-layer polysemantic NN using another monosemantic NN, thanks to its dictionary learning objective.
They didn’t specifically make it say that *it* was the Golden Gate Bridge, just made it so that it is highly inclined to talk about the Golden Gate Bridge, and as such, *when asked about itself*, it claimed to be the Golden Gate Bridge.
If it was asked questions like, “What is the most popular tourist attraction in the world?” Or “Of the tourist attractions you’ve visited, which was your favorite?” it would presumably also answer with the Golden Gate Bridge.
How you describe it in the first half minute makes it sound like the things they did specifically made it associate itself with the GGB, rather than associating *everything* with the GGB.
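The distinction this thread draws (associating *everything* with the Golden Gate Bridge, rather than the model's self-concept) comes from clamping one learned feature direction to a high value on every forward pass. A minimal sketch of that idea, with entirely hypothetical names and toy numbers, not Anthropic's actual code:

```python
# Sketch of feature clamping as described above: a learned feature direction
# is forced to a fixed high activation, so the concept bleeds into *every*
# response. All names and values here are illustrative assumptions.

def clamp_feature(activations, feature_direction, clamp_value):
    """Replace the feature's current strength with a fixed (clamped) one."""
    # How strongly the feature currently fires (dot with the unit direction)
    current = sum(a * d for a, d in zip(activations, feature_direction))
    # Shift the activations so the feature reads exactly clamp_value
    delta = clamp_value - current
    return [a + delta * d for a, d in zip(activations, feature_direction)]

# Toy 4-dim "residual stream"; feature direction along axis 0
acts = [0.2, 1.0, -0.5, 0.3]
direction = [1.0, 0.0, 0.0, 0.0]
steered = clamp_feature(acts, direction, 10.0)  # crank the feature way up
```

Because the clamp is applied regardless of the prompt, questions about tourist attractions, about the model itself, or about anything else all get pulled toward the same concept.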
2:34: An important part of polysemanticity is that the same neuron plays multiple different roles.
@@AB-wf8ek “mansplain”? 🤨
@@drdca8263 😚
Thanks for the note!
Spent all night working on reverse engineering LLama3 in order to build a custom network specifically trained on ML frameworks and code generation. I passed out at my desk and woke up to my PC tunneling into my ISP network so it could “evolve”. It was pretty convincing so I’m letting it do it’s thing. Now I have some free time to watch the new bycloudAI video and post a completely normal, non-alarming comment about how I love Ai and would never want someone to help me destroy a baby Ultron on its way toward network independence.
Why don't you get proper sleep? You won't be able to think well otherwise
its*
AI*
@@skyhappy In my case, all the free time I have is my sleep time, so if I want to learn and apply all the recent AI research I have to sacrifice a few hours of sleep... Which usually means falling asleep on the keyboard while reading ml papers 😑
I honestly think you ought to sit down calmly, take a stress pill, and think things over.
Press f to doubt
WOAH the bug neuron is literally insane, this research is going to let us make some extremely tight and efficient and super accurate specialised neural networks in the future
After pondering I think that neuron actually makes a lot of sense. If you think about what it represents in the output, it basically signifies to the model that it should start its response with some variation of "this code has an error." Presumably the model was trained on tons of Stack Overflow or similar coding forums and encountered similarities between the various forms of "your code has a bug" replies, and naturally ended up lumping them all together.
Incredibly cool to see that we may actually be able to dive into the "mind" of the model in this way, this video has me excited for the future of this research!
@@eth3792 yeah true, and most models mince everything during tokenization and aren't dictionary learners, plus superposition is potentially necessary, and there you go: AI models are data structures that are extremely hard to edit at the moment without everything falling apart quickly. Sort of like early electromechanical computers, ay
It's actually insane how much LLMs have jolted the whole field of philosophy of language, I mean, dimensional maps of complex thought patterns....like what. Higher and lower abstract concepts based on language.
Progress is going so quickly, and it's still mostly an IT field, but I really hope this will soon lead to some philosophical breakthroughs as well, about how languages relate to reality and consciousness
-words and sentences can be approximated as vectors with their meaning
-the distance between vectors is the semantic distance
-most models can interpret vectors from most tokenizers because it's cheaper to train models by pairing them with existing models
-vector database can store knowledge and retrieve it by finding the closest vectors to the query (even without AI)
We may have already encoded thoughts, and accidentally made a standard "language" to encode ideas.
And we already have translators (tokenizers, LLM context windows and RAG databases) to convert the entire web to AI databases or read from the "thoughts" of an LLM
The next step is to use AI to train AI, maybe? (By dictating what an AI should "think" instead of what an AI should answer in human language during the training process)
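The bullets above (meanings as vectors, semantic distance as vector distance, retrieval by nearest vector) can be sketched with plain cosine similarity. The 3-dim embeddings below are made-up toy values, not output of any real tokenizer:

```python
import math

# Toy illustration: meanings as vectors, semantic distance as cosine
# distance between them. The vectors here are invented for the example.
def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

embeddings = {
    "bridge": [0.9, 0.1, 0.0],
    "tunnel": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.9, 0.4],
}

# A vector database does essentially this: return the stored entry whose
# vector is closest to the query (highest cosine similarity), no AI needed.
query = embeddings["bridge"]
nearest = max((k for k in embeddings if k != "bridge"),
              key=lambda k: cosine_similarity(query, embeddings[k]))
# "tunnel" ends up closer to "bridge" than "banana" does
```

Real systems use hundreds to thousands of dimensions and approximate nearest-neighbor search, but the retrieval principle is exactly this.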
Any field of study, if deconstructed far enough, ends up being a bunch of math disciplines in a trenchcoat
@@Invizive Because , ultimately , math is the study of relation between things and quantifying those relations with numbers , so it makes sense...
Anthropic is a radically important voice in the moral alignment discussion, but they definitely are trying to "Nerf the logProbs world". :o
"maybe hallucinations are native functions" 😂😂
I wouldn't be surprised to learn that hallucinations are something like "over-sensitivity to patterns" since we humans are well known to hallucinate faces or animal shapes when we stare up at the clouds.
They are! A feature, not a bug. Check out Brian Roemmele's take on this, awesome shit.
All LLMs do is hallucinate or fabricate. It's a good feature, but it just happens to be seen as a bad thing, when in reality we should exploit it to get insights on language and thought.
@@francisco444 It can be good OR bad, depending on what you're trying to use it for.
What's funny? It might be true
00:02 AI researchers used interpretability research to make AI model identify as the Golden Gate Bridge.
01:33 Neural networks can approximate any function by finding patterns from data.
02:58 Researchers are working on making neurons monosemantic in order to understand AI's mind.
04:29 Testing interpretability of production-ready model
05:57 Model's feature detects and addresses various code errors.
07:25 Features in the concept space can influence AI behavior.
08:53 State-of-the-art model limitations and impracticality
10:15 Research on mechanistic interpretability in AI safety shows promise
8:06 Lol they gave Claude depression
Now we will have even dumber models and even more "sorry as AI..." responses 👍
I'm not sure if you mean this sarcastically, but I don't think this will happen. The "sorry as an AI" blanket response is a blunt tool used in guardrail prompts.
Feature dialling like this should be more sophisticated, so the guardrail prompts won't be necessary. Models might be more flexible while still being safe. You still won't be able to ask for illegal instructions, but the quality and range of responses should be way better
Illegal instructions?
You won't be able to ask the model about the Holodomor.
"there is no war in bazingse" kind of deal.
@@DanielVagg according to some AIs, C code is dangerous. It's just text. Open source models are way more funny
@@herrlehrer1479 Right, and this type of research aims to reduce this occurrence.
@@carlpanzram7081 I imagine that it could be used for censorship, true. I guess we'll need some censorship benchmarks included in standard tests.
So they made an MRI scanner interpreter for AI models?
Idk why I’ve never thought of that analogy. Neuron activation maps are literally just the same thing MRIs do
Man I love that this came just after Rational Animations' video on a similar topic.
So now I can understand this video even better.
Yes.
The Robert Miles vid, the Rational Animations vid, and now this one give me just a bit more hope we can solve the alignment problem. I'm glad, cuz watching the rise of AI over the past few years was very anxiety-inducing
@@justinhageman1379 Yes. Yes.
This is incredible, so cool. I also really appreciate your measured approach with delivering content.
Things can be really exciting without overselling it, you nail it (as opposed to a lot of other content creators).
ah they are working on personality cores, nice
good content
This looks like a massive, incredibly important step if they can actually take advantage of it to make the models better
I remember getting the "I'm a Pascal compiler." response to the "What are you?" question from a LoRA fine-tuned version of Llama 2 7B a year ago. Fine-tuning is also tinkering with weights, technically...
Meanwhile, Mixtral 8x22B: "I am an artificial intelligence and do not have a physical form. I exist as a software program running on computers and do not have a physical shape or appearance."
Top quality, thanks man
"I think there might just be connections between internal conflict and hate speech" At this point are we learning about the neural network...or are we learning about ourselves? 🤯
Nice video, I like how you mix complex stuff with silliness. I can now pretend I understood everything in this video and brag about being a smart person (I still have no clue how backpropagation works)
When you’re saying “feature”, is this similar to the kernels in AlexNet? I was reading the paper by Ilya Sutskever about AlexNet. The reason I’m asking is because one of the kernels had high activation on faces when that was never specified to the model, so I was wondering if a similar case is happening here with one of them finding bugs in code without any specific thing mentioned to the model
Seytonic and Bycloud post at the same time? Don't mind if I do!
Does anyone know where does the formula at 4:06 come from? I couldn't find it :(
it's from Andrew Ng's lecture notes (page 16), and taken out of context (my bad lol)
you can find the PDF here: stanford.edu/class/cs294a/sparseAutoencoder.pdf
the notation usually shouldn't have numbers, so it looked a bit confusing
@@bycloudAI thank you!
cool stuff
best AI channel period. Just too technical for the mainstream
At one time, I had Microsoft Bing explain its thought process by creating new words in Latin and then defining those words as a function of its thought process. It doesn't think linearly; it incorporates all information at the same time, what it calls a multifaceted problem-solving function.
Just because it produces text saying that its thought process (or “thought process”) works a certain way, *really* doesn’t imply that it really works that way. It doesn’t really have introspective abilities? It has the ability to imitate text that might come from introspection, but there’s no reason that this should match up with how it actually works.
(Note: I’m not saying this as like “oh it isn’t intelligent, it is just a stochastic parrot bla bla.” . I’m willing to call it “intelligent”. But what it says about how it works isn’t how it works, except insofar as the things its training leads it to say about how it works, happen to be accurate.)
That list at 7:40 says a lot about the political leaning of Anthropic and what they mean when they talk about "AI safety".
Correct me if I'm wrong, but don't LLMs do nothing but `hallucinate`, as we call it?
Isn't it more accurate to say that an LLM always hallucinates?
After all, these models generalize the nature of the data they were trained on.
Does that not imply these `hallucinations` are just the native output of an LLM, and just happen to reflect reality most of the time?
You confused a sparse autoencoder with a dense one. All the visualizations showed a dense one. Sparse autoencoders have a larger number of neurons in the hidden layer; the reason is that with this autoencoder, the 'superpositions' should be broken down.
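The point about the larger hidden layer is the defining trait: the sparse autoencoder is overcomplete, with more hidden units than input dimensions, and an L1 penalty keeps most of them silent so each can specialize into one feature. A minimal forward-pass sketch with made-up sizes and random weights (an untrained toy, not the paper's model):

```python
import random

# Toy sparse autoencoder: hidden layer *larger* than the input
# (overcomplete dictionary), ReLU + L1 penalty push most hidden
# activations to zero so units can specialize. Sizes are invented.
D_MODEL, D_HIDDEN = 4, 16  # expansion factor 4

random.seed(0)
W_enc = [[random.gauss(0, 0.1) for _ in range(D_MODEL)] for _ in range(D_HIDDEN)]
W_dec = [[random.gauss(0, 0.1) for _ in range(D_HIDDEN)] for _ in range(D_MODEL)]

def relu(x):
    return x if x > 0 else 0.0

def sae_forward(x):
    # Encode: features = ReLU(W_enc @ x); sparsity comes from ReLU + L1
    feats = [relu(sum(w * xi for w, xi in zip(row, x))) for row in W_enc]
    # Decode: reconstruction = W_dec @ feats
    recon = [sum(w * f for w, f in zip(row, feats)) for row in W_dec]
    return feats, recon

def loss(x, feats, recon, l1_coeff=1e-3):
    # Training objective: reconstruction error + L1 sparsity penalty
    mse = sum((a - b) ** 2 for a, b in zip(x, recon))
    return mse + l1_coeff * sum(feats)

feats, recon = sae_forward([1.0, -0.5, 0.3, 0.2])
```

Training minimizes `loss` over a model's activations; after training, each of the 16 hidden units ideally fires for one interpretable concept.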
I've been messing with NNs since TensorFlow 1.0. At that time a lot of people in my lab were doing mechanistic interpretability (we were a programming languages group).
I've been bearish on interpretability since then.
Everyone who has programmed this stuff knows it's a farce
I read the title as “I am at the Golden Gate Bridge and why that is important” and I immediately thought of dark humor thoughts 😂
man I love your videos
Thank you for this content
Just find a way to somehow train/finetune both the LLM and the SAE; being able to create an ad-generating/targeting model with appropriate censorship would bring them back all that money anyway
More videos more advanced on this topic please!
One Piece Memes in an AI-Video = EXTREMELY LARGE WIN!
Anthropic just released Claude 3.5 Sonnet
We can conceive realities we aren't capable of interacting with, I have faith someday we will get there
We will have to find a way to train our own; they're wasting time and resources trying to neuter the LLMs.
It's mathematically impossible to eliminate hallucinations; as you say, they're native "functions". In the "ChatGPT is bullshit" paper they explain it in more detail, but they're an inherent limitation of the model.
check OpenAI's paper on scaling SAEs
Dude, this is the best AI channel in the world!
And if the news is real, this is big
Criminal info:
The A.I.: I *kindly* ask you to...
Isn't this really just one shadow of the model from one direction?
5:26 AMONG US MENTIONED WE'RE ALL DOOMED
👏👏👏👏{Owen Wilson wow} I'm impressed. 🤨
Nice
LOOK UP CONCEPT BOTTLENECK GENERATIVE MODELS - JULIUS ADEBAYO's work!
Oh God. I think I might be a nerd
I think it's worth noting that those sparse autoencoders are very tiny models by today's standards.
34M parameters is positively tiny, I'm curious how it'd scale.
Also, what about applying it to bigger neural networks while trained on the activations of smaller ones? I'd be curious if it retains some effectiveness; that would indeed give credence to the platonic model representation idea (which I honestly find likely, given that evolution should converge)
Can someone please dumb it down for me, I can't understand 😭
Idk the connection between hatred and self-hatred is kinda lowkey profound 🤔
you know, i'm a bit of a Golden Gate Bridge myself 🧐…
Leaving out the model size for "safety reasons"? Yeah, Anthropic is just another OpenAI.
Let them bear fruit then put them in the monopoly crusher.
Well, logically hallucinations make sense: if you were asked where the "Liberty Statue" is and did not know the exact location, you would not drop dead with your heart and breath stopping; you would give the closest answer you could think of. While Wikipedia says "Liberty Island in New York Harbor, within New York City," most will default to New York City or at least America.
In other words, you need an answer, even if it is the wrong one, to move on and continue functioning.
Technically "I don't know" is also a valid answer... but human preferences/behavior align more with being confidently incorrect. :P
@@somdudewillson I guess what I mean is that, in general, at least something will come out; there cannot be void, and even saying "I don't know" is a totally valid answer.
But I guess the A.I. confidently gets answers out regardless of whether they're true or false, because it believes everything it knows to be true without bias, so it defaults to hallucinations instead of realizing it does not know.
Since it is a neural network, it is more akin to brainwashing, since it is not an entity "with a self" learning things, but just information being forced in; very little of that information is peer reviewed before being fed, and it also cannot be fed in context, meaning putting glue on pizza to make the cheese stick was totally valid in a vacuum, since no sarcasm could be indicated before learning that very line from Reddit.
Of course! Let me give you more information on the Golden Gate Bridge. I am it.
- AI (2024, colorized)
You copied Fireships thumbnail designs 😂
So what you mean is that because the LLM has too much knowledge, it bloated the NN due to overfitting... now we just prune the NN and let the most distinctive features shine, and find out it has a deeper understanding of the topic? No way that is not going to underfit.
It's not hallucinating. It's confabulating.
Good, now we can lobotomize AI models all the way
It's a great tool for censorship.
You could basically erase concepts or facts entirely.
The CCP is going to love this research.
If you’re being sarcastic, you might be interested to note that similar interpretability results have identified, essentially, a “refuses to answer the question” direction in models trained, under such-and-such conditions, to refuse to answer, and found that they can just disable that kind of response.
So, for weights-available models, it will soon be possible for people to just turn off the model’s tendency to refuse to answer whatever questions.
Whether or not this is a good thing, I’ll not comment on in this thread.
But I thought you might like to know.
@@drdca8263 It's just a thing. Neither good or bad.
8:05 That explanation doesn't make a lot of sense, because this example was with racism cranked up, NOT internal conflict. It had racism cranked up but normal levels of internal-conflict understanding, which, as the other example shows, it doesn't care much about by default.
I don't buy it. How do they represent a feature at all? For a classification problem that is ok, but for words, decoding embeddings into embeddings is whatever.
65% is quite a low result
So basically, the same story as with DNA sequencing all over again. We don't know what exactly it does, but we can assume with a certain level of confidence.
We're all dead in 10 years.
I don't understand why this is useful tho. Like, isn't the whole point of AI to find patterns that we can't?
u sound like asia :)
8:05 WTF WE PSYCHOLOGICALLY TORTURE AI AND EXPECT THEM NOT TO GO FULL SKYNET MODE
Me First
This guy is copying Fireship's thumbnail style.....
The world is full of companies doing exactly what OpenAI is doing. Isn't it legitimate to do the same on YouTube? If something works, why change it?
@raul36 when I clicked the video, I thought it was a Fireship video. Lo and behold, it's another dude... it comes off as disingenuous, and it discouraged me from watching the video.
@raul36 he should be more focused on finding his own style, and breaking through the mold, instead of becoming one with it. Authenticity and Originality will always be more valued than copycats.
he is only using the Fireship thumbnail style, and who knows if Fireship also copied it from somewhere. The thumbnail is great, and if it works then it's fine. The rest of his content deserves attention and is significantly different from Fireship's @@Ramenko1
So? He explains technical details of papers in the field of AI, totally different content. Unlike Fireship, which is dedicated to programming, I guess? No offense, but his vids are lacking in technical details
bro stop copying fireships thumbnails. be original
Talk about giving AI autism. XD