3Blue1Brown
Attention in transformers, visually explained | Chapter 6, Deep Learning
Demystifying attention, the key mechanism inside transformers and LLMs.
Instead of sponsored ad reads, these lessons are funded directly by viewers: 3b1b.co/support
Special thanks to these supporters: www.3blue1brown.com/lessons/attention#thanks
An equally valuable form of support is to simply share the videos.
Demystifying self-attention, multiple heads, and cross-attention.
The first pass for the translated subtitles here is machine-generated, and therefore notably imperfect. To contribute edits or fixes, visit translate.3blue1brown.com/
And yes, at 22:00 (and elsewhere), "breaks" is a typo.
------------------
Here are a few other relevant resources
Build a GPT from scratch, by Andrej Karpathy
czcams.com/video/kCc8FmEb1nY/video.html
If you want a conceptual understanding of language models from the ground up, @vcubingx just started a short series of videos on the topic:
czcams.com/video/1il-s4mgNdI/video.html?si=XaVxj6bsdy3VkgEX
If you're interested in the herculean task of interpreting what these large networks might actually be doing, the Transformer Circuits posts by Anthropic are great. In particular, it was only after reading one of these that I started thinking of the combination of the value and output matrices as being a combined low-rank map from the embedding space to itself, which, at least in my mind, made things much clearer than other sources.
transformer-circuits.pub/2021/framework/index.html
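For what it's worth, that low-rank view is easy to play with in code; below is a minimal NumPy sketch (dimensions are illustrative, not those of any real model):

```python
import numpy as np

d_embed, d_head = 64, 8                        # illustrative sizes
rng = np.random.default_rng(0)
W_value = rng.normal(size=(d_head, d_embed))   # "value down": embedding -> head space
W_output = rng.normal(size=(d_embed, d_head))  # "value up": head space -> embedding

# Their composition is a single map from the embedding space to itself,
# whose rank can be at most d_head: a low-rank map.
W_ov = W_output @ W_value
print(W_ov.shape)                   # (64, 64)
print(np.linalg.matrix_rank(W_ov))  # 8
```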
Site with exercises related to ML programming and GPTs
www.gptandchill.ai/codingproblems
History of language models by Brit Cruise, @ArtOfTheProblem
czcams.com/video/OFS90-FX6pg/video.html
An early paper on how directions in embedding spaces have meaning:
arxiv.org/pdf/1301.3781.pdf
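To make "directions have meaning" concrete, here is a sketch of the classic analogy arithmetic, assuming gensim and its downloadable "word2vec-google-news-300" vectors are available:

```python
# Classic word2vec analogy: the king -> queen direction roughly
# matches the man -> woman direction in the embedding space.
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")  # large download on first use
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # expected to land near ('queen', ...)
```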
------------------
Timestamps:
0:00 - Recap on embeddings
1:39 - Motivating examples
4:29 - The attention pattern
11:08 - Masking
12:42 - Context size
13:10 - Values
15:44 - Counting parameters
18:21 - Cross-attention
19:19 - Multiple heads
22:16 - The output matrix
23:19 - Going deeper
24:54 - Ending
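If you'd rather see the attention-pattern, masking, and values steps from the timestamps above in code form, here is a minimal single-head NumPy sketch (names and sizes are illustrative, not the video's notation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(E, W_q, W_k, W_v):
    """E: (seq_len, d_embed) token embeddings; one head, causal masking."""
    Q, K, V = E @ W_q, E @ W_k, E @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[1])     # the attention pattern (4:29)
    mask = np.triu(np.ones_like(scores), k=1)  # masking (11:08): no peeking ahead
    scores = np.where(mask == 1, -np.inf, scores)
    pattern = softmax(scores, axis=-1)         # each row sums to 1
    return pattern @ V                         # weighted sum of values (13:10)

rng = np.random.default_rng(0)
seq_len, d_embed, d_head = 5, 16, 4
E = rng.normal(size=(seq_len, d_embed))
W_q, W_k, W_v = (rng.normal(size=(d_embed, d_head)) for _ in range(3))
print(self_attention(E, W_q, W_k, W_v).shape)  # (5, 4)
```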
------------------
These animations are largely made using a custom Python library, manim. See the FAQ comments here:
3b1b.co/faq#manim
github.com/3b1b/manim
github.com/ManimCommunity/manim/
All code for specific videos is visible here:
github.com/3b1b/videos/
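For the curious, a minimal scene in the community edition of manim looks something like this (render with `manim -pql scene.py SquareToCircle`; the scene is just an example, not from any particular video):

```python
from manim import *

class SquareToCircle(Scene):
    def construct(self):
        square = Square()                       # draw a square...
        self.play(Create(square))
        self.play(Transform(square, Circle()))  # ...then morph it into a circle
        self.wait()
```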
The music is by Vincent Rubinetti.
www.vincentrubinetti.com
vincerubinetti.bandcamp.com/album/the-music-of-3blue1brown
open.spotify.com/album/1dVyjwS8FBqXhRunaG5W5u
------------------
3blue1brown is a channel about animating math, in all senses of the word animate. If you're reading the bottom of a video description, I'm guessing you're more interested than the average viewer in the lessons here. It would mean a lot to me if you chose to stay up to date on new ones, either by subscribing here on YouTube or by following on whichever platform below you check most regularly.
Mailing list: 3blue1brown.substack.com
Twitter: 3blue1brown
Instagram: 3blue1brown
Reddit: www.reddit.com/r/3blue1brown
Facebook: 3blue1brown
Patreon: patreon.com/3blue1brown
Website: www.3blue1brown.com
Views: 840,279

Videos

But what is a GPT? Visual intro to transformers | Chapter 5, Deep Learning
Views: 2.2M · 1 month ago
Unpacking how large language models work under the hood Early view of the next chapter for patrons: 3b1b.co/early-attention Special thanks to these supporters: 3b1b.co/lessons/gpt#thanks To contribute edits to the subtitles, visit translate.3blue1brown.com/ Other recommended resources on the topic. Richard Turner's introduction is one of the best starting places: arxiv.org/pdf/2304.10557.pdf Co...

4 questions about the refractive index | Optics puzzles 4
Views: 644K · 4 months ago

But why would light "slow down"? | Optics puzzles 3
Views: 1.2M · 5 months ago

25 Math explainers you may enjoy | SoME3 results
Views: 542K · 6 months ago

Explaining the barber pole effect from origins of light | Optics puzzles 2
Views: 674K · 8 months ago

Polarized light in sugar water | Optics puzzles 1
Views: 998K · 8 months ago

A pretty reason why Gaussian + Gaussian = Gaussian
Views: 751K · 9 months ago

This pattern breaks, but for a good reason | Moser's circle problem
Views: 1.9M · 10 months ago

How They Fool Ya (live) | Math parody of Hallelujah
Views: 940K · 10 months ago

Convolutions | Why X+Y in probability is a beautiful mess
Views: 625K · 10 months ago

Why π is in the normal distribution (beyond integral tricks)
Views: 1.5M · 1 year ago

But what is the Central Limit Theorem?
Views: 3.3M · 1 year ago

But what is a convolution?
Views: 2.5M · 1 year ago

Researchers thought this was a bug (Borwein integrals)
Views: 3.3M · 1 year ago

What makes a great math explanation? | SoME2 results
Views: 735K · 1 year ago

How to lie using visual proofs
Views: 3.1M · 1 year ago

Olympiad level counting (Generating functions)
Views: 1.9M · 1 year ago

Oh, wait, actually the best Wordle opener is not “crane”…
Views: 6M · 2 years ago

Solving Wordle using information theory
Views: 10M · 2 years ago

A tale of two problem solvers (Average cube shadows)
Views: 2.7M · 2 years ago

2021 Summer of Math Exposition results
Views: 776K · 2 years ago

Beyond the Mandelbrot set, an intro to holomorphic dynamics
Views: 1.4M · 2 years ago

From Newton’s method to Newton’s fractal (which Newton knew nothing about)
Views: 2.8M · 2 years ago

The Summer of Math Exposition
Views: 721K · 2 years ago

A quick trick for computing eigenvalues | Chapter 15, Essence of linear algebra
Views: 978K · 2 years ago

How (and why) to raise e to the power of a matrix | DE6
Views: 2.7M · 3 years ago

The medical test paradox, and redesigning Bayes' rule
Views: 1.2M · 3 years ago

Hamming codes part 2: The one-line implementation
Views: 834K · 3 years ago

But what are Hamming codes? The origin of error correction
Views: 2.3M · 3 years ago

Comments

  • @chezlizzle · 2 hours ago

    Highly recommend for any classical mechanics enthusiasts. Great video.

  • @NemripNGC · 3 hours ago

    7130th comment

  • @randomadvice2487 · 3 hours ago

    Grant is the Satoshi of AI, but not... he's present.

  • @moviechilltime123 · 3 hours ago

    Sorry if I'm being ignorant, but what exactly are the "charges"? I have a learning disability, so sometimes I miss things even after watching several times.

  • @boruut2909 · 3 hours ago

    1, 2, 4, 8, 16 is a number sequence typically used in IQ tests. I wonder what the correct extrapolated next number is. It can actually be anything.

  • @raidtheferry · 4 hours ago

    Hey 3b1b, what sort of interactive math software do you use to create these amazing animations? They're awesome! You've been one of my favorite YT channels for years now, and I've always wondered how it's done, because I can't imagine you or someone else is doing them all by hand in the Adobe suite... thx.

  • @hWat-Ever · 4 hours ago

    In base 2, π is 11.001; your 16 kg weight is 10000 and there are 1100 bounces, and your 64 kg weight is 1000000 and there are 11001 bounces. In base 4, π is 3.02; your 16 kg weight is 100 and there are 30 bounces. In base 8, π is 3.11; your 64 kg weight is 100 and there are 31 bounces.
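
Those bounce counts are easy to check by simulating the collisions directly; here is a minimal Python sketch of the 1-D elastic blocks-and-wall setup (masses and units are illustrative):

```python
def count_collisions(m_small, m_big):
    """Big block slides left into the small one, which sits next to a wall;
    count every block-block and block-wall collision (all perfectly elastic)."""
    v_small, v_big = 0.0, -1.0
    count = 0
    while True:
        if v_big < v_small:   # blocks collide: 1-D elastic collision formulas
            total = m_small + m_big
            v_small, v_big = (
                ((m_small - m_big) * v_small + 2 * m_big * v_big) / total,
                ((m_big - m_small) * v_big + 2 * m_small * v_small) / total,
            )
            count += 1
        elif v_small < 0:     # small block bounces off the wall
            v_small = -v_small
            count += 1
        else:                 # both moving right, big block at least as fast: done
            return count

print(count_collisions(1, 16))  # 12 -> 1100 in base 2, 30 in base 4
print(count_collisions(1, 64))  # 25 -> 11001 in base 2, 31 in base 8
```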

  • @arnaldoleon1 · 5 hours ago

    I got nothing done at work today, as I spent it all watching your videos.

  • @chromosundrift · 5 hours ago

    Huge long-term fan, but this series is my favourite.

  • @arnaldoleon1 · 6 hours ago

    This is absolutely brilliant. Thank you so much.

  • @paramrajsingh1539 · 6 hours ago

    e and π have a cameo almost everywhere.

  • @HAL-qu2ix · 6 hours ago

    Thank you for explaining this better than anyone else has been able to. I think I finally get it. I really appreciate your content 🙌🏻

  • @alextsun7314 · 6 hours ago

    I don't usually comment on videos, but this is one of the best videos I've seen on transformers: extremely detailed but very easy to understand!

  • @AlejandroVales · 6 hours ago

    This is actually similar to how some IQ tests work... just trying to see how used you are to creating association patterns out of the data they put out, like: finger is to hand what leaf is to … (twig / tree / forest)

  • @damianzieba5133 · 7 hours ago

    That's... just insane.

  • @falion609 · 7 hours ago

    I REMEMBER THIS

  • @jeremyhansen9197 · 8 hours ago

    If discrete means probability and continuous means probability density, what are we to say about the possibility of a probability density being Gaussian?

  • @omgdorkness · 8 hours ago

    I need you to softmax my logits, baby.

  • @jercki72 · 8 hours ago

    Ahaa, now for sure people have found the full video link button.

  • @siddharthannandhakumar6187 · 8 hours ago

    I think it still holds true even when three lines meet at a point, given that the area formed by them is 0.

  • @andresmunchgallardo1383 · 8 hours ago

    “You, the 3-d lander” makes me feel like he's a 4-d entity teaching me his version of toddler math.

  • @excaliburhead · 9 hours ago

    I still don’t get it 🤷‍♂️

  • @gregorymathews1998 · 9 hours ago

    Love this channel, but the flash-bang at the end blew my pupils out.

  • @tizmemc · 9 hours ago

    I was mentally and physically abused by my father as a child, and I'm currently living in financially decrepit conditions that will probably continue for the foreseeable future, and I still can't believe that out of everything that has ever happened to me, this is what I bet my life on and lose.

  • @HarpaAI · 9 hours ago

    🎯 Key Takeaways for quick navigation:

    00:00 🔍 Understanding the Attention Mechanism in Transformers
    - Introduction to the attention mechanism and its significance in large language models.
    - Overview of the goal of transformer models to predict the next word in a piece of text.
    - Explanation of breaking text into tokens, associating tokens with vectors, and the use of high-dimensional embeddings to encode semantic meaning.

    02:11 🧠 Contextual Meaning Refinement in Transformers
    - Illustration of how attention mechanisms refine embeddings to encode rich contextual meaning.
    - Examples showcasing the updating of word embeddings based on context.
    - Importance of attention blocks in enriching word embeddings with contextual information.

    05:37 ⚙️ Matrix Operations and Weighted Sums in Attention
    - Explanation of matrix-vector products and tunable weights in matrix operations.
    - Introduction to the concept of masked attention for preventing later tokens from influencing earlier ones.
    - Overview of attention patterns, softmax computations, and relevance weighting in attention mechanisms.

    21:31 🧠 Multi-Headed Attention Mechanism in Transformers
    - Explanation of how each attention head has distinct value matrices for producing value vectors.
    - Introduction to the process of summing proposed changes from different heads to refine embeddings at each position.
    - Importance of running multiple heads in parallel to capture diverse contextual meanings efficiently.

    22:34 🛠️ Technical Details in Implementing Value Matrices
    - Description of how, in practice, the per-head value matrices are implemented together with a single output matrix.
    - Clarification of technical nuances in how value matrices are structured in practice.
    - Note on the distinction between value-down and value-up matrices as commonly seen in papers and implementations.

    24:03 💡 Embedding Nuances and Capacity for Higher-Level Encoding
    - Discussion of how embeddings become more nuanced as data flows through multiple attention blocks and layers.
    - Exploration of the capacity of transformers to encode complex concepts beyond surface-level descriptors.
    - Overview of the network parameters associated with attention heads and the total parameters devoted to the entire transformer model.

    Made with HARPA AI

  • @SergeyYudintsev · 9 hours ago

    It’s insane. After watching a video with Numberphile, I actually did the exact same thing: proved that it doesn’t work for 3, moved on to 4, and stopped at the coloring, because I’m dumb and couldn’t figure out the coloring.

  • @ondrejbrichnac1813 · 10 hours ago

    Why does it hurt my balls when it does the "🦆" sound?

  • @AquaTeenHungerForce_4_Life · 10 hours ago

    I’m amazed that a man in the 1800s understood this and was able to explain it all before computers and quantum mechanics. I also get a kick out of the naysayers like Kelvin. 😊

  • @piotrmazgaj · 10 hours ago

    <*_> This is my seal. I have watched the entire video, understood it, and I can explain it in my own words, thus I have gained knowledge. This is my seal. <_*>

  • @DrDec0 · 10 hours ago

    Ever calculated against eternity? Do you know what a circle is by other means, and what a perfect circle defines? And do you know what π was made to calculate? Now you know what number of collisions you will get as you increase the mass of the right object toward eternity. 😘 And after you know it, eat more apples 😘

  • @thomasschodt7691 · 10 hours ago

    The last point on the perimeter needs to divide the segment not at the halfway point but offset, say at 1/3 and 2/3, creating a small figure at the centre of the circle - voila, 32...

  • @user-wo6qn3vf9n · 10 hours ago

    The Fouriel transformer is different from normal transformers: instead of inline and adjacent cores and coils, it is a 4-dimensional transformer consisting of 4 cores/coils at 390 degrees to each other. This is much more economical than standard transformers, as there is a lot less heat wasted, since the electrical and magnetic waves don't interfere with each other while still inducing into each other's cores. They are mainly used in locomotive traction motors, where producing less heat reduces back-EMF; this was not a problem with weak-fielding equipment and DC motors. With modern high-voltage AC motors the heat factor is important, so that as much power as possible can be driven for maximum speed.

  • @alexjaybrady · 11 hours ago

    Linguistic thermodynamics??

  • @user-ce1nq3mo6j · 11 hours ago

    Fine

  • @VandanaTripathi-hn2ix · 11 hours ago

    If the triangle were isosceles, D and P would have coincided.

  • @niktrip · 11 hours ago

    I have a fear of looking at fractals being zoomed in; I still study them, but I can't watch them.

  • @kurchak · 11 hours ago

    @57:18 Well, I am great at using compasses but not great at math. I guess ya win some, ya lose some, lol.

  • @Will-fj9gy · 12 hours ago

    This is terrifying.

  • @Stanley-Wallice · 12 hours ago

    What is this supposed to be? Edgy or something? Are you having a stroke?

  • @maxwvm7345 · 12 hours ago

    I love this series. I did a lot of malicious prompt trial and error, but by learning more about the mathematics behind it, I get to understand how some things might work.

  • @GoosebumpsOrg · 12 hours ago

    9:36 - 3Violet 1Brown!!

  • @ujjwalyadav6189 · 12 hours ago

    I was just stuck on this topic, and not even my college professors were explaining it nicely; then I found you. You deserve a salute, sir 🙇🙇

  • @nicezombie8054 · 13 hours ago

    The fourth level is explaining it to someone else, as that's always one of the hardest things to do and the best indication that you truly know the concept.

  • @anaghpandey8805 · 13 hours ago

    You're a GOD

  • @virus404ripoff · 13 hours ago

    eeeeeeeee..

  • @Kate-R20 · 13 hours ago

    Puts it on its side 😊

  • @seth5119 · 13 hours ago

    So π just pops up in the most unlikely of places... sensational.

  • @andreizelenco4164 · 13 hours ago

    Because of parallel processing on GPUs, you can convolve, say, a 1K image with a 3×3 kernel by offsetting the image one pixel to the NW and multiplying it by the first element of the kernel, then offsetting to the N and multiplying the image by the second element of the kernel, and so on in all 8 directions plus the center. Then you just add all 9 images. That is also very fast, and it works because of parallel processing.
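
That shift-and-add trick can be spelled out in a few lines; here is a minimal NumPy sketch (this computes the cross-correlation form; flip the kernel for a true convolution):

```python
import numpy as np

def conv3x3_shift_add(image, kernel):
    """3x3 'convolution' via nine shifted copies of the image:
    shift by one pixel in each of the 8 directions plus the center,
    scale each copy by one kernel element, then add them all up."""
    h, w = image.shape
    padded = np.pad(image, 1)          # zero padding at the borders
    out = np.zeros((h, w), dtype=float)
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * padded[di:di + h, dj:dj + w]
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
box_blur = np.ones((3, 3)) / 9.0       # simple averaging kernel
print(conv3x3_shift_add(img, box_blur))
```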