Transformers, explained: Understand the model behind GPT, BERT, and T5

Google Cloud Tech

879,872 views

Added June 4, 2024
Dale’s Blog → goo.gle/3xOeWoK
Classify text with BERT → goo.gle/3AUB431
Over the past five years, Transformers, a neural network architecture, have completely transformed state-of-the-art natural language processing. Want to translate text with machine learning? Curious how an ML model could write a poem or an op-ed? Transformers can do it all. In this episode of Making with ML, Dale Markowitz explains what transformers are, how they work, and why they’re so impactful. Watch to learn how you can start using transformers in your app!
Chapters:
0:00 - Intro
0:51 - What are transformers?
3:18 - How do transformers work?
7:41 - How are transformers used?
8:35 - Getting started with transformers
Watch more episodes of Making with Machine Learning → goo.gle/2YysJRY
Subscribe to Google Cloud Tech → goo.gle/GoogleCloudTech
#MakingwithMachineLearning #MakingwithML
product: Cloud - General; fullname: Dale Markowitz; re_ty: Publish;
Science and technology

Comments • 359

@Omikoshi78 1 year ago ⁺⁷⁹
The ability to break down a complex topic is such an underrated superpower. Amazing job.
@robchr 2 years ago ⁺²²⁸
Transformers! More than meets the eye.
@suomynona7261 1 year ago ⁺³
😂
@Marcoose81 1 year ago ⁺⁸
Transformers! Robots in disguise!
@DomIstKrieg 1 year ago ⁺³
Autobots wage their battle to fight the evil forces of the Decepticons!!!!!
@mieguishen 1 year ago ⁺¹
Transformers! No money to buy…
@05012215 1 year ago
Of course
@rohanchess8332 11 months ago ⁺⁴⁷
How did you condense so many pieces of information in such a short time? This video is on a next level, I loved it!
@dj67084 1 year ago ⁺⁹
This is awesome. This has been one of the best overall breakdowns I've found. Thank you!!
@tongluo9860 1 year ago ⁺²²²
Great explanation of the key concept of position encoding and self attention. Amazing you get the gist covered in less than 10 minutes.
@patpearce8221 1 year ago ⁺¹
@Dino Sauro tell me more...
@patpearce8221 1 year ago
@Dino Sauro thanks for the heads up
@an-dr6eu 1 year ago ⁺³
She has one of the wealthiest companies on earth providing her resources. First hand access to engineers, researchers, top notch communicators and marketing employees.
@michaellavelle7354 1 year ago ⁺²
@@an-dr6eu True, but this young lady talks a mile-a-minute from memory. She knows it cold regardless of the resources at Google.
@pankajchand6761 9 days ago
@@michaellavelle7354 Her explanation is absolutely useless. Have you ever programmed a Transformer model from scratch to verify what she has explained?
@dylan_curious 1 year ago ⁺¹⁶
This is such an informative video about transformers in machine learning! It's amazing how a type of neural network architecture can do so much, from translating text to generating computer code. I appreciate the clear explanations of the challenges with using recurrent neural networks for language analysis, and how transformers have overcome these limitations through innovations like positional encodings and self-attention. It's also fascinating to hear about BERT, a popular transformer-based model that has become a versatile tool for natural language processing in many different applications. The tips on where to find pretrained transformer models and the popular transformers Python library are super helpful for anyone looking to start using transformers in their own app. Thanks for sharing this video!
@trushatalati5596 2 years ago ⁺⁷
This is a really awesome video! Thank you so much for simplifying the concepts.
@rajqsl5525 6 months ago ⁺²
You have the gift of making things simple to understand. Keep up the good work 🙏
@luis96xd 1 year ago ⁺⁶
Amazing video! Nice explanation and examples 😄👍
I would like to see more videos like this and practical ones
@PaperTools 1 year ago ⁺²⁷
Dale you are so good at explaining this tech, thank you!
@erikengheim1106 3 months ago ⁺¹
Thanks, you did a great job. I spent some time already looking at different videos to capture the high level idea of what transformers are about and yours is the clearest explanation. I actually do have an educational background in neural networks but don't go around remembering every detail or the state of the art today, so somebody removing all the unnecessary technical details like you did here is very useful.
@reddyvarinaresh7924 2 years ago ⁺⁵
I loved it. Very simple, clear explanation.
@shravanacharya4376 2 years ago ⁺²
So easy and clear to understand. Thanks
@labsanta 1 year ago ⁺⁵⁰
Takeaways:
A transformer is a type of neural network architecture that is used in natural language processing. Unlike recurrent neural networks (RNNs), which analyze language by processing words one at a time in sequential order, transformers use a combination of positional encodings, attention, and self-attention to efficiently process and analyze large sequences of text.
Neural networks, Convolutional neural networks (for image analysis), Recurrent neural networks (RNNs), Positional encodings, Attention, Self-attention
Neural networks: A type of model used for analyzing complicated data, such as images, videos, audio, and text.
Convolutional neural networks: A type of neural network designed for image analysis.
Recurrent neural networks (RNNs): A type of neural network used for text analysis that processes words one at a time in sequential order.
Positional encodings: A method of storing information about word order in the data itself, rather than in the structure of the network.
Attention: A mechanism used in neural networks to selectively focus on parts of the input.
Self-attention: A type of attention mechanism that allows the network to focus on different parts of the input simultaneously.
Neural networks are like a computerized version of a human brain, that uses algorithms to analyze complex data.
Convolutional neural networks are used for tasks like identifying objects in photos, similar to how a human brain processes vision.
Recurrent neural networks are used for text analysis, and are like a machine trying to understand the meaning of a sentence in the same order as a human would.
Positional encodings are like adding a number to each word in a sentence to remember its order, like indexing a book.
Attention is like a spotlight that focuses on specific parts of the input, like a person paying attention to certain details in a conversation.
Self-attention is like being able to pay attention to multiple parts of the input at the same time, like listening to multiple conversations at once.
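The positional-encoding takeaway above can be sketched in a few lines. This is a minimal illustration, not the video's code: it assumes the sinusoidal scheme from the original Transformer paper (the comment itself only describes "adding a number to each word"), and the function names and the tiny all-zero embeddings are hypothetical.

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return seq_len position vectors, each of length d_model."""
    encoding = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share one frequency (assumed sinusoidal scheme).
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        encoding.append(row)
    return encoding

def add_positions(embeddings):
    """Store word order in the data itself: add a position vector to each embedding."""
    seq_len, d_model = len(embeddings), len(embeddings[0])
    pe = sinusoidal_positional_encoding(seq_len, d_model)
    return [[e + p for e, p in zip(emb, pos)] for emb, pos in zip(embeddings, pe)]

# Hypothetical 4-dimensional embeddings for a three-word sentence (all zeros,
# so the result shows the position signal alone).
embeddings = [[0.0] * 4 for _ in range(3)]
encoded = add_positions(embeddings)
```

Because every position gets a distinct vector, the same word at two different places in a sentence produces two different inputs, which is how the network can recover order without processing words one at a time.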
@an-dr6eu 1 year ago
Great, you learned how to copy paste
@yumyum_99 1 year ago ⁺¹⁰
@@an-dr6eu first step on becoming a programmer
@JohnCorrUK 1 year ago ⁺³
@@an-dr6eu your comment comes over somewhat 'catty' 😢
@Jewish5783 1 year ago ⁺¹
i really enjoyed the concepts you explained. simple to understand
@JayantKochhar 1 year ago
Positional Encoding, Attention and Self Attention. That's it! Really well summarized.
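For readers who want to see the third of those ideas in code, here is a rough sketch of self-attention. It assumes scaled dot-product attention and, for brevity, skips the learned query/key/value projections that a real transformer applies; the token vectors and function names are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections (a simplification)."""
    d = len(vectors[0])
    outputs, weights = [], []
    for query in vectors:
        # Score this token against every token in the same sequence, itself included.
        scores = [dot(query, key) / math.sqrt(d) for key in vectors]
        w = softmax(scores)
        weights.append(w)
        # The new representation is a weighted average of all token vectors.
        outputs.append([sum(w_j * v[k] for w_j, v in zip(w, vectors)) for k in range(d)])
    return outputs, weights

# Hypothetical 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
```

Each row of `weights` says how much one token attends to every other token, so the first token here leans more on the third token (which overlaps with it) than on the second (which does not).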
@bondsmagi 2 years ago ⁺⁶⁸
Love how you simplified it. Thank you
@luxraider5384 1 year ago
It's so simplified that you can't understand anything
@CarlosRodriguez-mv8qi 1 year ago ⁺⁴
Charm, intelligence and clarity! Thanks!
@user-wr4yl7tx3w 2 years ago ⁺⁴
Wow, this is so well explained.
@maayansharon280 1 year ago ⁺²³
This is a GREAT explanation! please lower the background music next time it could really help. thanks again! awesome video
@akashrawat217 1 year ago
Such a simple yet revolutionary 💡idea
@TallesAiran 1 year ago ⁺⁶
I love how you simplify something so complex, thank you so much Dale, the explanation was perfect
@asstimus-prime 1 year ago
how did you do that
@nahiyanalamgir7056 1 year ago
@@asstimus-prime This one? Just type ":" (colon) followed by "thanksdoc" and end it with another colon. I can add other emojis like 🤟too!
@asstimus-prime 1 year ago
@@nahiyanalamgir7056 it needs desktop YouTube i think
@nahiyanalamgir7056 1 year ago
@@asstimus-prime Apparently, it does. When will these apps be consistent across devices and platforms?
@asstimus-prime 1 year ago ⁺¹
@@nahiyanalamgir7056 thanks though
@bingochipspass08 2 years ago
Very well explained.. This really is a high level view of what Transformers are, but it's probably enough to just get your toes wet in the field!
@SeanTechStories 1 year ago ⁺¹
That's a really good high-level explanation!
@touchwithbabu 1 year ago
Fantastic! Thanks for simplifying the concept
@ansumansamal3767 2 years ago ⁺²²²
Where is optimus prime?
@alwaysabiggafish3305 1 year ago ⁺¹⁴
He's on the thumbnail...
@ankitnmnaik229 1 year ago ⁺¹⁰
He will be in theaters on June 9... Transformers : Rise of breasts..
@captainbob6680 1 year ago ⁺¹
😂😂😂😂
@yomajo 11 months ago
Where are robotaxis?
@yeoj_maximo1122 11 months ago
We got lied to
@walterppk1989 2 years ago ⁺²¹
Hi Google! First of all, thank you for this wonderful video. I'm working on a multiclass (single label) supervised learning that uses Bert for transfer learning. I've got about 10 classes and a couple hundred thousand examples. Any tips on best practices (which Bert variants to use, what order of magnitude of dropout to use if any)? I know I could do hyperparameter search but that'd probably cost more time and money than I'm comfortable with (for a prototype), so I'm looking to make the most out of my local Nvidia 3080.
@todayu 1 year ago ⁺¹
This was a really, really awesome breakdown 👏🏾
@noureldinosamas2978 1 year ago ⁺¹⁶⁶
Amazing video! 🎉 You explained the difficult concepts of Transformers so clearly and made them easy to understand. Thanks for all your hard work!🙌👍
@pumbo_nv 10 months ago ⁺⁴
Are you serious? The concepts were not really explained. Just a summary of what they do but not how they work behind the scenes.
@axscs1178 5 months ago
No.
@mfatal 1 year ago ⁺⁵
Love the content and thanks for the great video! (one thing that might help is lower the background music a bit, I found myself stopping the video because I thought another app was playing music)
@Daniel-iy1ed 1 year ago
Thank you so much. I really needed this video, other videos were just confusing
@softcoda 13 days ago ⁺²
This has to be the best explanation so far, and by a very large margin.
@googlecloudtech 7 days ago ⁺¹
Thank you for watching! We appreciate the kind words. 🤗
@barbara1943 5 months ago
Very interesting, informative, this added perspective to a hyped-up landscape. I'll admit, I'm new to this, but when I hear "pretrained transformer" I didn't even think about BERT. I appreciate getting the view from 10,000 feet.
@rembautimes8808 3 months ago
This is a very well produced video. Credits to the presenter and those involved in production with the graphics
@junepark1003 5 months ago
This is one of the best vids I've watched on this topic!
@MaxKar97 1 month ago
Nice amount of info imparted in this video. Very clear info on what Transformers are and what made them so great.
@danielchen2616 1 year ago
Thanks for your hard work. This video is very helpful!!!
@harshadfx 9 months ago ⁺¹
I have more respect for Google after watching this video. Not only did they provide their engineers with the funding to do research, but they also let other companies like OpenAI use said research. And they are opening up the knowledge to the general public with these video series.
@shailendraburman 2 years ago ⁺¹
Simply loved it!
@hallucinogen22 4 months ago
thank you! I'm just starting to learn about gpt and this was quite helpful, though I will have to watch it again :)
@JohnCorrUK 1 year ago ⁺¹
Excellent presentation and explanation of concepts
@DeanRGAnderson 1 year ago ⁺¹
This is an excellent video introduction for transformers.
@sorbethyena3828 2 years ago ⁺²
Informative! Thank you
@Mariouigi 1 year ago
crazy how things have changed so much
@GurpreetSingh-uu1xl 6 days ago
Thanks Ma'am. You broke it down well.
@josedamiansanchez9874 1 year ago
Amazing explanation!
@bobdillan5761 1 year ago ⁺¹
super well done. Thanks for this!
@NicolasHart 4 months ago
so super helpful for my thesis, thank u
@sun-ship 3 months ago
Easiest to understand explanation I've heard so far
@xiongjiedai8405 1 year ago
Very good lecture, thanks!
@EranM 1 year ago ⁺⁴
I knew little on transformers before this video. I know little on transformers after this video. But I guess in order to know some, we'll need a 2-3 hours video.
@myt97 1 year ago
Great video. Thank you!
@RobShuttleworth 2 years ago ⁺⁹
The visuals are very helpful. Thanks.
@googlecloudtech 2 years ago ⁺³
You're very welcome!
@jsu12326 3 months ago
wow, what a great summary! thanks!!!
@rodeoswing 7 months ago ⁺¹
Great video for people who are curious but don’t really want to (or can’t) understand how transformers actually work.
@ganbade200 2 years ago ⁺⁶
You have no idea how much time I potentially have saved just by reading your blog and watching this video to get me up to speed quickly on this. "Liked" this video. Thanks
@theguythatcoment 1 year ago ⁺²
do transformers learn the internal representation one language at a time or all of them at the same time? I remember that Chomsky said that there's no underlying structure to language and that for every rule you try to make you'll always find an edge case that contradicts the rule.
@mohankiranp 8 months ago
Very well explained. This video is must watch for anyone who wants to demystify the latest LLM technology. Wondering if this could be made into a more generic video with a quick high-level intro on neural networks for those who aren't in the field. I bet there are millions out there who want to get a basic understanding of how ChatGPT/Bard/Claude work without an in-depth technical deep dive.
@janeerin6918 7 months ago ⁺¹
OMG the BEST transformers video EVER!
@VaibhavPatil-rx7pc 1 year ago
Most excellent explanation I've ever seen, recommending this link to everyone
@ayo4757 1 year ago ⁺¹
Soo cool! Great work
@tusharjamwal 11 months ago
How did you sync your talking cadence to the background music?
@anshulchaurasia8762 1 year ago
Simplest Explanation ever
@gammacubed 5 months ago
Amazing video, thank you so much!
@ludologian 1 year ago
When I was a kid, I knew the trouble with translation was due to translating words literally, without contextual/sequential awareness. I knew it's important to distinguish between synonyms. I imagined there was a button that generated the translation output, then you could highlight the words that didn't make sense or needed improvement, then regenerate the text translation. This type of NLP probably existed before I programmed my first hello world (15+ years ago)!
@ZeeshanAli-ck3ue 1 year ago
very well explained.👍
@shivangsharma599 1 year ago
Super Explanation!!
@arpitrawat1203 2 years ago ⁺⁶
Very well explained. Thank you.
@Prog2012 3 days ago
It was funny and instructive. Thanks 🙂
@gerardovalencia805 2 years ago ⁺²
Thank you
@RonaldMorrissetteJr 1 year ago ⁺¹
When I saw this title, I was hoping to better understand the mathematical workings of transformers such as matrices and the like. Maybe you could do a follow-up video explaining mathematically how transformers work.
thank you for your time
@maxkhan4485 1 year ago
Thanks! Great video.
@user-or7ji5hv8y 2 years ago ⁺²
Great video.
@JG27Korny 6 months ago
Very informative video. Thank you!
@AleksandarKamburov 1 year ago
Positional encoding = time, attention = context, self attention = thumbprint (knowledge)... looks like a good start for AGI 😀
@massimobuonaiuto8753 1 year ago
great video, thanks!
@zacharythomas5046 1 year ago
Thanks! This is a great intro video!
@WalterReade 2 years ago ⁺⁴
Nicely done. Very helpful. Thanks!
@probablygrady 1 year ago
phenomenal video
@takeizy 1 year ago
Very impressive video. Thanks for the way you shared information via this video.
Referring to 05:05 in your video timeline, how did you create such a video, please?
@robertabitbol6454 1 year ago ⁺¹
You have actually given the BEST explanation on Neural Machine Translation that I've read so far, but you are missing a few elements
@robertabitbol6454 1 year ago ⁺¹
But your explanations, your analyses and your delivery are excellent. You're definitely a great communicator and teacher.
@robertabitbol6454 1 year ago
Actually Google and others have an algo they're not interested in sharing and I pretty much know what it is. I am working with my programmer on the coding of my new app, the revolutionary Universal Sentence builder and the Universal Dictionary, and I keep adding and changing stuff to simplify the concept, and I am pushing the programming of my Sentence Analyser app to a later date. It is, like most of my apps, a simple (and brilliant) concept coded with very few lines of code.
@robertabitbol6454 1 year ago
You know Alfred Hitchcock was always adapting his scenario to the screen without ever changing anything, not even a comma, while Francis Ford Coppola (The Godfather) was doing the opposite: They say that his script was like a newspaper that had new contents every day. Well I am more like Coppola with my apps. I change stuff all the time and I usually make my programmers go crazy. It's a good sign. :-) Mind you I don't know if one can do like Hitchcock with an app. Come up with a definite version once and for all. This would be quite an achievement!
@robertabitbol6454 1 year ago
In the case of my Universal Sentence builder, the main task was to process the data entered by the user and we've been at it since July 2022. :-) Either I am dumb or it is a complex task. Actually it is the latter, for I have started with French, this language being the most complex in the world. The good news is I am sure I will be imitated but you can rest assured that my imitators will also have a jolly hard time with French :-)
@amimegh 1 year ago
NICE SUPERB PRESENTATION
@KulbirAhluwalia 1 year ago ⁺³
From 5:28, shouldn't it be the following:
"when the model outputs the word “économique,” it’s attending heavily to both the input words “European” and “Economic.” "?
For européenne, I see that it is attending only to European. Please let me know if I am missing something here. Thanks for the great video.
@badrinair 1 year ago
Thank you for sharing
@hom01 1 year ago
this is brilliant
@gmarziou 1 year ago ⁺⁵
Please remove background music, it's really disturbing when you only listen to this otherwise great video
@JorgetePanete 2 years ago
Pretty nice, is there any automatic way of cleaning up data with errors such as a mislabel, or a grammar error?
@luxraider5384 1 year ago
Ask chatgpt
@TechNewsReviews 8 months ago
woww, she's good at explaining things
@MichaelToop 1 year ago
Great video. Thx.
@GubeTube19 1 year ago
10/10. Very helpful
@jasonlough6640 2 months ago
So, question: given the goal of understanding meaning within language regardless of language, could a sophisticated enough set of weights derived from a sufficiently large dataset represent essentially the human genome of language?
@Maisonier 1 year ago
Amazing video, thank you ... can you use transformers to detect patterns in random data that is supposedly unpredictable, like weather or stocks?
@Happypast 1 year ago
the unpredictability of stuff like weather and stocks has to do with the fundamental underlying nature of those phenomena so I would bet no.
@Christakxst 1 year ago
Thanks, that was very interesting
@JosephHenzi 2 years ago ⁺²
I'll jump on where others are doing the same - would love advice for someone who understands half the concepts that are alluded to as complex naturally and the innovation feels obvious I'm unsure how to break into the space without some guidance or connection between having exactly that great natural grasp but wildly anxious that language and logic are strengths and math is a mental turn off. For someone needing that type of translation/guide where my approach is language usage & finer cues what is the key terms to get to that understanding? Hate being fascinated and all the tools to play in this space and being unable to start because how I approach topics so welcome any advice.
@meepk633 1 year ago
Just go to school.
@EduardoOviedoBlanco 1 year ago
Great content 👍
@IceMetalPunk 2 years ago ⁺¹⁶
The invention of transformers seems to have jump-started a revolutionary acceleration in machine learning! Between the models you mentioned here, plus the way transformers are combined with other network architectures in DALL-E 2, OpenAI Jukebox, PaLM, Chinchilla/Flamingo, Gato -- it seems like adding a transformer to any model produces bleeding-edge, state-of-the-art-or-better performance on basically any tasks.
Barring any major architecture innovations in the future, I wonder if transformers end up being the key we need to reach human levels of broad-range performance after all 🤔
@IceMetalPunk 1 year ago ⁺²
@Dino Sauro They're certainly not dead, since they're still being incorporated into the bleeding edge AIs. But technology is always evolving, building upon one idea to create the next. If you're hoping for a "final architecture" that will be the best and never replaced by anything else, you're out of luck.
While I respect Professor Marcus, his ideas about the requirements for AGI strongly imply that intelligent design is required for true intelligence to emerge, and I think evolution contradicts that view.
@IceMetalPunk 1 year ago ⁺¹
@Dino Sauro Um... Okay, friend, whatever you say. Have a nice life.
@tanweeralam1650 1 year ago
I think you are right...we just saw its use in ChatGPT...and I think ChatGPT is just a glimpse of what the future holds and how it will affect the IT, EV and Industrial Automation Industry.
Am I right? You wanna add something to it?
@IceMetalPunk 1 year ago ⁺¹
@@tanweeralam1650 I agree. ChatGPT, though, is really just GPT-3 with a larger input layer, and human-guided reinforcement learning on top of it. Which is a step in the right direction for sure, but not as huge a development as a lot of people are touting it to be.
From what I can tell, there are three issues that need to be solved before transformer-based (or transformer-incorporating) AIs can reach truly human levels of intelligent behavior.
(1) They need to be bigger. If we think of the model parameter size as analogous to brain synapses, there are about a quadrillion synapses in a human brain, which is orders of magnitude more than the biggest current transformers. For instance, the largest single transformer model is 207 billion parameters, and the largest transformer-incorporating language model is 1.75 trillion parameters. On the other hand, such models don't need to allocate parameters for things like body maintenance, reproduction, etc., so it's not a 1-to-1 correspondence, but I think it's a good estimate for the order of magnitude we need to reach before we get to human levels of sapience. That said, models keep getting bigger, so I have no doubt we'll achieve this within the next decade at most.
(2) Multimodality is important. A lot of "common sense" understanding that AIs seem to lack can likely be attributed to their lack of variety in types of input they can learn from. If you only learn from text, it's a lot harder to learn what the described concepts actually *mean.* On the other hand, a model that can learn from text, images, video, audio, and other forms of data should be able to learn much more accurate representations of the world. And of course, there's a TON of research into multimodal learning right now, so we'll get there pretty soon, too, I think.
(3) The third obstacle I think is the hardest: continual learning. (From what I can tell, by the way, "continual learning" is synonymous with "incremental online learning". Let me know if there are any important differences between the two.) An AI without this can learn from a *ton* of data, but once it does, it stops learning and everything it knows is set in stone. In effect, this means every interaction with such an AI "resets" it, and so you might get inconsistent behaviors as slightly different initial conditions of an interaction can lead to very different outputs when previous similar interactions are not incorporated into the model's weights (which, in this context, can be thought of as its "long term memory"). This also means the AIs can't form consistent opinions, since any opinion they might espouse in one conversation is immediately forgotten for the next.
Continual learning techniques already exist for smaller networks, but they are not at all efficient enough to practically apply to these very large language models of many billions of parameters or more. Which is a shame, because I'd speculate that larger models would be less prone to retroactive interference -- "catastrophic forgetting" -- than smaller ones, if we could efficiently incrementally train them.
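The size gap in point (1) is easy to put a number on. The short sketch below only does the arithmetic on the figures quoted in the comment (about a quadrillion synapses, 1.75 trillion parameters); those figures are the commenter's and are not verified here, and the variable names are made up for illustration.

```python
import math

human_synapses = 1e15           # "about a quadrillion", as quoted in the comment
largest_model_params = 1.75e12  # commenter's figure, taken as an assumption

# How many times larger the synapse count is, and the same gap in powers of ten.
ratio = human_synapses / largest_model_params
orders_of_magnitude = math.log10(ratio)
```

On those assumed figures the gap is a factor of several hundred, that is, between two and three orders of magnitude.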
@tanweeralam1650 1 year ago
@@IceMetalPunk I did understand your first 2 points and agree with it...but I want to slightly differ with your 3rd point.
I dont understand...Why would the AI would stop learning?? Due to its storage space, Processing power exhaustion or for what reason? What you said may be a POSSIBILITY...But its others side also exists...it may just continue learning more n more and make it's system better.
To have Human like Intelligence...I dont think it will achieve that in next 30-40 yrs...far from those timeline...I can't say. And frankly there is NO NEED to have AIs so Advanced. Upto a certain extent...AIs should develop and Humans MUST BE able to control them. Always.
And can you say will Programs like ChatGPT ( i mean its advanced form) able to replace search Engine like Google in future?? Also how AI/ML will affect IT industry as a whole and also EV, Industrial Automation industry (e.g.- the industry where companies like Siemens, Honeywell operate)??
@AnujChourange 1 year ago
can anyone help me understand bart model which is used in chat GPT3, and how to train it?
@aGj2fiebP3ekso7wQpnd1Lhd 1 year ago
Fantastic video
@tuapuikia 1 year ago ⁺¹
Thank you so much for your help. With the assistance of GPT-4, I have been able to transition from a seasonal programmer to a full-time programmer. I am truly grateful for your support!
@doodlve 1 year ago
Nice to hear that
@softcoda 1 year ago
Wowww….thanks for clarifying my confusion.
