Transformer models and BERT model: Overview

  • Published Jun 4, 2023
  • Watch this video to learn about the Transformer architecture and the Bidirectional Encoder Representations from Transformers (BERT) model. You learn about the main components of the Transformer architecture, and the different tasks that BERT can be used for, such as text classification, question answering, and natural language inference.
    Enroll on Google Cloud Skills Boost to view the lab walkthrough and participate in a hands-on lab!
    Enroll on Google Cloud Skills Boost → goo.gle/3Wk3jnC
    View the Generative AI Learning path playlist → goo.gle/LearnGenAI
    Subscribe to Google Cloud Tech → goo.gle/GoogleCloudTech
  • Science & Technology

Comments • 27

  • @googlecloudtech
    @googlecloudtech  1 year ago +3

    Subscribe to Google Cloud Tech → goo.gle/GoogleCloudTech

  • @maleehahaider8596
    @maleehahaider8596 2 days ago

    The best explanation of this topic I have seen on the web to date… BRAVO

  • @ollieanntan4478
    @ollieanntan4478 1 year ago +16

    Important info. I'd love to see a similar video but with real life examples to illustrate the points.

  • @Ops_pops
    @Ops_pops 1 year ago +2

    Explained it very well, and the use of relevant examples is awesome. Thank you so much. Your work is very much appreciated.

  • @venkateshpattu1620
    @venkateshpattu1620 2 months ago +1

    Fantastic video. Explaining this much detail in such a short time is incredible. Thank you!

    • @googlecloudtech
      @googlecloudtech  2 months ago

      Thank you for the kind words! 🤗 We're glad you found this video helpful!

  • @notmimul
    @notmimul 1 year ago +7

    8:15 What does this mean? BERT has 12 and 24 Transformer layers, and then the original Transformer has 6… 6 layers of encoders/decoders, right? Those aren't the same kind of layers as BERT's. The layers in BERT are Transformer layers, but the Transformer itself has 6 encoders/decoders.

    • @meltem9078
      @meltem9078 1 year ago

      Since they're referencing the original Transformer architecture, the 6 layers refer to the encoder part of the Transformer (the encoder has six identical layers in the original paper, each with a self-attention sublayer and a feed-forward sublayer, each followed by a normalization step). BERT is an encoder-only Transformer model.

    • @vivekc2303
      @vivekc2303 10 months ago +3

      At 2:45 she says the original research paper's Transformer had 6 encoders stacked on top of each other… At 8:12, by saying "6 layers in the original Transformer", I think she means 6 encoders on top of each other. They should not have used "layers" there, because they already used that word differently to describe the two layers within each encoder (which are the self-attention and feed-forward layers).
      Also, "transformer" doesn't denote a single encoder or a single encoder-decoder pair; it represents the whole model with all its encoders and decoders. So BERT is an advanced type of Transformer model that has more encoder layers than the original Transformer, and each layer within BERT can't be called a Transformer.
      [UPDATE] The layers within each encoder, such as self-attention and feed-forward, are actually called sublayers. So it makes sense: "layers" in a Transformer refers to the stack of encoders, and each layer within an encoder is called a "sublayer".
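The terminology the replies above converge on can be pinned down with a minimal, purely illustrative Python sketch (the function and field names here are hypothetical, not from any real library): a "layer" is one encoder block in the stack, and each layer contains a self-attention sublayer and a feed-forward sublayer. The original Transformer encoder stacks 6 such layers; BERT-Base stacks 12 and BERT-Large 24.

```python
# Illustrative sketch of "layer" vs. "sublayer" terminology (hypothetical names).

def encoder_stack(num_layers):
    """Build a stack of encoder layers; each layer holds two sublayers:
    a self-attention sublayer and a position-wise feed-forward sublayer."""
    return [{"sublayers": ("self-attention", "feed-forward")}
            for _ in range(num_layers)]

original_transformer = encoder_stack(6)   # "Attention Is All You Need" encoder
bert_base = encoder_stack(12)
bert_large = encoder_stack(24)
```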

  • @kartikpodugu
    @kartikpodugu 8 months ago +8

    Need a deeper dive into how BERT works. Can you point to more references?

  • @Philippe_Rougier
    @Philippe_Rougier 11 months ago +3

    Is this a practical exercise where we have to predict the missing word? @2:43: "the encoding component is a stack of encoders of the same number" of what? I assume she meant "of the same structure (entirely identical layers)"? That seems to be confirmed later in the video…
    Is there any review of these videos before they go out? The previous video on the attention mechanism was absolutely confusing, partly because the notation used did not match the presenter's words!

  • @sapnagupta6215
    @sapnagupta6215 10 months ago +6

    Well explained, but I'd love to see real-life examples to illustrate the points.

  • @s0meb0dy78
    @s0meb0dy78 1 month ago +1

    A Lifesaver...

  • @KiranMundy
    @KiranMundy 7 months ago +1

    Since I don't have any background on transformers, I got completely lost at the point where you explain the query, key, and value vectors and how their weights are determined at training time. I had to resort to questioning Bard about this in more detail, and while I'm still lost, that helped me get some understanding of what these three vectors are.
    Can you explain more clearly how the adjustments to the weights of the query, key, and value matrices differ during backpropagation?
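For anyone stuck on the same point, a minimal NumPy sketch of scaled dot-product attention may help (the sizes are illustrative, not BERT's real dimensions). The key idea for the backpropagation question: Q, K, and V are not free parameters; they are computed by multiplying the input embeddings by three learned weight matrices, and it is those matrices (written `W_q`, `W_k`, `W_v` below) that gradient descent adjusts, each receiving its own gradient through the attention formula.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_k = 8, 4                      # illustrative sizes
x = rng.normal(size=(3, d_model))        # embeddings for 3 tokens

# Learned parameters (random stand-ins here); backprop updates these matrices,
# not the Q/K/V vectors directly.
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

Q, K, V = x @ W_q, x @ W_k, x @ W_v      # per-token query/key/value vectors

scores = Q @ K.T / np.sqrt(d_k)          # similarity of each query to each key
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
output = weights @ V                     # attention-weighted mix of the values
```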

  • @wohola
    @wohola 9 months ago +3

    I would love to see a real-life example to illustrate the steps.

  • @swarnodipnag
    @swarnodipnag 11 months ago +1

    Well explained ❤

  • @kartikpodugu
    @kartikpodugu 8 months ago +1

    How is the "Next Sentence Prediction" task abbreviated as NPS? Can you elaborate? 9:29

  • @jeromeeusebius
    @jeromeeusebius 9 months ago +1

    Thanks for the video explaining Transformer models and BERT. Good summary and high-level description. Small nitpick: @9:03, the abbreviation for "next sentence prediction" should be NSP, but the slide has NPS.

  • @user-kd6wf6pf3g
    @user-kd6wf6pf3g 4 months ago

    1:08 Wasn't LSTM proposed in 1997?

  • @ferdousihaque9633
    @ferdousihaque9633 11 months ago +2

    Awesome!

  • @nanelikahya9949
    @nanelikahya9949 10 months ago +2

    Confusing

  • @1857kyle
    @1857kyle 6 days ago

    6:44

  • @OkiemTaty
    @OkiemTaty 7 months ago +2

    Why are all videos explaining transformers so frocking boring and uneducative!!!

  • @keenoain6885
    @keenoain6885 7 months ago +2

    Did a generative AI model make this video?
    All the information is probably correct, but:
    1. It is very much NOT clear how the pieces connect to each other.
    2. There is great emphasis on irrelevant, highly technical flows, while the description of the idea behind this structure, its motivation, and its advantages is left out. Explain HOW it solves the problems.
    This video provides quite a lot of useless information.
    For those who are familiar with the subject, it is way too basic; and to those who are not, it "gives" nothing. No understanding whatsoever.
    If you at Google use AI to generate your videos, at least review them before publishing.
    Moreover, her voice is like the voice of a typing machine.

    • @Linguisticsfreak
      @Linguisticsfreak 3 months ago

      And the stress in some words is so off: "percentage", "component", "develop", and some others are said with the wrong stress.

  • @realGynaExpress
    @realGynaExpress 3 months ago +1

    It's such a bad explanation… No examples, just reading the script. I totally get lost listening to it.