Understanding Graph Attention Networks

  • Published Sep 5, 2024

Comments • 177

  • @NimaDmc
    @NimaDmc 2 years ago +41

    I must admit that this is the best explanation for GAT and GNN one can find. Fantastic explanation in very simple English. The quality of sound and video is great as well. Many thanks.

    • @DeepFindr
      @DeepFindr  2 years ago +1

      Thank you for your kind words

  • @kenbobcorn
    @kenbobcorn 3 years ago +26

    This was simply a fantastic explanation video, I really do hope this video gets more coverage than it already has. It would be fantastic if you were to explain the concept of multi-head attention in another video. You've earned yourself a subscriber +1.

    • @DeepFindr
      @DeepFindr  3 years ago +1

      Thank you, I appreciate the feedback!
      Sure, I'll note it down :)

  • @user-of2hd3bq4n
    @user-of2hd3bq4n 16 days ago

    This might be the best and simplest explanation of GAT one can ever find! Thanks, man!

  • @anastassiya8526
    @anastassiya8526 1 day ago

    This was the best explanation; it gave me hope of understanding these mechanisms. Everything was so well explained and depicted, thank you!

  • @xorenpetrosyan2879
    @xorenpetrosyan2879 2 years ago +3

    This is the best and most detailed explanation of attention in graph neural networks I've found. Great job!

  • @snsacharya1737
    @snsacharya1737 29 days ago

    A wonderful and succinct explanation with crisp visualisations about both the attention mechanism and the graph neural network. The way the learnable parameters are highlighted along with the intuition (such as a weighted adjacency matrix) and the corresponding matrix operations is very well done.

  • @jianxianghuang1275
    @jianxianghuang1275 3 years ago +5

    I especially love your background pics.

  • @anupr567
    @anupr567 2 years ago +2

    Explained in terms of basic neural network terminology!! Great work 👍

  •  8 months ago

    Your work has been an absolute game-changer for me! The way you break down complex concepts into understandable and actionable insights is truly commendable. Your dedication to providing in-depth tutorials and explanations has tremendously helped me grasp the intricacies of GNNs. Keep up the phenomenal work!

  • @user-nt1zq5so5g
    @user-nt1zq5so5g 5 months ago +1

    Amazing!!! Well done, author!!!

  • @NadaaTaiyab
    @NadaaTaiyab 2 years ago +1

    I'd love it if you could explain multi-head attention as well. You really have such a good grasp of this very complex subject.

    • @DeepFindr
      @DeepFindr  2 years ago

      Hi! Thanks!
      Multi-head attention simply means that several attention mechanisms are applied at the same time. It's like cloning the regular attention.
      What exactly is unclear here? :)

    • @NadaaTaiyab
      @NadaaTaiyab 2 years ago

      @@DeepFindr The math and code are hard to fully grasp. If you could break down the linear algebra with the matrix diagrams as you have done for single head attention, I think people would find that very helpful.
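
To make the reply above concrete: a minimal sketch (assuming PyTorch Geometric is available; the toy graph, feature sizes, and head count are made up) of what multi-head attention means for a GAT layer. Each head is an independent copy of the attention mechanism, and their outputs are concatenated along the feature dimension.

```python
import torch
from torch_geometric.nn import GATConv

x = torch.randn(5, 4)                      # 5 nodes, 4 features each
edge_index = torch.tensor([[0, 1, 2, 3],   # source node of each edge
                           [1, 2, 3, 4]])  # target node of each edge

# 3 heads, each producing 8-dimensional embeddings, concatenated at the end
conv = GATConv(in_channels=4, out_channels=8, heads=3, concat=True)
out = conv(x, edge_index)
print(out.shape)  # torch.Size([5, 24]) = 3 heads * 8 features per node
```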

  • @pu239
    @pu239 3 years ago +3

    This is pretty amazing content. The way you explain the concept is pretty great and I especially like the visual style and very neat looking visuals and animations you make. Thank you!

    • @DeepFindr
      @DeepFindr  3 years ago +1

      Thank you for your kind words :)

  • @samuel2318
    @samuel2318 2 years ago +1

    Clear explanation and visualization of the attention mechanism. Really helpful for studying GNNs.

  • @tobigm1917
    @tobigm1917 6 months ago

    Thank you very much! This was my introduction to GAT and helped me immediately get a good grasp of the basic concept :) I like the graphical support you provide for the explanation, it's great!

  • @toluolu9390
    @toluolu9390 2 years ago +1

    Very well explained. Thank you very much!

  • @chrispapadakis3965
    @chrispapadakis3965 3 years ago +2

    Just for anyone confused: according to the illustration in the summary, the weight matrix should have 5 rows instead of the 4 shown in the video.
    Great video, and I admire the fact that your topics of choice are really the latest hot stuff in ML!

  • @nurkleblurker2482
    @nurkleblurker2482 2 years ago +2

    Extremely helpful. Very well explained in concrete and abstract terms.

  • @user-jx9fy4ml9k
    @user-jx9fy4ml9k 3 years ago +6

    Amazingly easy to understand. Thank you.

  • @celestchowdhury2605
    @celestchowdhury2605 1 year ago

    Very good explanation! Clear and crisp; even I, a beginner, feel satisfied after watching this. It should get more recognition!

  • @mohammadrzakarimi2140
    @mohammadrzakarimi2140 2 years ago +1

    Your visual explanation is super great; it helps many people learn in minutes what would otherwise take hours!
    Please make more videos on specialized topics in GNNs!
    Thanks in advance!

    • @DeepFindr
      @DeepFindr  2 years ago

      I will soon upload more GNN content :)

  • @adityashahane1429
    @adityashahane1429 2 years ago +3

    very well explained, provides a very intuitive picture of the concept. Thanks a ton for this awesome lecture series!

  • @benjamintan3069
    @benjamintan3069 1 year ago

    I need more Graph Neural Network related videos!!

    • @DeepFindr
      @DeepFindr  1 year ago

      There will be some more in the future. Anything in particular you are interested in? :)

  • @hlew2694
    @hlew2694 8 months ago

    This is the BEST video on GCN and GAT, really great, thank you!

  • @Eisneim1
    @Eisneim1 9 months ago

    very helpful tutorial, clearly explained!

  • @kevon217
    @kevon217 10 months ago

    Great walkthrough.

  • @raziehrezaei3156
    @raziehrezaei3156 2 years ago +1

    such an easy-to-grasp explanation! such a visually nice video! amazing job!

  • @NadaaTaiyab
    @NadaaTaiyab 2 years ago +1

    Great! Thank you for explaining the math and the linear algebra with the simple tables.

  • @sapirharary8262
    @sapirharary8262 3 years ago +2

    Great video! your explanation was amazing. Thank you!!

  • @AkhmadMizkat
    @AkhmadMizkat 1 year ago

    This is a great explanation covering basic GNNs and GAT. Thank you so much.

  • @amansah6615
    @amansah6615 2 years ago

    Easy and excellent explanation,
    nice work.

  • @Ssc2969
    @Ssc2969 11 months ago

    Fantastic explanation.

  • @eelsayed9380
    @eelsayed9380 2 years ago +1

    Great explanation, really appreciated.
    Could you please make a video explaining the loss calculation and backpropagation in GNNs?

  • @omarsoud2015
    @omarsoud2015 1 year ago

    Thanks for the best explanation.

  • @Moreahead1
    @Moreahead1 1 year ago

    Clear explanation, the best video lecture about GNNs I have ever seen.

  • @mydigitalwayia956
    @mydigitalwayia956 2 years ago

    Thank you very much for the video. After watching many others, I can say that yours is the best and the easiest to understand. I am very grateful to you. Regards.

  • @huaiyuzheng5577
    @huaiyuzheng5577 3 years ago +2

    Very nice video. Thanks for your work~

  • @SylwiaNano
    @SylwiaNano 1 year ago

    Thanks for the awesome explanation!
    A video on attention in CNNs, e.g. UNet, would be great :)

    • @DeepFindr
      @DeepFindr  1 year ago

      I briefly touch on that in my video on diffusion models. I've noted it down for the future, though.

  • @sadhananarayanan1031
    @sadhananarayanan1031 1 year ago

    Thank you so much for this beautiful video. I have tried so many videos on GNNs and GAT, but this one definitely tops them. I finally understood the concept behind it. Keep up the good work :)

  • @kodjigarpp
    @kodjigarpp 3 years ago

    Thank you for sharing this clear and well-designed explanation.

  • @wenqichen4151
    @wenqichen4151 3 years ago

    I really salute you for this detailed video! It's very intriguing and clear! Thank you again!

  • @mahmoudebrahimkhani1384
    @mahmoudebrahimkhani1384 9 months ago

    simple and informative! Thank you!

  • @marcusbluestone2822
    @marcusbluestone2822 1 year ago

    Very clear and helpful. Thank you so much!

  • @geletamekonnen2323
    @geletamekonnen2323 2 years ago

    Thank you, bro. My confused head now gets the idea behind GNNs.

  • @AbleLearners
    @AbleLearners 8 months ago

    A Great explanation

  • @hainingliu3471
    @hainingliu3471 1 year ago

    Very clear explanation. Thank you!

  • @sharadkakran531
    @sharadkakran531 3 years ago +4

    Hi, Can you tell which tool you're using to make those amazing visualizations? All of your videos on GNNs are great btw :)

    • @DeepFindr
      @DeepFindr  3 years ago +1

      Thanks a lot! Haha, I use ActivePresenter (it's free for the basic version), but I guess there are better alternatives out there. Still experimenting :)

  • @user-ux2gz7sm6z
    @user-ux2gz7sm6z 1 year ago

    best video for learning GNN thank you so much!

  • @nazarzaki44
    @nazarzaki44 1 year ago

    Great video! Thank you

  • @mamore.
    @mamore. 3 years ago

    most understandable explanation so far!

  • @user-mq8gv4pv3e
    @user-mq8gv4pv3e 2 years ago +1

    Good explanation of the key idea. One question: what is the difference between GAT and self-attention constrained by an adjacency matrix (e.g. Softmax(Attn*Adj))? The memory used for GAT is D*N^2, which is D times that of the intermediate output of SA. The number of nodes in a graph used with GAT thus cannot be too large because of memory size. But it seems that both implement dynamic weighting of neighborhood information constrained by an adjacency matrix.

    • @DeepFindr
      @DeepFindr  2 years ago

      Hi,
      Did you have a look at the implementation in PyG? pytorch-geometric.readthedocs.io/en/latest/_modules/torch_geometric/nn/conv/gat_conv.html#GATConv
      One of the key tricks in GNNs is usually to represent the adjacency matrix in COO format. Therefore you have adjacency lists and not an n x n matrix.
      Using functions like gather or index_select you can then do a masked selection of the local nodes.
      Hope this helps :)
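
A rough sketch of the trick described in the reply (illustrative only, not PyG's actual implementation; tensor names and sizes are made up): with the adjacency in COO format you only build one row per existing edge, never a dense N x N attention matrix.

```python
import torch

num_nodes, dim = 5, 8
h = torch.randn(num_nodes, dim)              # node embeddings
edge_index = torch.tensor([[0, 0, 1, 2, 3],  # source node of each edge (COO format)
                           [1, 2, 2, 3, 4]]) # target node of each edge

h_src = h.index_select(0, edge_index[0])     # [num_edges, dim] embeddings of sources
h_dst = h.index_select(0, edge_index[1])     # [num_edges, dim] embeddings of targets
pairs = torch.cat([h_src, h_dst], dim=-1)    # one row per edge for the attention MLP
print(pairs.shape)                           # torch.Size([5, 16])
```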

  • @hyeongseonpark7018
    @hyeongseonpark7018 3 years ago

    Very Helpful Explanation! Thank you!

  • @sukantabasu
    @sukantabasu 6 months ago

    Simply exceptional!

  • @zheed4555
    @zheed4555 1 year ago

    This is very helpful!

  • @philipkamau6288
    @philipkamau6288 3 years ago

    Thanks for sharing the knowledge!

  • @salahaldeen1751
    @salahaldeen1751 1 year ago

    Wonderful explanation! Thanks.

  • @scaredheart6109
    @scaredheart6109 15 days ago

    AMAZING!

  • @mbzf2773
    @mbzf2773 2 years ago

    Thank you so much for this great video.

  • @Kevoshea
    @Kevoshea 4 months ago

    great video, thanks

  • @leo.y.comprendo
    @leo.y.comprendo 2 years ago

    I learned so much from this video! Thanks a lot

  • @Bwaaz
    @Bwaaz 6 months ago

    Great quality thank you !

  • @maudentable
    @maudentable 2 years ago

    Awesome.....

  • @sajjadayobi688
    @sajjadayobi688 2 years ago

    A great explanation, many thanks

  • @farzinhaddadpour7192
    @farzinhaddadpour7192 1 year ago

    Very nice, thanks for the effort!

  • @dominikklepl7991
    @dominikklepl7991 2 years ago +3

    Thank you for the great video. I have one question, what happens if weighted graphs are used with attention GNN? Do you think adding the attention-learned edge "weights" will improve the model compared to just having the input edge weights (e.g. training a GCNN with weighted graphs)?

    • @DeepFindr
      @DeepFindr  2 years ago +2

      Hi! Yes, I think so. The fact that the attention weights are learnable makes them more powerful than static weights.
      The model might still want to put more attention on a node because there is valuable information in the node features, independent of the weight.
      A real-world example of this might be the data traffic between two network nodes. If less data is sent between two nodes, you would probably assign a smaller weight to the edge. Still, it could be that the information coming from one node is very important and therefore the model pays more attention to it.

  • @GaoyuanFanboy123
    @GaoyuanFanboy123 11 months ago

    Please use brackets and multiplication signs between matrices so I can map the mathematical formula to the visualization.

  • @daesoolee1083
    @daesoolee1083 2 years ago

    well explained.

  • @arnaiztech
    @arnaiztech 2 years ago

    Outstanding explanation

  • @dariomendoza6079
    @dariomendoza6079 2 years ago

    Excellent explanation 👌 👏🏾

  • @Jorvanius
    @Jorvanius 2 years ago

    Excellent job, mate 👍👍

  • @anvuong1099
    @anvuong1099 2 years ago

    Thank you for wonderful content

  • @khoaphamang3413
    @khoaphamang3413 2 years ago

    Super explanation

  • @sangramkapre
    @sangramkapre 2 years ago +2

    Awesome video! Quick question: do you have a video explaining Cluster-GCN? And if yes, do you know if a similar clustering idea can be applied to other networks (like GAT) to be able to train the model on large graphs? Thanks!

  • @imalive404
    @imalive404 3 years ago

    Great explanation! As you pointed out, this is one kind of attention mechanism. Can you also provide references to other attention mechanisms?

    • @DeepFindr
      @DeepFindr  3 years ago

      Hi! The video in the description from this other channel explains the general attention mechanism used in transformers quite well :) or do you look for other attention mechanisms in GNNs?

    • @imalive404
      @imalive404 3 years ago

      @@DeepFindr Yes, thanks for sharing that in the video too. I was curious about attention mechanisms for GNNs.

    • @DeepFindr
      @DeepFindr  3 years ago +1

      OK :)
      In my next video (of the current GNN series) I will also quickly talk about Graph Transformers. There the attention coefficients are calculated with a dot product of keys and queries.
      I hope to upload this video this or next week :)

  • @abhishekomi1573
    @abhishekomi1573 2 years ago

    I am following your playlist on GNNs and this is the best content I have found so far.
    I have a CSV file and want to apply a GNN to it, but I don't understand how to derive the edge features from the CSV file.

    • @DeepFindr
      @DeepFindr  2 years ago +2

      Thanks! Did you see my latest 2 videos? They show how to convert a CSV file to a graph dataset. Maybe it helps you to get started :)

    • @abhishekomi1573
      @abhishekomi1573 2 years ago

      @@DeepFindr Thanks, hope I will get my answer :-)

  • @user-yl9bd7nn2h
    @user-yl9bd7nn2h 11 months ago

    Thanks for the great explanation! Just one thing that I do not really understand: may I ask how you get the size of the learnable weight matrix [4,8]? I understood that there are 4 rows due to the number of features for each node, but I'm not sure where the 8 columns come from.

    • @mistaroblivion
      @mistaroblivion 10 months ago

      I think 8 is the arbitrarily chosen dimensionality of the embedding space.
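
A small sketch of that point (the numbers follow the video's example; using a plain linear layer to stand in for the weight matrix is just an illustration): 4 is fixed by the number of input node features, while 8 is the freely chosen size of the hidden embedding, i.e. a hyperparameter.

```python
import torch
import torch.nn as nn

H = torch.randn(5, 4)            # 5 nodes, 4 features each
W = nn.Linear(4, 8, bias=False)  # learnable projection from 4 to 8 dimensions
print(W(H).shape)                # torch.Size([5, 8]) -> the 8 was simply chosen
```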

  • @dharmendraprajapat4910

    At 4:00, do you multiply the node feature matrix with the adjacency matrix before multiplying it with the learnable weight matrix?
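
The question above is not answered in the thread; assuming the plain neighborhood-sum aggregation shown in the video, the order should not matter, because matrix multiplication is associative: aggregating first, (A @ H) @ W, gives the same result as transforming first, A @ (H @ W). A quick sketch with placeholder matrices:

```python
import torch

A = torch.tensor([[1., 1., 0.],   # adjacency matrix with self-loops
                  [1., 1., 1.],
                  [0., 1., 1.]])
H = torch.randn(3, 4)             # node feature matrix
W = torch.randn(4, 8)             # learnable weight matrix

out_aggregate_first = (A @ H) @ W
out_transform_first = A @ (H @ W)
print(torch.allclose(out_aggregate_first, out_transform_first, atol=1e-6))  # True
```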

  • @snp27182
    @snp27182 2 years ago

    Good video, but you should have mentioned how, in NLP, a sequence of words is used to build a fully connected adjacency graph. This is why attention can be used on graph data: even in NLP, it is already operating ON graph data!

  • @muhammadwaqas-gs1sp
    @muhammadwaqas-gs1sp 3 years ago

    Brilliant video 👍👍👍

  • @user-ow5sk4fo2e
    @user-ow5sk4fo2e 2 years ago

    Very understandable! Thank you.
    Can you share your presentation?

    • @DeepFindr
      @DeepFindr  2 years ago

      Sure! Can you send me an email to deepfindr@gmail.com and I'll attach it :) thx

    • @keteverma3441
      @keteverma3441 2 years ago +1

      @@DeepFindr Hey I have also sent you an email, could you please attach the presentation?

  • @yusufani8
    @yusufani8 2 years ago

    Amazing thank you 🤩

  • @AndreaStevensKarnyoto
    @AndreaStevensKarnyoto 3 years ago

    Very helpful video, but I am still confused about some parts. Maybe I should watch it a few times. Thanks.

    • @DeepFindr
      @DeepFindr  3 years ago

      Hi! What is unclear to you?
      :)

  • @aditijuneja1848
    @aditijuneja1848 1 year ago

    Hi. Your explanations are really nice, easy to understand, and seem rooted in fundamentals. Thank you for that. I am new to reading research papers, and I sometimes find it difficult to understand them and end up wasting a lot of time on not-so-important things. That is what I think my problem is, but it could be something else too... like sometimes I don't have the prerequisites or have gaps in my knowledge. Could you please make a video about this, help in the comments, or recommend some other resource for getting better at reading papers and understanding them from the bottom up? Thank you very much 🙏🙏

  • @alexvass
    @alexvass 1 year ago

    Thanks

  • @KingMath22232
    @KingMath22232 3 years ago

    THANK YOU!

  • @james.oswald
    @james.oswald 3 years ago

    Great Video!

  • @cw9249
    @cw9249 1 year ago

    Thank you. What if you also wanted to have edge features?

    • @DeepFindr
      @DeepFindr  1 year ago

      Hi, I have a video on how to use edge features in GNNs :)

  • @MariaPirozhkova
    @MariaPirozhkova 1 year ago

    Hi! Are what you explain in the "Basics" and the message-passing concept the same things?

    • @DeepFindr
      @DeepFindr  1 year ago

      Yes, they are the same thing :) passing messages is in the end nothing else but multiplying with the adjacency matrix. It's just a common term to better illustrate how the information is shared :)

  • @pi5549
    @pi5549 9 months ago

    2:55 Looks like it should be sum(H * W), not sum(W * H). 5x4 * 4x8 works. Suggest you provide errata at the top of the description. Someone else has noticed an error later in the video.

  • @zacklee5787
    @zacklee5787 2 months ago

    I have come to understand attention as key, query, value multiplication/addition. Do you know why this wasn't used and if it's appropriate to call it attention?

    • @DeepFindr
      @DeepFindr  2 months ago

      Hi,
      Query/key/value is just a design choice of the transformer model; attention is a more general technique of the architecture.
      There is also a GNN transformer (look for Graphormer) that follows the query/key/value pattern. The attention mechanism is detached from this concept and is simply a way to learn importance between embeddings.

  • @barondra38
    @barondra38 3 years ago

    Love your work and thick accent, thank you! These attention coefficients look very similar to weighted edges to me, so I want to ask a question: if my graph is an unweighted attributed graph, would GATConv produce different output compared with GCNConv by Kipf and Welling?

    • @DeepFindr
      @DeepFindr  3 years ago +1

      Hahah, thanks!
      I'm not sure if I understood the question correctly. If you have an unweighted graph, GAT will anyway learn the attention coefficients (which can be seen as edge weights) based on the embeddings. They can be seen as "learnable" edge weights.
      So I'm pretty sure that GATConv and GCNConv will produce different outputs.
      From my experience, using the attention mechanism, the output embeddings are better than with plain GCN.

  • @lightkira8281
    @lightkira8281 2 years ago

    Thank you

  • @sqliu9489
    @sqliu9489 2 years ago

    Thanks for the video! A question: at 13:03, I think the 'adjacency matrix' consisting of {e_ij} could be symmetric, but after the softmax operation the 'adjacency matrix' consisting of {α_ij} should not be symmetric anymore. Is that right?

    • @DeepFindr
      @DeepFindr  2 years ago

      Yes usually the attention weights do not have to be symmetric. Is that what you mean? :)

    • @sqliu9489
      @sqliu9489 2 years ago

      @@DeepFindr Yes. Thanks for your reply!
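
A tiny sketch of the point settled in this thread (a dense 3x3 score matrix is used purely for illustration; in GAT the softmax only runs over each node's actual neighbors): the raw scores e_ij can be symmetric, but because the softmax normalizes each row separately, the resulting coefficients α_ij are generally not symmetric.

```python
import torch

e = torch.tensor([[0.0, 1.0, 2.0],
                  [1.0, 0.0, 3.0],
                  [2.0, 3.0, 0.0]])      # symmetric raw scores e_ij
alpha = torch.softmax(e, dim=1)          # row-wise normalization over neighbors
print(torch.allclose(alpha, alpha.T))    # False: alpha_ij != alpha_ji in general
```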

  • @ayushsaha5539
    @ayushsaha5539 1 year ago

    Why does the newly calculated state have more features than the original state? I don't understand.

    • @DeepFindr
      @DeepFindr  1 year ago

      It's because the output dimension (number of neurons) of the neural network is different from the input dimension.
      You could also have fewer or the same number of features.

  • @n.a.7271
    @n.a.7271 2 years ago

    How is the learnable weight matrix formed? Do you have some material to understand it better?

    • @DeepFindr
      @DeepFindr  2 years ago

      This simply comes from dense (fully connected) layers. There are lots of resources, for example here: analyticsindiamag.com/a-complete-understanding-of-dense-layers-in-neural-networks/#:~:text=The%20dense%20layer's%20neuron%20in,vector%20of%20the%20dense%20layer.

  • @roufaidalaidi8597
    @roufaidalaidi8597 2 years ago

    Thanks a lot. Your videos are really helpful. I have a few questions regarding the case of weighted graphs. Would attention still be useful if the edges are weighted? If so, how do you pass edge weights to the attention network? Can you suggest a paper doing that?

    • @DeepFindr
      @DeepFindr  2 years ago +1

      The GAT layer of PyG supports edge features but no edge weights. Therefore I would simply treat the weights as one-dimensional edge features.
      The attention then additionally considers these weights.
      The learned attention weights and the edge weights are probably somewhat correlated, but I think it won't harm to include them in the attention calculation. Maybe the attention mechanism can learn even better scores for the aggregation :) I would just give it a try and see what happens. For example, compare RGCN + edge weights with GAT + edge features.

    • @roufaidalaidi8597
      @roufaidalaidi8597 2 years ago

      @@DeepFindr thanks a lot for the reply.
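
A hedged sketch of the suggestion above, assuming a recent PyTorch Geometric version in which GATConv accepts edge features via the edge_dim argument (sizes and values are made up): the scalar edge weight is simply passed as a one-dimensional edge feature.

```python
import torch
from torch_geometric.nn import GATConv

x = torch.randn(4, 16)                        # 4 nodes, 16 features each
edge_index = torch.tensor([[0, 1, 2],
                           [1, 2, 3]])
edge_weight = torch.tensor([0.1, 0.9, 0.5])   # one scalar weight per edge

conv = GATConv(in_channels=16, out_channels=8, edge_dim=1)
out = conv(x, edge_index, edge_attr=edge_weight.unsqueeze(-1))  # weights as [3, 1] edge features
print(out.shape)                              # torch.Size([4, 8])
```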

  • @user-sc3dg6yw6v
    @user-sc3dg6yw6v 2 years ago

    Very helpful video! Thank you for your great work! Two questions: 1. Could you please explain the Laplacian matrix in GCN? The GNN explained in this video is spatial-based, and I hope to get a better understanding of the spectral-based ones. 2. How do you draw those beautiful pictures? Could you share the source files? Thanks again!

    • @DeepFindr
      @DeepFindr  2 years ago +1

      Hi!
      The Laplacian is simply the degree matrix of a graph minus the adjacency matrix. Is there anything in particular you are interested in? :)
      My presentations are typically a mix of PowerPoint and ActivePresenter, so I can send you the slides. For that please send an email to deepfindr@gmail.com :)
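
A quick sketch of that definition (a toy 3-node undirected graph, values made up): the unnormalized graph Laplacian is the degree matrix minus the adjacency matrix, L = D - A.

```python
import torch

A = torch.tensor([[0., 1., 1.],   # adjacency matrix of a small undirected graph
                  [1., 0., 0.],
                  [1., 0., 0.]])
D = torch.diag(A.sum(dim=1))      # degree matrix
L = D - A                         # unnormalized graph Laplacian
print(L)
```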

  • @PostmetaArchitect
    @PostmetaArchitect 4 days ago

    It's almost as if it's just a normal neural network, but projected onto a graph.

  • @dmitrivillevald9274
    @dmitrivillevald9274 3 years ago

    Thank you for the great video! I wanted to ask: how is training of this network performed when the instances (input graphs) have varying numbers of nodes and/or differently sized adjacency matrices? It seems that W would not depend on the number of nodes (as its shape is 4 node features x 8 node embeddings), but the shape of the attention weight matrix Wa would (as its shape is proportional to the number of edges connecting node 1 with its neighbors).

    • @DeepFindr
      @DeepFindr  3 years ago +2

      Hi! The attention weight matrix always has the same shape. Its input size is twice the node embedding size, because it always takes one pair of neighbor embeddings and predicts the attention coefficient for that pair. Of course, if you have more connected nodes, you will have more of these pairs, but you can think of it as the batch dimension increasing, not the input dimension.
      For instance, say you have node embeddings of size 3. Then the input for the fully connected network is, for instance, [0.5, 1, 1, 0.6, 2, 1], i.e. the concatenated node embeddings of two neighbors (size = 3 + 3). It doesn't matter how many of these you feed into the attention weight matrix.
      If you have 3 neighbors for a node it would look like this:
      [0.5, 1, 1, 0.6, 2, 1]
      [0.5, 1, 1, 0.7, 3, 2]
      [0.5, 1, 1, 0.8, 4, 3]
      The output are then 3 attention coefficients for each of the neighbors.
      Hope this makes sense :)
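
A small sketch of the reply above (the attention network here is a single made-up linear layer followed by a LeakyReLU, as in the GAT paper; the numbers copy the example): each row is one concatenated neighbor pair of size 3 + 3 = 6, so more neighbors only grow the batch dimension, never the input dimension.

```python
import torch
import torch.nn as nn

pairs = torch.tensor([[0.5, 1, 1, 0.6, 2, 1],
                      [0.5, 1, 1, 0.7, 3, 2],
                      [0.5, 1, 1, 0.8, 4, 3]])   # 3 neighbor pairs, input size 6 each

attn = nn.Sequential(nn.Linear(6, 1, bias=False), nn.LeakyReLU(0.2))
e = attn(pairs).squeeze(-1)                      # one raw score per neighbor pair
alpha = torch.softmax(e, dim=0)                  # normalized attention coefficients
print(alpha.shape)                               # torch.Size([3]), one per neighbor
```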

    •  3 years ago

      @@DeepFindr If graph sizes are different, I mean if graph_1 has 2200 nodes (resulting in a 2200x2200 adjacency matrix) and graph_2 has 3000 nodes (a 3000x3000 adjacency matrix), you can zero-pad graph_1 to 3000. This way you'll have a fixed input size for graph_1 and graph_2. Zero padding will create dummy nodes with no connections, so the sum with the neighboring nodes will be 0. And with dummy features for the dummy nodes, you'll end up with fixed-size graphs.

    • @DeepFindr
      @DeepFindr  3 years ago

      Hi, yes that's true! But for the attention mechanism used here no fixed graph size is required. It also works for a different number of nodes.
      But yes padding is a good idea to get the same shapes :)

  • @etiennetiennetienne
    @etiennetiennetienne 1 year ago

    Why replace dot-product attention with concatenation + projection + LeakyReLU?

    • @DeepFindr
      @DeepFindr  1 year ago

      That's a good point. I think the TransformerConv is the layer that uses dot-product attention. I'm also not aware of any reason why it was implemented like that. Maybe it's because this considers the direction of information (so source and target nodes) better. A dot product is commutative, so i*j is the same as j*i, and it can't distinguish the direction of information flow. Just an idea :)
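
A tiny sketch of that intuition (the embeddings and the attention vector a are arbitrary made-up values): a dot-product score cannot tell i->j from j->i, while the concatenation-based score used in GAT generally can, because swapping the two embeddings changes the input to the attention vector.

```python
import torch
import torch.nn.functional as F

a = torch.tensor([1., -1., 0.5, 2., 0., -0.5])  # attention vector (illustrative)
h_i = torch.tensor([1., 0., 2.])
h_j = torch.tensor([0., 1., 1.])

print(torch.equal(h_i @ h_j, h_j @ h_i))              # True: dot product is commutative
e_ij = F.leaky_relu(a @ torch.cat([h_i, h_j]), 0.2)   # score for direction i -> j
e_ji = F.leaky_relu(a @ torch.cat([h_j, h_i]), 0.2)   # score for direction j -> i
print(e_ij.item(), e_ji.item())                       # 1.5 vs 0.5: direction matters
```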

  • @nastaranmarzban1419
    @nastaranmarzban1419 2 years ago

    Hi, sorry to bother you
    I have a question
    What's the difference between soft-attention and self-attention?

    • @DeepFindr
      @DeepFindr  2 years ago

      Hi! There is soft vs hard attention, you can search for it on Google.
      For self-attention there are great tutorials, such as this one: peltarion.com/blog/data-science/self-attention-video