Gail Weiss: Thinking Like Transformers

  • Uploaded 24 Feb 2022
  • Paper presented by Gail Weiss to the Neural Sequence Model Theory Discord on the 24th of February 2022.
    Gail's references:
    On Transformers and their components:
    - Thinking Like Transformers (Weiss et al, 2021) arxiv.org/abs/2106.06981 (REPL here: github.com/tech-srl/RASP)
    - Attention is All You Need (Vaswani et al, 2017) arxiv.org/abs/1706.03762
    - BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al, 2018) arxiv.org/abs/1810.04805
    - Improving Language Understanding by Generative Pre-Training (Radford et al, 2018) s3-us-west-2.amazonaws.com/op...
    - Are Transformers universal approximators of sequence-to-sequence functions? (Yun et al, 2019) arxiv.org/abs/1912.10077
    - Theoretical Limitations of Self-Attention in Neural Sequence Models (Hahn, 2019) arxiv.org/abs/1906.06755
    - On the Ability and Limitations of Transformers to Recognize Formal Languages (Bhattamishra et al, 2020) arxiv.org/abs/2009.11264
    - Attention is Turing-Complete (Perez et al, 2021) jmlr.org/papers/v22/20-302.html
    - Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers (Wei et al, 2021) arxiv.org/abs/2107.13163
    - Multilayer feedforward networks are universal approximators (Hornik et al, 1989) www.cs.cmu.edu/~epxing/Class/...
    - Deep Residual Learning for Image Recognition (He et al, 2016) www.cv-foundation.org/openacc...
    - Universal Transformers (Dehghani et al, 2018) arxiv.org/abs/1807.03819
    - Improving Transformer Models by Reordering their Sublayers (Press et al, 2019) arxiv.org/abs/1911.03864
    On RNNs:
    - Explaining Black Boxes on Sequential Data using Weighted Automata (Ayache et al, 2018) arxiv.org/abs/1810.05741
    - Extraction of rules from discrete-time recurrent neural networks (Omlin and Giles, 1996) www.semanticscholar.org/paper...
    - Extracting Automata from Recurrent Neural Networks Using Queries and Counterexamples (Weiss et al, 2017) arxiv.org/abs/1711.09576
    - Connecting Weighted Automata and Recurrent Neural Networks through Spectral Learning (Rabusseau et al, 2018) arxiv.org/abs/1807.01406
    - On the Practical Computational Power of Finite Precision RNNs for Language Recognition (Weiss et al, 2018) aclanthology.org/P18-2117/
    - Sequential Neural Networks as Automata (Merrill, 2019) aclanthology.org/W19-3901.pdf
    - A Formal Hierarchy of RNN Architectures (Merrill et al, 2020) aclanthology.org/2020.acl-mai...
    - Inferring Algorithmic Patterns with Stack-Augmented Recurrent Nets (Joulin and Mikolov, 2015) proceedings.neurips.cc/paper/...
    - Learning to Transduce with Unbounded Memory (Grefenstette et al, 2015) proceedings.neurips.cc/paper/...
    Paper mentioned in discussion at the end:
    - Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth (Dong et al, 2021) icml.cc/virtual/2021/oral/9822
  • Science & Technology

Comments • 8

  • @swim3936 • 1 year ago • +2

    fantastic presentation!

  • @GodofStories • 1 year ago • +2

    This is great

  • @alexanderkyte4675 • 1 year ago • +7

    Could I please have the slides? They’re partially obscured by the listeners here. I’d like to use them for a reading group.

    • @formallanguagesandneuralne5578 • 1 year ago • +3

      Hey, I'm not managing to respond from my own account, so I'm posting from here - the slides are on my website, which is hosted on GitHub - gailweiss dot github dot io

  • @stevenshaw124 • 1 year ago • +2

    this was an excellent presentation! thank you!

  • @homeboundrecords6955 • 1 year ago • +1

    I'll bet this reply will not be read, but... isn't the "subject" = "I" and the "object" = "dog"?

    • @LGcommaI • 1 year ago • +1

      Yes, that's correct. The terminology is confusing though (IF one knows Latin): the 'subject' literally is 'that which is (thrown) UNDER', while the 'object' is 'that which is (thrown) on top'. Everyday sensibilities would thus expect the object to be the one that does something and the subject to be the one that has something done TO it. The standard convention, however, is the OPPOSITE.

    • @RaviAnnaswamy • 1 year ago • +1

      @@LGcommaI 'object' generally refers to inert things, and 'subject' is used in English for persons (the King asked his subjects to pay more tax during the drought years...). This could be the reason English grammar uses 'subject' for the actor and 'object' for the acted-upon (the victim).