Gail Weiss: Thinking like Transformers

  • Published: 3 Jul 2024
  • Abstract: Transformers, the purely attention-based neural network
    architecture, have emerged as a powerful tool in sequence
    processing. But how does a transformer think? When we discuss the
    computational power of RNNs, or consider a problem they have
    solved, it is easy to think in terms of automata and their
    variants (such as counter machines and pushdown automata). But when it
    comes to transformers, no such intuitive model is available.
    In this talk I will present a programming language, RASP (Restricted
    Access Sequence Processing), which we hope will serve the same purpose
    for transformers as finite state machines do for RNNs. In particular,
    we will identify the base computations of a transformer and abstract
    them into a small number of primitives, which are composed into a
    small programming language. We will go through some example programs
    in the language, and discuss how a given RASP program relates to the
    transformer architecture. Finally, we will see that thinking in terms
    of RASP helps us find a concrete difference between the expressive
    power of efficient and 'vanilla' transformers!
  • Science & Technology
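
The two core primitives the abstract alludes to, building an attention pattern and pooling values under it, can be modelled in a few lines of plain Python. This is our own illustrative sketch, not official RASP syntax: `select` turns a predicate into a boolean attention matrix, and `aggregate` averages the selected values (with a one-hot selector it simply copies the single selected value), which is enough to express a toy program such as reversing a sequence.

```python
# Illustrative sketch of RASP-like primitives (names and signatures are ours,
# not the official RASP language).

def select(keys, queries, predicate):
    # One row per query position; entry [q][k] is True when the
    # predicate holds between key position k and query position q.
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(selector, values, default=None):
    # Each output position averages the values its selector row picks.
    # With exactly one selected value this copies it (so tokens work too);
    # with none it falls back to `default`.
    out = []
    for row in selector:
        picked = [v for v, s in zip(values, row) if s]
        if not picked:
            out.append(default)
        elif len(picked) == 1:
            out.append(picked[0])
        else:
            out.append(sum(picked) / len(picked))
    return out

# Toy RASP-style program: reverse a sequence by attending from each
# position q to the mirrored position n - 1 - q.
tokens = list("hello")
n = len(tokens)
indices = list(range(n))
flip = select(indices, indices, lambda k, q: k == n - 1 - q)
reversed_tokens = aggregate(flip, tokens)
print("".join(reversed_tokens))  # -> "olleh"
```

Because `flip` is one-hot in every row, `aggregate` behaves as a pure routing step here; averaging only comes into play for selectors that pick several positions, which is how RASP models soft attention.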

Comments • 3

  • @kevon217
    @kevon217 1 year ago +1

    wow, really cool way to think about the operations.

  • @RaviAnnaswamy
    @RaviAnnaswamy 1 year ago +2

    Great idea and excellent presentation. It would be a good idea to see if RASP programs can be learned using symbolic regression from toy-problem datasets that transformers solve.