Gail Weiss: Thinking like Transformers
- Added Jul 3, 2024
- Abstract: Transformers - the purely attention-based neural network
architecture - have emerged as a powerful tool in sequence
processing. But how does a transformer think? When we discuss the
computational power of RNNs, or consider a problem that they have
solved, it is easy for us to think in terms of automata and their
variants (such as counter machines and pushdown automata). But when it
comes to transformers, no such intuitive model is available.
In this talk I will present a programming language, RASP (Restricted
Access Sequence Processing), which we hope will serve the same purpose
for transformers as finite state machines do for RNNs. In particular,
we will identify the base computations of a transformer and abstract
them into a small number of primitives, which are composed into a
small programming language. We will go through some example programs
in the language, and discuss how a given RASP program relates to the
transformer architecture. Finally, we will see that thinking in terms
of RASP helps us find a concrete difference between the expressive
power of efficient and 'vanilla' transformers! - Science & Technology
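The abstract describes RASP as a small set of primitives abstracted from a transformer's base computations. As a rough illustration (this is not the official RASP interpreter; the function names and semantics below are a simplified assumption based on the talk's description), the two core operations can be sketched as `select`, which builds an attention-like selection matrix, and `aggregate`, which averages the selected values at each position:

```python
# Minimal sketch of RASP-style primitives (illustrative, not the real interpreter).

def select(keys, queries, predicate):
    """Selection matrix: entry [q][k] is True when predicate(keys[k], queries[q]) holds.
    This plays the role of an attention pattern."""
    return [[predicate(k, q) for k in keys] for q in queries]

def aggregate(selection, values):
    """At each position, average the values picked out by that row of the selection
    (mirroring attention's weighted averaging)."""
    out = []
    for row in selection:
        picked = [v for v, sel in zip(values, row) if sel]
        out.append(sum(picked) / len(picked) if picked else 0)
    return out

# Example RASP-style program: reverse a sequence.
tokens = [3, 1, 4, 1, 5]
indices = list(range(len(tokens)))   # positions, a built-in sequence in RASP
length = len(tokens)

# Each position q attends to position length-1-q and reads the token there.
flip = select(indices, indices, lambda k, q: k == length - 1 - q)
reversed_tokens = aggregate(flip, tokens)
print(reversed_tokens)  # [5.0, 1.0, 4.0, 1.0, 3.0] — the input reversed
```

Because each `select`/`aggregate` pair corresponds roughly to one attention head, counting these pairs in a RASP program hints at how many layers and heads a transformer might need for the task.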
wow, really cool way to think about the operations.
Great idea and excellent presentation. It would be interesting to see whether RASP programs can be learned via symbolic regression from toy-problem datasets that transformers solve.
Could someone explain RASP programs, please?