Scalable Parallel Computing Lab, SPCL @ ETH Zurich
Switzerland
Joined 26 Feb 2017
Talks from SPCL@ETH members and SPCL_Bcast episodes!
Swing: Short-cutting Rings for Higher Bandwidth Allreduce
Paper Title: Swing: Short-cutting Rings for Higher Bandwidth Allreduce
Conference: NSDI 2024
Speaker: Daniele De Sensi
Authors: Daniele De Sensi, Tommaso Bonato, David Saam, Torsten Hoefler
Abstract:
The allreduce collective operation accounts for a significant fraction of the runtime of workloads running on distributed systems. One factor determining its performance is the distance between communicating nodes, especially on networks like torus, where a higher distance implies multiple messages being forwarded on the same link, thus reducing the allreduce bandwidth. Torus networks are widely used on systems optimized for machine learning workloads (e.g., Google TPUs and Amazon Trainium devices), as well as on some of the Top500 supercomputers. To improve allreduce performance on torus networks we introduce Swing, a new algorithm that keeps a low distance between communicating nodes by swinging between torus directions. Our analysis and experimental evaluation show that Swing outperforms by up to 3x existing allreduce algorithms for vectors ranging from 32B to 128MiB, on different types of torus and torus-like topologies, regardless of their shape and size.
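The abstract's distance argument can be made concrete with a toy comparison (a hedged sketch, not the paper's algorithm or code; the function names are my own): on a 1D torus (a ring) of p nodes, a ring reduce-scatter always talks to a neighbour at distance 1, while a recursive-doubling schedule pairs ranks 2^k apart, so later steps forward messages over many links of the same torus.

```python
import math

def ring_distances(p):
    # Ring reduce-scatter: each of the p-1 steps talks to the
    # immediate neighbour, so every message travels exactly 1 hop.
    return [1] * (p - 1)

def recursive_doubling_distances(p):
    # Recursive doubling: step k pairs ranks 2**k apart; on a ring
    # the shortest path between ranks d apart is min(d, p - d) hops,
    # so later steps forward messages over many intermediate links.
    return [min(2 ** k, p - 2 ** k) for k in range(int(math.log2(p)))]

p = 32
print(sum(ring_distances(p)))           # total hops: 31
print(recursive_doubling_distances(p))  # per-step hops: [1, 2, 4, 8, 16]
```

The growing per-step distance is what forces multiple messages onto the same torus link; Swing's contribution, per the abstract, is a schedule that keeps this distance low by alternating between torus directions.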
79 views
Neural Graph Databases
90 views · 28 days ago
Paper Title: Neural Graph Databases Conference: First Learning on Graphs Conference (LoG'22) Speaker: Maciej Besta Authors: Maciej Besta, Patrick Iff, Florian Scheidl, Kazuki Osawa, Nikoli Dryden, Michal Podstawski, Tiancheng Chen, Torsten Hoefler Abstract: Graph databases (GDBs) enable processing and analysis of unstructured, complex, rich, and usually vast graph datasets. Despite the large si...
HOT - Higher-Order Dynamic Graph Representation Learning with Efficient Transformers
85 views · 1 month ago
Paper Title: HOT - Higher-Order Dynamic Graph Representation Learning with Efficient Transformers Conference: Second Learning on Graphs Conference (LoG'23) Speaker: Maciej Besta Authors: Maciej Besta, Afonso Claudino Catarino, Lukas Gianinazzi, Nils Blach, Piotr Nyczyk, Hubert Niewiadomski, Torsten Hoefler Abstract: Many graph representation learning (GRL) problems are dynamic, with millions of...
LRSCwait: Enabling Scalable and Efficient Synchronization in Manycore Systems
67 views · 1 month ago
Paper Title: LRSCwait: Enabling Scalable and Efficient Synchronization in Manycore Systems through Polling-Free and Retry-Free Operation Conference: Design, Automation and Test in Europe Conference (DATE 2024) Speaker: Samuel Riedel Authors: Samuel Riedel, Marc Gantenbein, Alessandro Ottaviano, Torsten Hoefler, Luca Benini Abstract: Extensive polling in shared-memory manycore systems can lead t...
Compressing Multidimensional Weather and Climate Data Into Neural Networks
102 views · 2 months ago
Title: Compressing multidimensional weather and climate data into neural networks Speaker: Langwen Huang Author: Langwen Huang, Torsten Hoefler Abstract: Weather and climate simulations produce petabytes of high-resolution data that are later analyzed by researchers in order to understand climate change or severe weather. We propose a new method of compressing this multidimensional weather and ...
VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
534 views · 2 months ago
Paper Title: VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores Venue: International Conference for High Performance Computing, Networking, Storage, and Analysis (#SC23) Speaker: Roberto L. Castro Authors: Roberto L. Castro, Andrei Ivanov, Diego Andrade, Tal Ben-Nun, Basilio B. Fraguela, Torsten Hoefler Abstract: The increasing success and scaling of Deep Learning mo...
Motif Prediction with Graph Neural Networks
250 views · 2 months ago
Paper Title: Motif Prediction with Graph Neural Networks Conference: 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'22) Speaker: Maciej Besta Authors: Maciej Besta, Raphael Grob, Cesare Miglioli, Nicola Bernold, Grzegorz Kwaśniewski, Gabriel Gjini, Raghavendra Kanakagiri, Saleh Ashkboos, Lukas Gianinazzi, Nikoli Dryden, Torsten Hoefler Abstract: Link prediction is one of...
Demystifying Chains, Trees, and Graphs of Thoughts
188 views · 3 months ago
Paper Title: Demystifying Chains, Trees, and Graphs of Thoughts Speaker: Maciej Besta Authors: Maciej Besta, Florim Memedi, Zhenyu Zhang, Robert Gerstenberger, Guangyuan Piao, Nils Blach, Piotr Nyczyk, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Lukas Gianinazzi, Ales Kubicek, Hubert Niewiadomski, Aidan O'Mahony, Onur Mutlu, Torsten Hoefler Abstract: The field of natural language process...
[SPCL_Bcast] The digital revolution of Earth system modelling
132 views · 3 months ago
Speaker: Peter Dueben Venue: SPCL_Bcast #47, recorded on 4th April, 2024 Abstract: This talk outlines three revolutions that happened in Earth system modelling in the past decades. The quiet revolution has leveraged better observations and more compute power to allow for constant improvements of prediction quality of the last decades, the digital revolution has enabled us to perform km-scale si...
[SPCL_Bcast] Capturing Computation with Algorithmic Alignment
154 views · 3 months ago
Speaker: Petar Veličković Venue: SPCL_Bcast #46, recorded on 21st March, 2024 Abstract: What makes a neural network better, or worse, at fitting certain tasks? This question is arguably at the heart of neural network architecture design, and it is remarkably hard to answer rigorously. Over the past few years, there have been a plethora of attempts, using various facets of advanced mathematics, ...
Co-design Hardware and Algorithm for Vector Search
230 views · 3 months ago
Paper Title: Co-design Hardware and Algorithm for Vector Search Venue: SC'23, Denver CO Speaker: Wenqi Jiang Authors: Wenqi Jiang, Shigang Li, Yu Zhu, Johannes de Fine Licht, Zhenhao He, Runbin Shi, Cedric Renggli, Shuai Zhang, Theodoros Rekatsinas, Torsten Hoefler, Gustavo Alonso Abstract: Vector search has emerged as the foundation for large-scale information retrieval and machine learning sy...
Demystifying Graph Databases
115 views · 3 months ago
Paper Title: Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries Journal: ACM Computing Surveys Speaker: Maciej Besta Authors: Maciej Besta, Robert Gerstenberger, Emanuel Peter, Marc Fischer, Michał Podstawski, Claude Barthels, Gustavo Alonso, Torsten Hoefler Abstract: Graph processing has become an important part of multiple areas of comp...
Fortran is dead - Long live Fortran!
1.1K views · 4 months ago
Torsten Hoefler's random access spontaneous talk given at the 42nd anniversary Salishan Conference on High-Speed Computing in 2023. Discusses how to lift Fortran code to a data-centric representation to optimize it for accelerator devices. Work led by Alexandru Calotoiu in SPCL.
Hot Interconnects - EtherNET: the present and future of datacenter and supercomputers
303 views · 5 months ago
Panel "Ethernet or Ethernot" presentation at Hot Interconnects 2023.
[SPCL_Bcast] Can I Cook a 5 o'clock Compiler Cake and Eat It at 2?
213 views · 6 months ago
Speaker: Albert Cohen Venue: SPCL_Bcast #45, recorded on 7th December, 2023 Abstract: In high-performance computing words: can we build a compiler that will eventually save a lot of performance engineering effort while immediately delivering competitive results? Here, competitiveness refers to achieving near hardware peak-performance for important applications. The question is particularly hot ...
AI-Driven Performance Metaprogramming
495 views · 6 months ago
HammingMesh: A Network Topology for Large-Scale Deep Learning
550 views · 6 months ago
GDI: Scaling Online Transactional and Analytical Graph Workloads to Hundreds of Thousands of Cores
115 views · 8 months ago
[SPCL_Bcast] Scalable Graph Machine Learning
163 views · 8 months ago
[SPCL_Bcast] Heterogeneous multi-core systems for efficient EdgeML
295 views · 8 months ago
[SPCL_Bcast] Evaluating Large-Scale Learning Systems
242 views · 9 months ago
ML for High-Performance Climate: Data Post Processing, Compression, and Earth Virtualization Engines
508 views · 9 months ago
HexaMesh: Scaling to Hundreds of Chiplets with an Optimized Chiplet Arrangement
432 views · 1 year ago
How to Adjust Network-on-Chip Topologies to Design Goals and Architectures
826 views · 1 year ago
Noise in the Clouds: Influence of Network Performance Variability on Application Scalability
214 views · 1 year ago
Scheduling Task Graphs on Dataflow Architectures
336 views · 1 year ago
Bjorn Stevens on Earth Virtualization Engines (EVE)
1.1K views · 1 year ago
"From Two Strong Oxen to Billions of Fleas." Torsten Hoefler's Sidney Fernbach Award Lecture at SC22
356 views · 1 year ago
[Bcast] HPVM: Performance, Programmability and Retargetability for Heterogeneous Parallel Systems
970 views · 1 year ago
Rusty Lusk’s legacy: The role of MPI in modern AI
405 views · 1 year ago
Is this planned to be released in a general-purpose release / integrated with current Kafka versions? Is this usable for production use cases, or is it still under testing?
Dear Mr. Hoefler, thanks for the useful video. May I ask if you could please share the slides from the video with me?
TIOBE Index for May 2024: Fortran in the top 10
Fortran is still used today, particularly in scientific, engineering, and high-performance computing applications where numerical computation and performance are critical. While newer languages like Python and Julia have gained popularity for general-purpose programming and rapid prototyping, Fortran remains widely used in fields such as computational physics, climate modeling, computational chemistry, and finite element analysis.
Correction at 33:45: when he says "4 ecks speedup", he means "four _times_".
good work
I really doubt you can run YOLO on Versal :P
My friend claimed that he can program everything in Fortran. How true it is! I also converted many original weather-model calculations from Algol to Fortran. They work great.
Thank you so much, it was very helpful.
Great explanation. This video saved me hours I would have spent trying to understand it from the docs.
The sound is terrible! Fix it if you want it to be listened to.
Thank you very much for this talk, and especially for providing context for where it sits in the field.
Excellent presentation... the AI Engine (AIE) looks great!
Nice talk! Super interesting to see this DL-HPC double perspective. By the way, the email address on the website mentioned at 57:55 for hiring seems unreachable; emails sent there bounce back after 2-3 days without being delivered.
Which software did you use for the design?
All results are obtained using our custom NoC cost and performance prediction toolchain (see spclgitlab.ethz.ch/iffp1/sparse_hamming_graph_public). Does this answer your question?
Great stuff!
But what happens when one node has multiple connected nodes with a lower ID? Then one node would need to point to multiple other nodes. What am I missing?
Very good paper, love from 🇨🇳
Wasn't there a more interesting way to deliver this? I mean, seriously?
Most of the paths in a large system are four hops long, not three. In the 545-group example he uses, there is only one link between each pair of global groups. So the first two hops get you to the correct global bus, and the next two hops get you to the terminal switch. The largest system that would have a three-hop maximum is 9,216, comprising 18 groups.
Hops are counted as switch-to-switch hops. Switches within a group are fully connected, thus you need in the worst case one hop to reach another switch in the source group, one hop to reach the destination group, and one hop in the destination group to reach the destination switch.
Well, yeah, you are really not interested in DDL.
That's really cool, hope I can get an opportunity for a possible PhD position in your lab 😃
Is there any open PhD position at ETH on these topics?
Yes, see spcl.inf.ethz.ch/Jobs/
@spcl Is that a PhD position or a job for PhD students? Why does it say "Contracts will be 12-month renewable, with an initial probatory period"? I thought a PhD was supposed to last at least 3 years.
and where am I?
Society 5.0 ... Is that where we finally get a for-loop to iterate through the entire population to serve every inhabitant's needs and desires, instead of passing policies in a top-down approach and wondering why there are still dissatisfied people left behind somehow... or is this going to be another instance of promising technology that only entrenches preexisting distributions of security, opportunities, comforts and luxuries. And for the record, karma is illegal vigilantism. Most promising technology starts with idealistic intentions and ends up being misused to dish out varying degrees of harm. Pardon me if I don't understand why my for-loop hasn't already been implemented on a 486. Maybe that'll be Society 6.0. But obviously, great work by the researchers. Let's just hope the decision-makers live up to the same standard of effort and quality of intent.
What is the difference between a program dependence graph (by Ferrante & Ottenstein) and a contextual flow graph?
Insightful
Could you please upload your source code again? It seems to be 404 now :(
The code was released a few minutes after your message. Please check again. Thanks!
The best tutorial to get an idea of how cool actual programming is for HPC using HLS on FPGAs.
she is amazing
Good day Mr. Hoefler, a very good and extensive overview of this huge topic. Thanks a lot.
This is just truly a great gem!
can I have this ppt?
This is very informative content, but you need to slow down because you end up mispronouncing words at times.
Excellent review of the state of the art, well explained and concise, thank you Torsten!
Nice slides! However, the speaker talks too fast, like a rapper; it leaves me with a painful headache after the talk. :(
Very helpful! Thank you!
Very nice presentation
Thanks a ton. Very helpful 👌
Where can I find the slides of this presentation?
Thanks for your efforts.
Great presentation!
The GPT-2 model's memory will saturate one of the WSC SRAMs, right?
At 1:50 Prof. Hoefler says we will not use linearizability in the lecture. To clarify: We do not use linearizability in this lecture, but we will introduce linearizability in a later lecture.
You can put a random projection before a sparse neural network. This spreads the information in the input evenly across all components, so each sparse dot product gets a fair sub-sample of the input vector. A more structured sub-random projection could be better.
For anything sparse I think hash table. Not very GPU friendly. However, I think there is a trend toward federated learning using large numbers of cheap CPU boards. For fast nets, training becomes limited by DRAM memory bandwidth. The combined DRAM memory bandwidth of a number of cheap boards can be very high and obviously not too expensive.
You know, adding a bunch of numbers is a dot product: the bunch of numbers in vector form dotted with <1,1,...,1>. Likewise, adding and subtracting a bunch of numbers is a dot product, e.g. the bunch of numbers in vector form dotted with <+1,-1,-1,+1,...>. Certain patterns of addition and subtraction lead to the fast (Walsh-)Hadamard transform, which is a collection of fixed dot products where the cost per dot product is log2(n) add/subtract operations. E.g., for a dot product of 65536 terms the cost is 16 add/subtract operations.
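The add/subtract pattern described above can be sketched as a minimal in-place fast Walsh-Hadamard transform (an illustrative sketch, not code from any of the talks; the function name `fwht` is my own):

```python
def fwht(v):
    # In-place fast Walsh-Hadamard transform of a list whose length is
    # a power of two. Total cost: n*log2(n) add/subtract operations,
    # i.e. log2(n) per output coefficient, matching the comment above.
    n = len(v)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = v[j], v[j + h]
                v[j], v[j + h] = x + y, x - y  # butterfly: sum and difference
        h *= 2
    return v

print(fwht([1, 0, 1, 0]))  # [2, 2, 0, 0]
```

Each pass pairs elements at distance h and replaces them with their sum and difference; log2(n) passes of n/2 butterflies give the n·log2(n) total.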
Numenta has a sparse neural network that uses top-k selection instead of ReLU. They have a whitepaper, and they have a forum (the Numenta Discourse).