@Scale
AI Training Orchestration Evolution with Serverless Building Blocks | Maneet Bansal, Upasana Dixit, Shawn Wang
Join us as we talk about the evolution of workflow orchestration that led to the creation of composable serverless subsystems. We will then discuss how FBLearner, an AI development platform, leveraged this ecosystem of building blocks to address persistent challenges such as orchestration-execution coupling, inefficient resource use, and poor debugging experiences. We will also delve into the complexities of updating a business-critical system with strict SLA guarantees at Meta scale.
471 views

Videos

Scalable Solutions for Running Large Language Models | Jiaxin Cao
383 views · 1 month ago
The advent of open-source large language models such as Llama and Mixtral demands new deployment strategies that are both efficient and cost-effective. We will explore adaptive workload management for infrastructure optimization, which is crucial for handling fluctuating demand efficiently. Next, we will delve into LLM caching techniques, inc...
Training Arctic at Snowflake | Jeff Rasley & Hyungtae Kim
545 views · 1 month ago
In this case study, we present the system used to train the Arctic MoE model at Snowflake. The system combines Snowflake and Kubernetes to cover the entire lifecycle of large language model (LLM) training, from the initial stages of data acquisition and processing (including annotation, filtering, and deduplication) to cond...
Training LlaMa - A Storage Perspective | Sumit Gupta & Robin Battey
1.4K views · 1 month ago
GenAI training needs upended all of our assumptions about "storage at scale". This is the story of the trials and tribulations that ultimately led to the successful launch of our largest-scale LLaMA training jobs, told from a storage perspective.
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs | Haibin Lin
887 views · 1 month ago
Maintaining Large Scale AI Capacity @ Meta | Benjamin Leonhardi & Saranyan A Vigraham
798 views · 1 month ago
In just two years, Meta has undergone a monumental transformation in its AI infrastructure, growing from a single research cluster to a sprawling network of nearly a hundred AI superclusters of varying sizes with hundreds of thousands of GPUs. This rapid expansion has introduced a myriad of challenges, ra...
Keynote | Surupa Biswas
822 views · 1 month ago
GenAI Training In Production: Software, Hardware & Network Considerations
1.5K views · 1 month ago
Evolving Cluster Management | Shankar Selvam & Cedric Goh
431 views · 1 month ago
Building at Scale with H100: Eos as a DGX SuperPOD Reference Model for Large Data Center Builds
842 views · 1 month ago
Data @Scale Live Q&A Session #1 | Moderator Manju Anand
96 views · 2 months ago
Data @Scale Live Q&A Session #2 | Moderator Manju Anand
57 views · 2 months ago
Taking Flight with Interactive Analytics | Frances Perry
331 views · 2 months ago
Scaling Meta’s Infra with GenAI: Journey to faster and smarter Incident Response
1.1K views · 2 months ago
Large-Scale Data Graph: Scale and Optimize Privacy & Security in Offline Data Systems
350 views · 2 months ago
The AI-First Data Infrastructure | Barak Yagour
755 views · 2 months ago
Case Study in Bridging Production Software and Data Practices for LLM Model Training Using Snowflake
319 views · 2 months ago
Composable Data Management Systems | Pedro Pedreira & Amit Purohit
244 views · 2 months ago
Smarter, faster Data analytics with generative AI and Machine Learning | Santosh Chandrachood
213 views · 2 months ago
Opening Remarks | Maor Kleider, Jelena Pjesivac-Grbovic
166 views · 2 months ago
Navigating Data's Next Great Shift | DJ Patil & Daniel Francisco
298 views · 2 months ago
Closing Remarks | Maor Kleider, Jelena Pjesivac-Grbovic
24 views · 2 months ago
Beam Up Your GenAI Usage: Usability, Efficiency, Reliability with Apache Beam | Ahmet Altay
320 views · 2 months ago
Demystifying the Data Stack of the Largest and Fastest Growing Payment Gateway in India: Razorpay
481 views · 2 months ago
Fireside Chat: Evolution of AI-first Data Infrastructure | Moderator Manju Anand
733 views · 2 months ago
Systems @Scale June 12, 2024
104 views · 2 months ago
Data @Scale May 22, 2024
83 views · 2 months ago
Live Q&A 2 | Moderated by Jie Dong
100 views · 4 months ago
Live Q&A 1 | Moderated by Kaustubh Kalgaonkar
75 views · 4 months ago
Live Q&A 3 | Moderated by Pavel Punsky
70 views · 4 months ago