Building real-time data products at LinkedIn with Apache Samza

  • Uploaded: 17 Nov 2014
  • Presented at Strata+Hadoop World, New York, 16 October 2014 strataconf.com/stratany2014/pu...
    Slides: speakerdeck.com/ept/building-...
    Abstract:
    The world is going real-time. MapReduce, SQL-on-Hadoop and similar batch processing tools are fine for analyzing and processing data after the fact - but sometimes you need to process data continuously as it comes in, and react to it within a few seconds or less. How do you do that at Hadoop scale?
    Apache Samza is an open source stream processing framework designed to solve these kinds of problems. It is built upon YARN/Hadoop 2.0 and Apache Kafka. You can think of Samza as a real-time, continuously running version of MapReduce.
    Samza has some unique features that make it powerful. It provides high performance for stateful processing jobs, including aggregation and joins between many input streams. It is designed to support an ecosystem of many different jobs written by different teams, and it isolates them from each other, so that one badly behaved job can’t affect the others.
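
As a concrete illustration of the stateful processing the abstract mentions, here is a minimal sketch of a Samza job in the low-level StreamTask API that counts page views per user. The stream names ("page-views", "page-view-counts") and the store name ("counts") are illustrative assumptions, not taken from the talk:

```java
import org.apache.samza.config.Config;
import org.apache.samza.storage.kv.KeyValueStore;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.InitableTask;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskContext;
import org.apache.samza.task.TaskCoordinator;

public class PageViewCounterTask implements StreamTask, InitableTask {

    private static final SystemStream OUTPUT =
        new SystemStream("kafka", "page-view-counts");

    // Per-task local store; typically configured with a Kafka changelog
    // so the state can be restored after a failure
    private KeyValueStore<String, Integer> counts;

    @Override
    @SuppressWarnings("unchecked")
    public void init(Config config, TaskContext context) {
        counts = (KeyValueStore<String, Integer>) context.getStore("counts");
    }

    @Override
    public void process(IncomingMessageEnvelope envelope,
                        MessageCollector collector,
                        TaskCoordinator coordinator) {
        // Input is assumed to be keyed by user ID
        String userId = (String) envelope.getKey();
        Integer current = counts.get(userId);
        int updated = (current == null ? 0 : current) + 1;
        counts.put(userId, updated);
        collector.send(new OutgoingMessageEnvelope(OUTPUT, userId, updated));
    }
}
```

Keeping the counts in a task-local store rather than a remote database is what lets a job like this sustain a high message rate: reads and writes stay local, and the changelog provides recovery.
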
    At LinkedIn, we have been using Samza in production for some time, both for internal analytics purposes and for data products that are served on the live site. In this talk, we’ll discuss our experience of working with Samza. You’ll learn about:
    - What kinds of real-time data problems you can solve with Samza
    - How Samza reliably scales to millions of messages per second
    - How Samza compares to other stream processing frameworks
    - How Samza can help collaboration between different data science, product, and engineering teams within an organization
    - How to avoid implementing the same data pipeline twice (once for offline/batch processing and once for real-time/stream processing)
    - Lessons we learnt on how to structure real-time data pipelines for scale and flexibility

Comments • 11

  • @MrMukulj • 7 years ago

    Great talk Martin. Very well done!

  • @SamBessalah • 9 years ago

    Great talk Martin.

  • @m1169199 • 9 years ago • +4

    I like your slides; what did you use to make them?

  • @pollathajeeva23 • 1 year ago

    TimeSeries tool may be more than

  • @houssemghazala • 10 months ago

    👏👏👏🙏🙏🙏

  • @CoderCoronet • 1 year ago

    Hello Martin!
    Thank you very much for sharing such valuable content.
    I’m trying to find your video about building robust data infrastructure with logs. The link to the video on the talk transcript is broken. Can you share a new link to that video?
    Thank you!

  • @pinhusdash6895 • 9 years ago

    What happens when a user from one partition views a user from another partition? How does the enrichment happen? Do you send a copy of the event to both partitions?

    • @pinhusdash6895 • 9 years ago

      I think you kind of answer this at 46 minutes. But it does seem to double the time.
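
For anyone with the same question: the design described in the talk (around the 46-minute mark, per the reply above) co-partitions both streams by user ID and keeps a local replica of the relevant profiles in each task. A rough sketch under those assumptions; the stream names ("profile-edits", "page-views"), field names, and store name are made up for illustration:

```java
import java.util.Map;

import org.apache.samza.config.Config;
import org.apache.samza.storage.kv.KeyValueStore;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.InitableTask;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskContext;
import org.apache.samza.task.TaskCoordinator;

public class PageViewProfileJoinTask implements StreamTask, InitableTask {

    private static final SystemStream OUTPUT =
        new SystemStream("kafka", "page-views-with-profile");

    // Local replica of this partition's slice of the profile table,
    // restored from a Kafka changelog after a failure
    private KeyValueStore<String, Map<String, Object>> profiles;

    @Override
    @SuppressWarnings("unchecked")
    public void init(Config config, TaskContext context) {
        profiles = (KeyValueStore<String, Map<String, Object>>) context.getStore("profiles");
    }

    @Override
    @SuppressWarnings("unchecked")
    public void process(IncomingMessageEnvelope envelope,
                        MessageCollector collector,
                        TaskCoordinator coordinator) {
        // Both input streams are assumed to be partitioned by the same user ID
        String stream = envelope.getSystemStreamPartition().getStream();
        Map<String, Object> message = (Map<String, Object>) envelope.getMessage();

        if ("profile-edits".equals(stream)) {
            // Profile edit: update the local replica for this user
            profiles.put((String) message.get("userId"), message);
        } else {
            // Page view: enrich with the viewer's profile from the local store
            String viewerId = (String) message.get("viewerId");
            message.put("viewerProfile", profiles.get(viewerId));
            collector.send(new OutgoingMessageEnvelope(OUTPUT, viewerId, message));
        }
    }
}
```

If an event also needs the viewed user's profile, the event stream has to be re-partitioned a second time, keyed by the viewed user's ID, which seems to be the doubling of work the reply above refers to.
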

  • @dudeabideth4428 • 4 years ago

    Isn't that a big database of profiles to have a copy of? Or is it only the subset we care about? It sounded like the profiles replica got created from every profile edit event, so it sounds like a full replica.

    • @tomhpolo • 3 years ago

      I might be wrong, but in his talk it sounded like there are two levels of partitioning: by user and by job.
      By job: the stream processor for PageViewEventWithViewerProfile doesn't need all the data from the EditUserProfile event, so it grabs/replicates only the fields it wants from that event.
      By user: if you partition users across different processors (i.e. profile['id'] modulo N), then each replica only holds 1/N of the users (see the sketch below).
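
A tiny sketch of the co-partitioning idea in that last point: if both input streams are partitioned with the same function of the user ID, every event about a given user lands on the same task instance, so each task's profile store only needs its own 1/N slice. The hash-and-modulo function below is illustrative (Kafka's default partitioner behaves similarly):

```java
public class CoPartitioning {

    // Mask off the sign bit so the result is non-negative,
    // even when hashCode() returns Integer.MIN_VALUE
    static int partitionFor(String userId, int numPartitions) {
        return (userId.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int n = 8; // both streams must use the same partition count
        // A profile edit and a page view for the same user map to the
        // same partition, so the join task for that partition sees both.
        System.out.println(partitionFor("user-42", n)); // profile-edits stream
        System.out.println(partitionFor("user-42", n)); // page-views stream
    }
}
```
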

  • @GlebWritesCode • 8 years ago

    I would say this talk has very little to do with Samza; it's just a general view of how LinkedIn does stream processing.