Scalable Stream Processing: A Survey of Storm, Samza, Spark and Flink by Felix Gessert

  • Added on 16 May 2017
  • Batch-oriented systems have done the heavy lifting in data-intensive applications for decades, but they do not reflect the unbounded and continuous nature of data as it is produced in many real-world applications. Stream-oriented systems, on the other hand, process data as it arrives and are thus often the more natural fit. Many stream processors have emerged in recent years, and all are advertised as highly available, fault-tolerant and horizontally scalable. But where do these systems differ, and which is the right one for a given use case?
    In this talk, we give an overview of the state of the art of stream processors for low-latency Big Data analytics and conduct a qualitative comparison of the most popular contenders, namely Storm and its abstraction layer Trident, Samza, Flink and Spark Streaming.
    We first cover how stream processing frameworks differ from batch-oriented systems (e.g. Hadoop and Spark) and how they are typically employed (Lambda & Kappa Architecture). We then go into detail on each system and inspect their respective rationales, guarantees, and trade-offs. As an illustrative example, we will cover real-time machine learning use cases.
    Felix Gessert is CEO and co-founder of Baqend. Baqend develops a cloud backend to help programmers build instantly-loading websites with a novel caching algorithm.
    Felix received his master's degree in computer science from the University of Hamburg and founded Baqend in 2014 with fellow students. His PhD thesis is concerned with the technical foundations of Baqend. His major interests are scalable database systems, transactions, web technologies for cloud data management and steaks.
    Felix is passionate about leveraging and improving NoSQL systems for web applications. He frequently talks and writes about the related challenges and organizes a conference series on cloud databases.
  • Science & Technology

Comments • 7

  • @HarshTandon-kn6md 7 months ago

    Great talk. Offers a very insightful view of considerations for a stream processing system. Great roundup on all major streaming platforms.

  • @sittapongsettapat 6 years ago +5

    This is a very good summary

  • @EMAILSANJEEVJOSHI 5 years ago

    very lucid explanation. great job!

  • @sridharmanickam9303 4 years ago +1

    Where does Apache Druid fit in?

  • @chuckb5860 6 years ago

    Also take a look at Wallaroo: Ultrafast and elastic data processing engine. www.wallaroolabs.com/community

  • @adityagarimella4723 2 years ago

    very bad voice but good presentation