Comparing Kafka Streams, Akka Streams and Spark Streaming: what to use when | Rock the JVM

  • Added on 7 Sep 2024

Comments • 40

  • @abdulelahaljeffery6234
    @abdulelahaljeffery6234 3 years ago +1

    I really love how you lay out the pros and cons of each streaming API, and in what situation we have to use what. Really great stuff; and I'm glad that I found your channel.
    I'd happily buy a membership to learn from your awesome courses.
    Cheers pro :)

  • @carlosvaztec
    @carlosvaztec 1 year ago +1

    One thing I missed was STATE: how they compare in terms of managing aggregations. Great video, thank you.

  • @marekiwaniuk2399
    @marekiwaniuk2399 3 years ago +3

    Just wanted to leave a note on how the Reactive Manifesto and Reactive Streams are (not) related to each other. The first describes 'reactive systems': the whole system, where all of its components cooperate in a resilient, elastic, fault-tolerant and message-driven manner. So it is a specification of how a system should behave as a whole. Reactive Streams, on the other hand, are just one piece of the puzzle in a reactive system. They can also be used separately, outside of a reactive system. You can actually write an application that doesn't comply with the requirements of the Reactive Manifesto but still uses and leverages Reactive Streams. 'Reactive' in systems describes how the whole system reacts to volume, load, errors, etc.; 'reactive' in streams means that you have a flow of data and you react asynchronously to the events in this flow. In the world of Akka those two terms might get blurred, because the Akka actor system actually enables you to build a reactive system. Nonetheless, I would say that Akka Streams might help you build a reactive system, but they won't make your system resilient, elastic, etc. straight away.
    Anyway, you have really good content on this channel, thanks a ton for that!

    • @rockthejvm
      @rockthejvm 3 years ago +1

      Thanks for adding the color there. I might make a video on this exact topic. Glad you like my work!

  • @ElectricWound
    @ElectricWound 2 years ago +2

    A very nice high-level overview of the differences between the streaming libraries. I was especially looking for a description of when to use Kafka Streams instead of Akka Streams, and this was very helpful. There was one severe error in your description of Akka Streams, though: they are not "asynchronous by default". Most operators are actually synchronous, and you can introduce asynchronous boundaries into streams or invoke asynchronous operations with a given degree of parallelism. Consecutive synchronous operations are transparently "baked" into a single actor on materialization to minimize message-passing overhead. So you have perfect and concise control over the concurrency of calculations.
    And I cannot fully agree with your position that Akka Streams is especially hard for beginners. Programmers with some Scala experience will quickly relate to the collections-like API and be up and running in no time, especially compared to setting up Kafka or Spark. I think that before anyone approaches streaming libraries at all, they are probably already knee-deep in hard-to-solve concurrency, dependency and performance problems, and may have sunk weeks into cracking each problem the hard way. Once you find Akka Streams, you can finally concentrate on your logic, get all the boilerplate out of the way and write self-descriptive, concise code that rocks some incredibly complex stuff, nicely modularized in readable chunks that fit on a single screen. Discovering it felt like finally coming home. I think the hardest parts are wrapping your head around the concept of materialized values, designing stateful stream stages correctly, and knowing when you need the Graph API at all. My next task is getting my hands dirty with Kafka.
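
The distinction drawn in this comment, synchronous stages fused together versus an explicit asynchronous boundary, can be sketched in plain Python. This is only an analogy under stated assumptions, not Akka code: `fused`, `async_boundary`, `double` and `inc` are hypothetical names made up for the illustration, and a thread with a bounded queue stands in for what Akka Streams expresses with `.async` or `mapAsync`.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def fused(source, *stages):
    """Run all stages synchronously: each element passes through
    every stage in the consumer's own thread, with no hand-off."""
    for item in source:
        for stage in stages:
            item = stage(item)
        yield item


def async_boundary(upstream, buffer_size=16):
    """Hypothetical async boundary: a separate thread drains the
    upstream and hands elements over through a bounded queue."""
    buf = queue.Queue(maxsize=buffer_size)

    def pump():
        for item in upstream:
            buf.put(item)  # blocks while the buffer is full
        buf.put(_DONE)

    threading.Thread(target=pump, daemon=True).start()
    while True:
        item = buf.get()
        if item is _DONE:
            return
        yield item


def double(x):
    return x * 2


def inc(x):
    return x + 1


# Both stages fused: everything runs in one thread.
sync_result = list(fused(range(5), double, inc))

# Same pipeline, with a boundary between `double` and `inc`:
# `double` runs in the pump thread, `inc` in the consumer thread.
async_result = list(fused(async_boundary(fused(range(5), double)), inc))

print(sync_result)   # [1, 3, 5, 7, 9]
print(async_result)  # [1, 3, 5, 7, 9]
```

Both pipelines produce the same elements in the same order; only where the work runs changes. The bounded queue also gives a crude form of backpressure, since the upstream thread blocks once the buffer is full. The sketch does not propagate upstream errors across the boundary, which a real streaming library would handle.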

  • @alexandrutoma9187
    @alexandrutoma9187 3 years ago +1

    You're the best Scala instructor in the world :D
    It's great that you're also on Udemy and have courses on your site too.
    Keep it up, Daniel!

  • @Dr_Dude
    @Dr_Dude 3 years ago +1

    Nice, finally I know the difference and when to use what!!!! Well-done video, as always.

  • @iQwert789
    @iQwert789 4 years ago +4

    Good video; however, it would be nice if you could also include Flink (as you're comparing streaming frameworks). It's generally 20% faster than Kafka Streams and Spark Streaming. Kafka Streams is probably the future as Kafka's ecosystem evolves, but syntax-wise Spark/Flink are much more intuitive in Scala.

  • @namanbhayani1016
    @namanbhayani1016 2 years ago +1

    Thank you very much Daniel :)

  • @ziauddin5981
    @ziauddin5981 4 years ago +1

    Nice explanation. Could you also include a part on Apache Flink? Apache Flink, I think, also uses Akka under the hood (?), and it also provides good control over streams through low-level APIs, along with other benefits like those shown for Akka.

  • @chandrashekharkotekar8453

    Thanks for this detailed video. Could you please make a similar video comparing Spark Streaming with Apache Flink and Apache Pulsar?

    • @rockthejvm
      @rockthejvm 4 years ago

      Noted!

    • @felipegutierrez7856
      @felipegutierrez7856 3 years ago +2

      Good request, I was going to say that. The video offers a very good explanation of the three stream libraries/frameworks. I would say that Flink offers lower latency than Spark, since Flink processes one tuple at a time while Spark uses micro-batching. Backpressure in Flink is applied per operator, whereas in Spark it is applied at the source. Akka Streams also applies it per operator, which is very good! I loved Akka Streams because I can change the strategy of one operator at runtime using Flows and Partition. To do the same in Flink or Spark, I would need to save the state and restart the stream query.
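
The latency point in this reply can be made concrete with a toy model. This is a sketch under simplifying assumptions, not Flink or Spark code: it ignores processing and scheduling time entirely and counts only how long an event waits before processing can start. `micro_batch_wait_ms` and the arrival times are made up for the illustration.

```python
def micro_batch_wait_ms(arrival_ms, interval_ms):
    """Time an event waits for the batch boundary at or after its arrival."""
    # Round the arrival time up to the next multiple of the interval.
    next_boundary = -(-arrival_ms // interval_ms) * interval_ms
    return next_boundary - arrival_ms


# Hypothetical arrival times in milliseconds.
arrivals_ms = [10, 250, 990, 1000, 1500]

# Tuple-at-a-time model: processing starts on arrival, so no added wait.
per_tuple_waits = [0 for _ in arrivals_ms]

# Micro-batch model with a 1-second interval.
micro_batch_waits = [micro_batch_wait_ms(t, 1000) for t in arrivals_ms]

print(per_tuple_waits)    # [0, 0, 0, 0, 0]
print(micro_batch_waits)  # [990, 750, 10, 0, 500]
print(sum(micro_batch_waits) / len(micro_batch_waits))  # 450.0
```

In this model the added wait is bounded by the batch interval, so shrinking the interval shrinks the worst case, while per-batch overhead (not modelled here) is paid more often.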

  • @cgmds1973
    @cgmds1973 4 years ago +2

    Awesome explanation, thank you!!

  • @stanislavg.7903
    @stanislavg.7903 3 years ago +1

    Cool. But now (from 2.3) Spark has .trigger(processingTime = "0 seconds") to minimize the latency. We may use a 0 second processing time trigger indicating that Spark should start each micro-batch as fast as it can with no delays.

    • @rockthejvm
      @rockthejvm 3 years ago

      Yep. Did that come into conflict with anything in the video?

  • @danishamjad5807
    @danishamjad5807 2 months ago

    I am guessing ZIO Streams is analogous to Akka Streams w.r.t. usage, right?

  • @LucaSavoja
    @LucaSavoja 4 years ago +7

    Awesome video as always. I'd love a course (on Udemy, not free!) on Kafka/Kafka Streams. The other ones on Udemy are not as good as yours.

  • @Pl4sm4feresi
    @Pl4sm4feresi 4 years ago +1

    I love your videos bro

  • @minshi1040
    @minshi1040 1 year ago

    Hi Daniel,
    Normally, how would you host a Scala application as a long-running process if you use Kafka Streams?
    I know that if I use Spark Streaming, the dedicated cluster will keep it running and listening/reacting to the stream/data. I don't have a large amount of data.
    Kind regards

    • @rockthejvm
      @rockthejvm 1 year ago

      There are various cloud services for Kafka to help you with the Kafka cluster.

  • @tai-hao-le
    @tai-hao-le 2 years ago

    Could you please clarify what you mean by fault tolerance in Akka Streams? I am used to working with big data frameworks (Kafka Streams, Spark Streaming and Flink), and they usually execute code on a flock of machines with exceptional horizontal scalability and fault tolerance. I lack information on the Akka Streams side: from your description (best for high-performance streams that are part of the business logic), I would assume that we embed an Akka Streams application into existing ones. That could give us superior vertical scalability (with concurrency backed by actors), but if that's just a single machine, then how on earth can we talk about fault tolerance? I must be missing something obvious :)

    • @rockthejvm
      @rockthejvm 2 years ago

      Maybe a subject for a future video

  • @lsitful
    @lsitful 3 years ago +2

    +1 for: why Flink is not here?

  • @dimfatal7259
    @dimfatal7259 3 years ago

    Hey Daniel, I'm an absolute beginner and I have a question about the fs2 library, which is also used for some kind of streaming. My question is: could it be an alternative to some of the streaming libraries that you mentioned in this video?

    • @rockthejvm
      @rockthejvm 3 years ago +2

      FS2 is a streaming library that's better used for application logic rather than data processing. Also it's quite hard for beginners.

  • @Pl4sm4feresi
    @Pl4sm4feresi 4 years ago

    Is there any discount associated with your yearly full-access membership? Here in Brazil things are complicated. The dollar is worth almost 6 times our currency.

    • @ziauddin5981
      @ziauddin5981 4 years ago

      Hi Victor, try the RockTheJVM courses on Udemy.

    • @rockthejvm
      @rockthejvm 4 years ago +2

      In the process of creating some location-adjusted optional discounts because I know things are unequal across the world

  • @_slier
    @_slier 2 years ago

    but i hate jvm related technology.. so, do i have any other choices? or just suck it up?

  • @menphalla
    @menphalla 4 years ago

    Hello. :-)