ETL Is Dead, Long Live Streams: real-time streams w/ Apache Kafka

  • Added on 22 Feb 2017
  • InfoQ Dev Summit Boston, a two-day conference of actionable advice from senior software developers hosted by InfoQ, will take place on June 24-25, 2024, in Boston, Massachusetts.
    Deep-dive into 20+ talks from senior software developers over 2 days with parallel breakout sessions. Clarify your immediate dev priorities and get practical advice to make development decisions easier and less risky.
    Register now: bit.ly/47tNEWv
    ----------------------------------------------------------------------------------------------------------------
    Neha Narkhede talks about the experience at LinkedIn moving from batch-oriented ETL to real-time streams using Apache Kafka and how the design and implementation of Kafka were driven by this goal of acting as a real-time platform for event data. She covers some of the challenges of scaling Kafka to hundreds of billions of events per day at LinkedIn, supporting thousands of engineers, etc.
    Download the slides & audio at InfoQ: bit.ly/2ldN6P0
    This presentation was recorded at QCon San Francisco 2016. The next QCon is in London, March 5-7, 2018. Check out the tracks and speakers: bit.ly/2hxsoN1
    For more awesome presentations on innovator and early adopter topics check out InfoQ’s selection of talks from conferences worldwide: bit.ly/2lRQCll
  • Science & Technology

Comments • 105

  • @cenai61983
    @cenai61983 5 years ago +97

    Very good introduction to streaming ETL architecture and Kafka. Misleading title. Streaming ETL is just another way of implementing ETL. Traditional batch-oriented ETL doesn't have to be totally replaced by Streaming ETL.

  • @IntrepidClown
    @IntrepidClown 5 years ago +67

    Introduction to Kafka really starts at 17:36.

    • @tonybernoulli7859
      @tonybernoulli7859 5 years ago +7

      Comments like this are helping this world become a better place

    • @mwandulu
      @mwandulu 5 years ago +1

      At a speed of 1.25 too

    • @Techie-in-Cloud
      @Techie-in-Cloud 4 years ago +1

      @mwandulu You can go to 1.5 too with very little difference. :)

    • @bernardlowe5433
      @bernardlowe5433 4 years ago +6

      For me the whole talk was pretty good. See no reason to skip.

    • @oluwoleoyekanmi6052
      @oluwoleoyekanmi6052 3 years ago +2

      No reason to skip. The preamble puts things into context.

  • @niranchanadevirajmohan3232

    This was a well-thought-out presentation: a brief introduction to existing systems and their limitations, then a transition to the need for Kafka, the way it is designed, and how Kafka addresses those limitations. Good one.

  • @smyk1975
    @smyk1975 7 years ago +11

    Great architecture overview of Kafka Streams. Convinced me to look deeper into the Streams API and capabilities.

  • @gcbzzzz
    @gcbzzzz 5 years ago +17

    "event and batch have tradeoffs. now ignore the trade offs and try to use streams for everything" :/

  • @MrNau007
    @MrNau007 6 years ago +4

    2 observations:
    1) History of ETL: the talk missed the entire evolution of the data warehouse from MIS systems.
    2) Example of the old and new “T”: you applied “remove PII fields” at the streaming platform. Who will identify which common transformations have to be applied at the streaming platform?
    One benefit is one higher level of abstraction.

  • @sharathchandra5314
    @sharathchandra5314 5 years ago +2

    Nice presentation... I would like to know what vendors of ETL tools like Informatica, DataStage, etc. have to say about their products in light of this briefing, because these two are quite busy coming up with new versions.

  • @filipedelbel
    @filipedelbel 6 years ago +1

    Very clarifying explanation about Kafka, helped me a lot to understand the concept.

  • @jocalvo
    @jocalvo 5 years ago +27

    ETLs are not dead, they just transformed. The KEY is not Apache Kafka, the key is DATA ARCHITECTURE, otherwise it will add more mess.

  • @msambare
    @msambare 6 years ago +3

    Great presentation and talk. Now I want to explore streaming platforms in detail.

  • @babylon_bob
    @babylon_bob 5 years ago +12

    I think instead of saying ETL is dead, just say I have no clue. I've never in my life recreated two streams to process the same data into different destinations (12:59). I'd do exactly the same as at (13:29) but with ETL tools.

    • @prashanthtalla
      @prashanthtalla 5 years ago +2

      I agree. If step 2 is the same for both destinations, why would you repeat it for Cassandra? You'll just add that destination to the load logic of the existing ETLs.
      I wish the speaker had given a better example of where we may end up doing this and how streaming could have helped.
      I believe streaming is advantageous from a cost perspective (ETL tools are super expensive), and for very large real-time volumes ETL tools cannot scale. I'm also not sure if streaming really solves this problem - I've yet to work on streaming technologies.

    • @sanchitkumar9862
      @sanchitkumar9862 5 years ago +2

      Absolutely True, No one is dumb enough to run the computation twice when we have the option of adding the data to multiple destinations.

  • @IliaTernovich
    @IliaTernovich 6 years ago

    31:56 link to video please. Unfortunately can't hear names clearly

  • @im2crazyin
    @im2crazyin 1 year ago +3

    Very informative, precise and to-the-point introductory talk on data streams. It gives enough information that one knows why and when to look for streaming solutions, and which specific areas to dig into once they decide to go for such a solution.

  • @renatoalencar4451
    @renatoalencar4451 5 years ago +15

    So many angry comments. It's just an attractive title, not an actual PhD thesis.

  • @ericpham6192
    @ericpham6192 5 years ago

    Can parallel processing in bandwidth fill multiple packages help in big data and distributed database and buffering work in hand in hand help in streaming. Also 3d volume fill data storage and extracting data format

  • @Ravi86055
    @Ravi86055 6 years ago +3

    Great content... useful information

  • @flynntsang
    @flynntsang 6 years ago +5

    This is an intelligent and articulate overview of how Kafka in particular manages increasing volume, velocity and variety of "big data" using real-time streams. It may not resonate with everyone; not everyone needs this. Excellent for those getting started with streaming data and transitioning away from messaging queues or redundant ETL processes.

  • @chakrapanireddy1358
    @chakrapanireddy1358 7 years ago +2

    Really helpful.. Nice explanation..

  • @arunasjunevicius533
    @arunasjunevicius533 6 years ago +9

    Really? Data integration and application integration are not the same. ETL and EAI solve two totally unrelated problems. And how can one say that MQ does not scale when, if one wants to scale, one can choose DDS or some other messaging technology?

  • @abobakrnasr9814
    @abobakrnasr9814 3 years ago

    Wonderful talk... thank you so much Neha for the presentation.

  • @chandraprakashmatam672

    Though it is not a paradigm shift, the approach given here eventually leads to a modern EDW with real-time streams.

  • @susmitdey9172
    @susmitdey9172 4 years ago +2

    ETL and EAI probably address different problems compared to streaming. Practically, in my view, streaming is more about using the capabilities of the platform to integrate rather than using a tool to do ETL or real-time; it addresses the data transfer logic so we can avoid tools. Correct me if I'm wrong.

  • @audreymciver4863
    @audreymciver4863 5 years ago +1

    all Principles should be implemented in any streaming data to be in compliance at all time.

  • @MrLyonliang
    @MrLyonliang 5 years ago +3

    Thanks a lot for explaining clearly about: what happened yesterday, what's the pain point, what's the new requirement, and HOW.

  • @navalsaini
    @navalsaini 5 years ago +2

    A very well structured talk. Thanks for it. :-)

  • @dx4816
    @dx4816 5 years ago +2

    The "messy" diagram can simply be redrawn to match the Kafka-based diagram. Lots of good information, but the real differentiator is not the integration patterns. Anyway, Kafka is a great product.

  • @ericpham6192
    @ericpham6192 5 years ago

    Share distribute processing by using percentage of iddling resource in cloud sharing processing network

  • @sreeRocksRocks
    @sreeRocksRocks 6 years ago

    Great video and gave overall idea on what Kafka is and how to play with it in real use cases. Excellent and kudos!

  • @manojsembekar5703
    @manojsembekar5703 6 years ago +1

    great thanks for information..

  • @xfactor740501
    @xfactor740501 5 years ago

    Great presentation. She makes it look simple... Does anyone know the program used to create the presentation? I like the look, as though it was drawn "free-hand"... very sharp

  • @anant3104
    @anant3104 5 years ago +1

    Great and it is very helpful, thank you

  • @michalmefli
    @michalmefli 7 years ago +2

    Great talk.

  • @RicardoMontee
    @RicardoMontee 4 years ago +1

    7:30 "ETL (Extract Transform Load) and EAI (Enterprise Application Integration) are outdated"

  • @kevinshoang
    @kevinshoang 3 years ago +1

    April 2021, Batch is still more popular than stream.

  • @anuragakella
    @anuragakella 5 years ago

    Awesome presentation skills and clear explanation about ETL changing from batch to real-time

  • @veerun3104
    @veerun3104 6 years ago +14

    ETL is not only meant for data integration.. what about business intelligence and analytics apps..

    • @8Trails50
      @8Trails50 5 years ago +1

      I think they are saying ETL in the form of ingestion of data INTO some tool. Not Spark or Hadoop jobs. In that case you could just subscribe to Kafka.

  • @piggybox
    @piggybox 5 months ago

    6 years later, ETL is still alive

  • @gauravaithmia
    @gauravaithmia 3 years ago +2

    First 15 minutes are more like a pitch deck dumbed down for a VC.

  • @tansudasli
    @tansudasli 5 years ago +1

    data integration and service integration layer are handled by different products on the market. that's the main problem. and it is good to see them in a convergent approach. that's why Kafka is on the spot.
    this convergence brings organizational effectiveness to the enterprise. because you can now combine BI's ETL team and Middleware team, you can get holistic integration capabilities, which will also create an advantage for transformation.
    on the other hand, scalability is a relative concept. in an enterprise, EAI or ESB is scalable. ETL is batch oriented but it is feasible for an enterprise's near realtime concerns.

  • @davidk7212
    @davidk7212 1 year ago

    Not all data is big data, and all data will never be all big data. There will always be a huge place for standard ETL.

  • @jersute
    @jersute 7 years ago +14

    the T in ETL has nothing to do with scrubbing ('data cleaning') or normalization. if you're using ETL to scrub you're already too late in the pipeline and using a hammer as a screwdriver when you want a paintbrush. it's gibberish. ETL is for data snapshots to move between environments where you want only a subset of the data but it is transactionally stable. ETL is how you leave the house. Kafka is the road you drive on to deliver the payload from said house. different topics.
    Kafka should be viewed as a simd replacement for amqp/zmq or as she has presented it a comparison vs elk for log processing as a limited use case. the streams discussion should be compared with apache storm for analytical capability or a distributed replacement for memcached performance counters. local state is a poor way of saying cache locality and migration.
    this talk is all over the place. no mention of the problem of dealing with subaggregation and priority dependency issues inherent in kafka/storm without explicit payload tagging or reentrant use of the architecture in general as befits any simd speedup discussion.
    if you are familiar with the concepts of noshared architectures for data presentation and want a messaging solution with the same principles then kafka may interest you. do not expect magic.

  • @zisispontikas2038
    @zisispontikas2038 5 years ago +2

    36:50 come on. You just took the stream processing java app and the dashboard app and put them inside in one application. So the database is inside kafka and the job processing and dashboard are merged. There should have been 2 boxes not 1

  • @MM-zd8sx
    @MM-zd8sx 4 years ago

    Very helpful video! Quality content and great presentation. Stellar job

  • @md.mottakinchowdhury7898
    @md.mottakinchowdhury7898 2 years ago +1

    Misguiding. Why would batch processing be dead if it is just enough to do batch processing of your data?

  • @wennwenn1422
    @wennwenn1422 5 years ago +6

    this butthurted all ETL folks..

  • @allanhouston22
    @allanhouston22 5 years ago +7

    Kafka is not an ETL replacement, it is a streaming/message broker. ETL is a platform that offers adapters for receiving and writing data from/to multiple source/destination types (files, DBs, queue systems); it's a centralized mapper tool (say XML to CSV), and supports various integration patterns (best practices). So ETL can typically be used to read/write from/to Kafka while it is performing mapping so that the destination system understands what the source system is trying to send, in real-time. EAI systems, another platform type she mentioned, are particularly written for event/real-time purposes, so they are a more suitable platform for that type of work as they support transactional behavior and unified monitoring of what is flowing through them, in addition to adapters and centralized mapping.
    How this woman managed to compare oranges and apples without receiving more down votes is beyond me.

  • @void0818
    @void0818 6 years ago +3

    E --(k)-- T --(k)-- L: this is where Kafka fits in ETL. ETL will never die, but Kafka is a good stream to use in ETL processes.

  • @blobbyflobby6752
    @blobbyflobby6752 3 years ago +2

    ETL is dead. Long live ETL!

  • @allmhuran
    @allmhuran 5 years ago +9

    ETL is outdated? That's news to any company that has no need to process terabytes of data in real time.
    This is the problem with keynotes from super giant companies. They only speak from the perspective of a super giant company. The overwhelming majority of enterprises do not have scale problems in this category, but people from such companies walk out of the keynotes thinking "yeah, this is what we should do!". No, you probably shouldn't.

  • @audreymciver4863
    @audreymciver4863 5 years ago

    im only using this to identify any hackers uploading anything of any kind. number one it was without my permission. hacking is a federal offense.and it violates my privacy rights.

  • @pajeetsingh
    @pajeetsingh 1 year ago

    By Mark 5:00 you'd figure out all the shenanigans regarding streams, data integration and why these corporate tech lords created Kafka. Good presentation.

  • @pajeetsingh
    @pajeetsingh 1 year ago

    Just use dma.

  • @vikramachandranselvakumar6316

    The speaker has no inkling of what ETL is or what a data warehouse is and how they are architected, designed, developed, provisioned and sustained. Apache Kafka is a great open source tool for integrating streaming data into your data lake and is not a paradigm that will replace the technology-agnostic paradigm named ETL. I have used Spark SQL to realize an ETL-based solution. Again, Spark SQL is a tool and not a paradigm.

  • @robinsoncarter3432
    @robinsoncarter3432 6 years ago

    hello you use chroma key in this video?

  • @podunkman2709
    @podunkman2709 4 years ago +7

    You confuse two loosely connected areas. Kafka is NOT the successor to ETL. ETL is a completely different group of products with a completely different application. Kafka may be the next generation of ESB. In addition, you must know that in the vast majority of companies around the world their "event-driven architecture" is MS Excel. Why do companies like Google or Facebook have their power? Because they are really unique. Meanwhile most companies do things like 20 years ago. For them ETL is a miracle. They do not need any Kafka. It's beyond their perception.

  • @chandanjha3205
    @chandanjha3205 4 years ago +1

    It was a nice presentation but majority of data generated by user actions are still stored in databases(SQL,Oracle) and thus ETL tools like SSIS are still needed to read them and send processed data to destinations. Some data could be in flatfiles but not too often seen these days unless we are gathering from multiple public sources. Whenever I try to read into the minds of speakers in youtube presentations to see why they are using Kafka or Spark, all they give is an example of 'word count' which is sad. Take an example of Spark, sure it can do distributional computing but so can a lot of other tools too if you have an array of cheap servers.

  • @sanchitkumar9862
    @sanchitkumar9862 5 years ago +3

    It's a very harsh statement to say ETL is dead. No, ETL is not dead.

  • @saurabh3614
    @saurabh3614 7 years ago +2

    This is not at all comparable; both are meant for different purposes. I doubt she has ever looked at DWH code and design. And I bet you can't show me one single implementation which includes a complete fact table design to solve a customer business problem.

    • @Ranjan316
      @Ranjan316 6 years ago +1

      Saurabh u are right, if u look at her work history she worked for just 1 company ( linkedin) and took kafka out as a new company, she is trying to just make money out of that......she has no idea why facts and dimensions are needed, you add any stream someone needs to transform them into data which data analysts or data scientists can use,

  • @nareshgb1
    @nareshgb1 6 years ago

    elsewhere:
    czcams.com/video/4CkRewmRnRc/video.html

  • @chrisl.9750
    @chrisl.9750 3 years ago +1

    ETL is not dead and if you want to be taken seriously in the world of data, I recommend you drop this suggestion...

  • @dataguygamer
    @dataguygamer 6 years ago +2

    Trolling title... I'm not sure if the speaker would approve of this title. It opens her idea for ridicule

  • @MelvinStudios
    @MelvinStudios 4 years ago +2

    Do you even know what "dead" means? ETL is used in many companies. Therefore ETL is not dead. Floppy disk is dead.

  • @attilaviniczai7215
    @attilaviniczai7215 3 years ago

    I love how americans can make acronyms out of the most important words in a title and just assume everyone knows what they abbreviate. It always amazes me how they try to get thoughts across an audience with a bunch of these 3 letter, context specific, magic words flying around.

  • @Yi5Zhou
    @Yi5Zhou 3 years ago

    you don't have to use this kind of name to attract viewers

  • @VoxNerdula
    @VoxNerdula 5 years ago +2

    I vant to try her curry

  • @debashishroy3485
    @debashishroy3485 6 years ago +1

    I think you are HR rather Technical ...from you scrap it is clear that you don't know both hadoop and ETL

    • @NothingMatress
      @NothingMatress 6 years ago +3

      Think again.

    • @Ranjan316
      @Ranjan316 6 years ago +1

      She is clueless i am shocked she is even allowed to talk at a summit

  • @IA-xh5ly
    @IA-xh5ly 6 years ago +3

    From what I’ve heard from this lady I’m making assumption she has a very little experience in ETL development (manual validation for example), she just follows the modern fashion.

    • @Ranjan316
      @Ranjan316 6 years ago

      Igor Andriychuk yup, lets see how much longer silicon valley supports such scam artists in the name of VC funding....

  • @KC-zn4gt
    @KC-zn4gt 6 years ago +3

    It's a shame that someone who knows just one tiny topic thinks she knows how it works and applies to everything. On the final diagram there is an icon of a DWH; I wonder how she explains how that DWH is getting populated without ETL. Oh... she probably thinks it is readymade and available for her to stream from. lol.

    • @onlyitj
      @onlyitj 6 years ago

      You would subscribe to multiple topics and process those messages using the Streams API, which can potentially do the job.

  • @debashishroy3485
    @debashishroy3485 6 years ago +1

    bullshit ...I don't know which platform give these people to open their mouth even they don't have clear knowledge this shows the quality of Indian IT managers and Leaders

    • @Ranjan316
      @Ranjan316 6 years ago +1

      Completely agree, Kafka is nice technology but this person doesn’t seem to have any idea about enterprise architecture or the problems ETL tries to solve.....

  • @20cmusic
    @20cmusic 6 years ago +3

    2018. ETL is still alive. I really hate this kind of marketer style shitty title.

  • @nguyen4so9
    @nguyen4so9 7 years ago +13

    Crap talks. ETL is a concept that is always there.

    • @TheEnfernuz
      @TheEnfernuz 6 years ago +8

      She doesn't deny it in the talk actually. She says that the batching ETL is dead / outdated, and now the streaming ETL is a way to go. Though I agree that part of the title is a bit misleading.

    • @temaz3334
      @temaz3334 6 years ago +12

      Shitty comment. U dont even understand what she is talking about.

    • @flipper71100
      @flipper71100 6 years ago +5

      People always have a tendency to resist change, as a result, they don't listen carefully

    • @jcrshankar
      @jcrshankar 6 years ago +1

      she mentioned etl tools not concept

    • @Ranjan316
      @Ranjan316 6 years ago +3

      Shankar K the title says ETL is dead, she is dumb as a rock....

  • @darshansangodkar6173
    @darshansangodkar6173 6 years ago

    I wish this presentation was given by some techie guy.

    • @tinameh
      @tinameh 4 years ago +1

      Darshan Sangodkar really? I’d like to actually hear your tech talk some day. Do pick a deeply technical topic please. And an original one while you’re at it. If you struggle with that though, drop me a word. Happy to share some tips.

    • @atulavhad1661
      @atulavhad1661 4 years ago

      @tinameh I guess many are not aware that she was among the people who built Kafka. I have seen her other talks and found them enlightening; she also built a unicorn startup.

  • @b4bhanu
    @b4bhanu 5 years ago

    click bait title... kafka is great but this talk is a disaster

  • @ShivaKumar-ps1vh
    @ShivaKumar-ps1vh 3 years ago

    Not worth......

  • @rajeshn5829
    @rajeshn5829 6 years ago

    U r relly pretty

  • @jianhuang7993
    @jianhuang7993 6 years ago +9

    This talk is a disaster

  • @danpal6737
    @danpal6737 5 years ago +1

    Rubbish material, holy cow letter from india

  • @MalleusDei275
    @MalleusDei275 2 years ago

    Lol,
    A silica nigre....
    😉

    • @MalleusDei275
      @MalleusDei275 2 years ago

      Your mum should have advice to you for dont play with the Hammer...
      Yes,
      Tyrannosaurus burgers were greeeeeeat.

  • @msftora3
    @msftora3 5 months ago +1

    BullSsssssssst