Kai Wähner
  • 54
  • 411 690
The Shift Left Architecture
Data integration is a hard challenge in every enterprise. Batch processing and Reverse ETL are common practices in a data warehouse, data lake or lakehouse. Data inconsistency, high compute cost, and stale information are the consequences.
This video introduces a new design pattern to solve these problems: The Shift Left Architecture enables a data mesh with real-time data products to unify transactional and analytical workloads with Apache Kafka, Flink and Iceberg.
Consistent information is handled with stream processing or ingested into Snowflake, Databricks, Google BigQuery, or any other analytics or AI platform to increase flexibility, reduce cost, and enable a data-driven company culture with faster time-to-market when building innovative software applications.
Table of Contents:
00:26 - Data Products Business Value
02:14 - Batch ETL and ELT in the Lakehouse
04:36 - Shift Left Architecture
07:24 - Shift Left with Kafka, Flink and Iceberg
More details about the Shift Left Architecture:
www.kai-waehner.de/blog/2024/06/15/the-shift-left-architecture-from-batch-and-lakehouse-to-real-time-data-products-with-data-streaming/
Views: 331

Video

When NOT to use Apache Kafka?
1.8K views, 4 months ago
Apache Kafka is the de facto standard for event streaming to process data in motion. With its significant adoption growth across all industries, I get a very valid question every week: When NOT to use Apache Kafka? What limitations does Kafka have? (no matter if you use the open source framework or a cloud-native data streaming platform or a serverless SaaS cloud service) When does Kafka simply...
GenAI Demo with Apache Kafka, Flink, LangChain, OpenAI
3.4K views, 6 months ago
Generative AI (GenAI) enables automation and innovation across industries. This live demo explores a simple but powerful architecture and demo for the combination of LangChain with OpenAI LLM, Apache Kafka for event streaming and data integration, and Apache Flink for stream processing. The use case demonstrates how data streaming and GenAI help correlating data from Salesforce CRM, searching f...
Apache Kafka vs. JMS Message Broker (IBM MQ, TIBCO, Solace)
6K views, a year ago
Comparing JMS-based message queue infrastructures and Apache Kafka-based data streaming is a widespread topic. Unfortunately, the battle is an apple-to-orange comparison that often includes misinformation and FUD from vendors. This video explores the differences, trade-offs, and architectures of JMS message brokers and Kafka deployments. Learn how to choose between JMS message brokers like IBM ...
A Hybrid Cloud-native Lakehouse Project for Predictive Maintenance
672 views, 2 years ago
Building Cloud-native Data Warehouses and Data Lakes with Data Streaming - Part 3: A Hybrid Cloud-native Lakehouse Project for Predictive Maintenance. The concepts and architectures of a data warehouse, a data lake, and data streaming are complementary to solving business problems. Storing data at rest for reporting and analytics requires different capabilities and SLAs than continuously proces...
What is a Lakehouse? Data Streaming and Batch Analytics.
1.5K views, 2 years ago
Building Cloud-native Data Warehouses and Data Lakes with Data Streaming - Part 2: Data Lakehouse for Streaming and Analytics. The concepts and architectures of a data warehouse, a data lake, and data streaming are complementary to solving business problems. Storing data at rest for reporting and analytics requires different capabilities and SLAs than continuously processing data in motion for ...
Data Analytics at Rest vs. Data Streaming in Motion
1.5K views, 2 years ago
Building Cloud-native Data Warehouses and Data Lakes with Data Streaming - Part 1: Data Analytics at Rest vs. Data Streaming in Motion. The concepts and architectures of a data warehouse, a data lake, and data streaming are complementary to solving business problems. Storing data at rest for reporting and analytics requires different capabilities and SLAs than continuously processing data in mo...
Apache Kafka vs. iPaaS ETL Middleware
1.6K views, 2 years ago
Enterprise integration is more challenging than ever before. The IT evolution requires the integration of more and more technologies. Applications are deployed across the edge, hybrid, and multi-cloud architectures. Traditional middleware such as MQ, ETL, ESB does not scale well enough or only processes data in batch instead of real-time. This video explores why Apache Kafka is the new black fo...
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
2.1K views, 2 years ago
The manufacturing industry is moving away from just selling machinery, devices, and other hardware. Software and services increase revenue and margins. Equipment-as-a-Service (EaaS) even outsources the maintenance to the vendor. This paradigm shift is only possible with reliable and scalable real-time data processing leveraging an event streaming platform such as Apache Kafka. This talk explore...
Kappa vs Lambda Architectures and Technology Comparison
9K views, 2 years ago
Real-time data beats slow data. That’s true for almost every use case. Nevertheless, enterprise architects build new infrastructures with the Lambda architecture that includes separate batch and real-time layers. This video explores why a single real-time pipeline, called Kappa architecture, is the better fit for many enterprise architectures. Real-world examples from companies such as Disney, ...
Cloud-Native 5G, MEC and OSS/BSS/OTT Telco with Apache Kafka and Kubernetes
6K views, 2 years ago
Apache Kafka and Kubernetes in the Telco Industry - Cloud-Native 5G, MEC and OSS/BSS/OTT Real-World Use Cases. This session explores architectures and use cases for event streaming with the open-source framework Apache Kafka in the Telco industry. Telcos modernize their edge and hybrid cloud infrastructure with Kafka and Kubernetes to provide an elastic scalable real-time infrastructure for hig...
Panel Discussion about Kafka, Edge, Networks and 5G in Oil & Gas and Mining Industry
506 views, 2 years ago
The oil & gas and mining industries require edge computing for low latency and zero trust use cases. Most IT architectures are hybrid with big data analytics in the cloud and safety-critical data processing in disconnected and often air-gapped environments. This panel discussion explores the challenges, use cases, and hardware/software/network technologies to reduce cost and innovate. A key foc...
Apache Kafka in the Insurance Industry
972 views, 2 years ago
The rise of data in motion in the insurance industry is visible across all lines of business including life, healthcare, travel, vehicle, and others. Apache Kafka changes how enterprises rethink data. This talk explores use cases and architectures for insurance-related event streaming. Real-world examples from Generali, Centene, Humana, and Tesla show innovative data integration and stream...
Apache Kafka in the Retail Industry
1.6K views, 2 years ago
Use cases, architectures, and real-world deployments of Apache Kafka in edge, hybrid, and global retail deployments at companies such as Walmart and Target. The retail industry is completely changing these days. Consequently, traditional players have to disrupt their business to stay competitive. New business models, great customer experience, and automated real-time supply chain processes are ...
Apache Kafka in the Automotive Industry
2.1K views, 3 years ago
Data in Motion powered by Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4.0, Mobility Services, Smart City). Slides for this talk: www.slideshare.net/KaiWaehner/apache-kafka-in-the-automotive-industry-connected-vehicles-manufacturing-40-mobility-services-smart-city Connect all the things: An intro to event streaming for the automotive industry including connected ca...
Apache Kafka for Oil & Gas, Smart Grid, and Energy Utilities
1.8K views, 3 years ago
Can Apache Kafka Replace a Database?
8K views, 3 years ago
Hybrid Cloud Replication with Apache Kafka between Edge and Cloud - Live Demo
1.4K views, 3 years ago
App Modernization and Hybrid Cloud Architectures with Apache Kafka
1.3K views, 3 years ago
Augmented Reality Demo with Apache Kafka
681 views, 3 years ago
Supply Chain Optimization with Apache Kafka
1.6K views, 3 years ago
Apache Kafka in Manufacturing and Industry 4.0
2.9K views, 3 years ago
Machine Learning with Apache Kafka without another Data Lake
1.1K views, 3 years ago
End to End Integration from IoT Edge to Serverless Apache Kafka in Confluent Cloud
1.7K views, 3 years ago
IoT Architectures + Use Cases for Apache Kafka - Consumer and Industrial IoT (IIoT / Industry 4.0)
7K views, 4 years ago
Mainframe Integration, Offloading and Replacement with Apache Kafka
3.9K views, 4 years ago
Apache Kafka and Machine Learning in Pharma and Life Sciences
1.3K views, 4 years ago
Apache Kafka and Machine Learning in Banking and Finance Industry
3.5K views, 4 years ago
IoT Architectures for a Digital Twin with Apache Kafka and Comparison to other IoT Platforms
24K views, 4 years ago
Event Streaming and Apache Kafka in the Telecommunications Industry (Whiteboard)
2.2K views, 4 years ago

Comments

  • @yusufnar 13 days ago

    I liked that. Quite informative.

  • @juanpineda-montoya 24 days ago

    This is a great demo, Thanks Kai for showing how the entire workflow produces the results

  • @tomasselnekovic a month ago

    Interesting and inspiring example. Thank you.

  • @srh1034 2 months ago

    great

  • @TheIcecoldorange 2 months ago

    Can we stop saying things are easy to set up?

  • @joseindi744 3 months ago

    Completely biased comparison; not a single characteristic compares one broker purely with another. The real answer to this comparison is that it depends on the use case: if you don't mind some messages going unprocessed, go with streaming messaging, which performs awesomely. BUT if you need an enterprise-grade guarantee of delivery for all messages, the JMS architecture is unbeatable.

    •  3 months ago

      I don't disagree. If you just need enterprise messaging, then JMS is great (if the message broker can handle the scale you need and the licensing cost makes sense). Kafka and JMS usually serve different use cases, though you can also leverage Kafka for messaging (but not the other way round). A more detailed comparison based on 10 characteristics is here (though it takes a similar approach to the presentation): www.kai-waehner.de/blog/2022/05/12/comparison-jms-api-message-broker-mq-vs-apache-kafka/

  • @preethvik4779 5 months ago

    Is all the software used in this video to connect the MQTT broker to Kafka and then to the Postgres database, including the JARs and plugins, free to use for commercial purposes?

    •  5 months ago

      Frankly, I don't remember after 5 years. But there are definitely free open source components available for the entire pipeline! Search for open source Kafka Connect connectors.

  • @angelr6772 5 months ago

    🙃 Promo-SM

  • @pinheiroalves6411 6 months ago

    I was looking for an example like this. Thank you so much. Greetings from Brazil.

  • @Andres-jg8xi 6 months ago

    I am looking for 5G Core integration and configuration docs and files, such as those for integrating and configuring a new node in the 5G core (CSCF, MGC, IMF, UPF, SMF, ...). I would appreciate it if you could share these documents and integration files.

    •  6 months ago

      I don't think something like this exists. Super complex and every CSP does its own thing. And keeps the solutions confidential.

  • @desilveria6799 6 months ago

    Good content... thanks, bro. I am a new learner of Kafka.

  • @walidmatta1 7 months ago

    What do you recommend, Kafka or an ESB, in the context of Integrated Manufacturing Operations Management Systems (MOMS) in the oil and gas industry (refinery)?

    •  7 months ago

      It depends more on the use case and technical requirements, not so much on the industry. I have a dedicated article (and video) exploring the differences between ESB and Kafka: www.kai-waehner.de/blog/2019/03/07/apache-kafka-middleware-mq-etl-esb-comparison/

  • @Purutge 9 months ago

    top content!!

  • @akbarmunwar5435 9 months ago

    Great architecture good keep it up

  • @1over137 10 months ago

    On the less controversial side... I came here to learn and get ideas before moving my tiny little MQTT Docker microservice stacks from docker-compose into K8s. As Java is my day job, I use Python there :) However, the lure of Kafka in the same K8s cluster is appealing. Now you have got me started on AI! I use a bespoke, dynamic workflow model of "published interests" for heating control. Those published interests are based on fixed, static targets and boolean presence/scheduling logic. The idea is to train an AI model on the data to better predict when the heating WILL be required, rather than relying on rule-engine-based triggers that fire after the heating is required. I have a long way to go to learn how to model that, but I figure I would start by replaying all the Grafana data I have for a few winters and look for instances when the retroactive present system was "late", in that the required temperature was not met for X amount of time. It would then have classifications of when the system did well and when it did not. It should then be pivotable to answer, "should the heating be on now?". Based on its previous patterns of when the heating has been on but should have been on earlier, the AI portion may then be allowed to overrule the rules and turn the heating on early. The data from it doing so can then be fed back as well. Trouble is, all that sounds good, but I have no idea how to do it :)

  • @1over137 10 months ago

    @louisrossmann would love this. It clearly explains the answer to his question, "What the hell is my car doing recording my sexual activity?". Unfortunately, the real answer is that it's not the car; it's the cloud it sends every single event message to that records the periodic bouncing of the suspension, and the AI feeding off it identified it as sexual activity. If they don't put it in their privacy policy, they get done for recording that information. It still makes any car with this facility something I will never own. Hopefully I can make it through another 30 years of motoring without having a "dumb car" which requires a permanent internet connection to the cloud to function properly. No thanks. If my car works today, I would like it to work tomorrow. I do not want my car to shut down mid-motorway because a Kafka queue has to repartition and rebalance and the cluster crashes.

  •  11 months ago

    This recording is a lot more than a simple comparison of technologies... thanks for all the real-world project insights!

  • @mohammadvohra7315 a year ago

    I want to set up this project in Windows Subsystem for Linux. I am getting an error in the "confluent local start connect" command. Could you please help me?

  • @contactdi8426 a year ago

    Great explanation! Thanks

  • @JO-on6ky a year ago

    Thank you for your videos. Really helpful way to be introduced to those new (for me) approaches. I am a big fan of visual representations

  • @kss481 a year ago

    Is Solace good for this hybrid use case?

    •  a year ago

      As always, it depends (but the short answer is that messaging alone is usually not sufficient for a reliable and scalable data synchronization across hybrid and cloud environments). You might check the following video about "message broker vs. data streaming" to better understand the difference between a message broker like Solace and a data streaming platform like Confluent: czcams.com/video/VA3NR5s-AQQ/video.html

  • @shivanshudubey8832 a year ago

    It's one of the best apple-to-orange comparisons I have ever witnessed.

  • @nageshwarburman8819

    I thoroughly enjoyed your three-part series on data streaming and its role in building data warehouses. Thanks! Also, it would be great if you could add links in the description to the blogs/images that you used in the presentation.

  • @jptech__ a year ago

    I loved it. Thanks for explaining this in a very simple way.

  • @BryanChance a year ago

    Such a great point that other industries' systems need to work for decades, compared to the virtually fly-by-night operations of many of today's "apps". Software is just not made to work for the long run, with the exception of Java enterprise applications, in my opinion. :) It's like the saying "they don't make them like they used to". But maybe we've come full circle. Great video. Thank you.

  • @cemakpolat422 a year ago

    Thanks Kai, all in one, some slides include so much information!

  • @vyasshashikant a year ago

    J

  • @readthefuckingmanual

    awesome

  • @mainframeconcepts a year ago

    Kai, thank you for this presentation. You did a good job of presenting the mainframe as it is. I'm surprised there's one organization that ran their tests on a Raspberry Pi. Thank you again. Keep up the good work.

  • @sukumard a year ago

    Thanks for this, Kai. Very, very useful for explaining to a 175-year-old company why they need to change their ways of working to enter the new digital way of doing things.

  • @unagarjuna8846 2 years ago

    Hi, great work! How can I connect MQTT to a Cassandra DB without Kafka?

    •  2 years ago

      If you don't want to use Kafka, then you need another integration framework. For instance, Apache Camel has broad adoption as an open source integration framework. Here is a good post to understand the differences from Kafka: www.kai-waehner.de/blog/2022/01/28/when-to-use-apache-camel-vs-apache-kafka-for-etl-application-integration-event-streaming/

  • @JCArtuso 2 years ago

    It's amazing, Kai! Thanks for sharing. You read my mind and put it in this video. I'm considering using AWS MSK for this because it is a managed service and I can use a connector for Salesforce CDC and SAP ODP. I'll be sure to follow the next videos. Congratulations again!

    • @JCArtuso 2 years ago

      ksqlDB changed the game!

    •  2 years ago

      Make sure to evaluate the alternatives in detail. Many Kafka cloud vendors oversell a lot. For instance, Amazon MSK is not a truly fully managed service. It provisions brokers; you (have to) take over a lot of the operations. And connectors like Salesforce CDC are not available as part of MSK. Also, Amazon MSK excludes (!) Kafka support in their terms & conditions! Here is a comparison I wrote (but make sure to read different comparisons and make your own evaluation): www.kai-waehner.de/blog/2021/04/20/comparison-open-source-apache-kafka-vs-confluent-cloudera-red-hat-amazon-msk-cloud/

    • @JCArtuso 2 years ago

      @ Thanks! I'll see it!

  • @yogeshjain516 2 years ago

    Thanks for part one. I want to see parts two and three, but they are hidden!

    •  2 years ago

      The second and third ones will be uploaded in the next two days… 👍🏼

  • @casmeiron 2 years ago

    Hi Kai, is the MQTT Proxy approach possible using Kafka on Confluent Cloud?

    •  2 years ago

      Yes. However, it is not fully managed (yet). You need to deploy it yourself right now.

  • @harikrishnanvr1557 2 years ago

    The Kafka consumer does not print the subscribed data. My connector status is running and, as you said, I'm using only mqtt and not mqtt.temperature. Can you help?

    •  2 years ago

      The demo is pretty old. I think the issue with "mqtt.temperature" was fixed a year later or so. Maybe try that.

    • @harikrishnanvr1557 2 years ago

      @ Actually, it works with the latest version. One more question: if we self-host, should we also use the "confluent.license" key from Confluent, or is it not needed?

    •  2 years ago

      The license key is needed :-)

  • @SankarJankoti 2 years ago

    If we use Kafka Connect, then we don't need MQ?

    •  2 years ago

      It depends on what you connect to then. MQ is often the easiest integration point to mainframe applications. If you can access the file system, DB2, or Cobol apps (e.g. via REST/HTTP) directly, then you don't need MQ in the middle.

    • @SankarJankoti 2 years ago

      @ Thanks for responding. I have a requirement where I need to send DB2 data from the mainframe to DB2 on Linux for an OpenShift Java app on a regular basis, and vice versa. The table layouts are different, which means I need to do some transformation. Can you please advise?

    •  2 years ago

      In that case, the Kafka Connect database connector is perfect. For transformations, you can evaluate the pros and cons of Kafka Connect SMTs (single message transform embedded into the connector) vs separate streaming ETL with Kafka Streams or ksqlDB. No matter what you choose, the whole end-to-end workload stays within the Kafka ecosystem (scalable, reliable, guaranteed ordering, etc.).

    • @SankarJankoti 2 years ago

      @ thank you

  • @vishnu-mk 2 years ago

    Good discussion around Edge and Streaming.

  • @YigitMesci 2 years ago

    Simple and effective ! Thanks

  • @vishnu-mk 2 years ago

    Great insights..thanks for sharing

  • @kosmologic6007 2 years ago

    Great explanation, thanks! I like especially database out with Kafka as a writer pipe and storage in different dbs (depending on the purpose).

  • @yasirkaram 2 years ago

    What are the best Kafka configurations and setup for a 40mb/s load?

    •  2 years ago

      In general, Kafka configurations depend on the workloads, message size, latency requirements, network, etc. But for 40mb/s, the default setup with a three-broker cluster is sufficient and no fine-tuning is required.

  • @the_iurlix 2 years ago

    Stunningly clear!

  • @girishb4962 2 years ago

    I appreciate the information that you provide in your videos. I believe this and PLC4X have huge potential for making a connected industry achievable.

  • @elodiechaumet-doucet1045

    Super! Got a middleware past and need to understand the concept of Events integration and how it could be frenemy with existing middleware.

  • @MannyBernabe 2 years ago

    Solid overview of IoT and IIoT.

  • @MannyBernabe 2 years ago

    Great presentation. I like the context you provide in advance of the tech.

  • @amirovic2128 2 years ago

    Respect, bro

  • @vnagaravi 2 years ago

    Thanks for the videos. But can you organise all the videos into a playlist, so it's easy to start from the beginning rather than going through different videos without knowing where to start?

  • @JohnTube2K 2 years ago

    subscribed!!

  • @JohnTube2K 2 years ago

    Nice video. This is what I come across as an EA at my job, there is already a sunk cost in legacy technology and it’s a struggle to get business to update their technology unless there is a true business or technical need.