- 54
- 411 690
Kai Wähner
Joined 7 August 2006
The Shift Left Architecture
Data integration is a hard challenge in every enterprise. Batch processing and Reverse ETL are common practices in a data warehouse, data lake or lakehouse. Data inconsistency, high compute cost, and stale information are the consequences.
This video introduces a new design pattern to solve these problems: The Shift Left Architecture enables a data mesh with real-time data products to unify transactional and analytical workloads with Apache Kafka, Flink and Iceberg.
Consistent information is handled with stream processing or ingested into Snowflake, Databricks, Google BigQuery, or any other analytics or AI platform to increase flexibility, reduce cost and enable a data-driven company culture with faster time-to-market for building innovative software applications.
Table of Contents:
00:26 - Data Products Business Value
02:14 - Batch ETL and ELT in the Lakehouse
04:36 - Shift Left Architecture
07:24 - Shift Left with Kafka, Flink and Iceberg
More details about the Shift Left Architecture:
www.kai-waehner.de/blog/2024/06/15/the-shift-left-architecture-from-batch-and-lakehouse-to-real-time-data-products-with-data-streaming/
Views: 331
Video
When NOT to use Apache Kafka?
1.8K views · 4 months ago
Apache Kafka is the de facto standard for event streaming to process data in motion. With its significant adoption growth across all industries, I get a very valid question every week: When NOT to use Apache Kafka? What limitations does Kafka have? (no matter if you use the open source framework or a cloud-native data streaming platform or a serverless SaaS cloud service) When does Kafka simply...
GenAI Demo with Apache Kafka, Flink, LangChain, OpenAI
3.4K views · 6 months ago
Generative AI (GenAI) enables automation and innovation across industries. This live demo explores a simple but powerful architecture and demo for the combination of LangChain with OpenAI LLM, Apache Kafka for event streaming and data integration, and Apache Flink for stream processing. The use case demonstrates how data streaming and GenAI help correlate data from Salesforce CRM, searching f...
Apache Kafka vs. JMS Message Broker (IBM MQ, TIBCO, Solace)
6K views · 1 year ago
Comparing JMS-based message queue infrastructures and Apache Kafka-based data streaming is a widespread topic. Unfortunately, the battle is an apple-to-orange comparison that often includes misinformation and FUD from vendors. This video explores the differences, trade-offs, and architectures of JMS message brokers and Kafka deployments. Learn how to choose between JMS message brokers like IBM ...
A Hybrid Cloud-native Lakehouse Project for Predictive Maintenance
672 views · 2 years ago
Building Cloud-native Data Warehouses and Data Lakes with Data Streaming - Part 3: A Hybrid Cloud-native Lakehouse Project for Predictive Maintenance. The concepts and architectures of a data warehouse, a data lake, and data streaming are complementary to solving business problems. Storing data at rest for reporting and analytics requires different capabilities and SLAs than continuously proces...
What is a Lakehouse? Data Streaming and Batch Analytics.
1.5K views · 2 years ago
Building Cloud-native Data Warehouses and Data Lakes with Data Streaming - Part 2: Data Lakehouse for Streaming and Analytics. The concepts and architectures of a data warehouse, a data lake, and data streaming are complementary to solving business problems. Storing data at rest for reporting and analytics requires different capabilities and SLAs than continuously processing data in motion for ...
Data Analytics at Rest vs. Data Streaming in Motion
1.5K views · 2 years ago
Building Cloud-native Data Warehouses and Data Lakes with Data Streaming - Part 1: Data Analytics at Rest vs. Data Streaming in Motion. The concepts and architectures of a data warehouse, a data lake, and data streaming are complementary to solving business problems. Storing data at rest for reporting and analytics requires different capabilities and SLAs than continuously processing data in mo...
Apache Kafka vs. iPaaS ETL Middleware
1.6K views · 2 years ago
Enterprise integration is more challenging than ever before. The IT evolution requires the integration of more and more technologies. Applications are deployed across the edge, hybrid, and multi-cloud architectures. Traditional middleware such as MQ, ETL, ESB does not scale well enough or only processes data in batch instead of real-time. This video explores why Apache Kafka is the new black fo...
Apache Kafka for Predictive Maintenance in Industrial IoT / Industry 4.0
2.1K views · 2 years ago
The manufacturing industry is moving away from just selling machinery, devices, and other hardware. Software and services increase revenue and margins. Equipment-as-a-Service (EaaS) even outsources the maintenance to the vendor. This paradigm shift is only possible with reliable and scalable real-time data processing leveraging an event streaming platform such as Apache Kafka. This talk explore...
Kappa vs Lambda Architectures and Technology Comparison
9K views · 2 years ago
Real-time data beats slow data. That’s true for almost every use case. Nevertheless, enterprise architects build new infrastructures with the Lambda architecture that includes separate batch and real-time layers. This video explores why a single real-time pipeline, called Kappa architecture, is the better fit for many enterprise architectures. Real-world examples from companies such as Disney, ...
Cloud-Native 5G, MEC and OSS/BSS/OTT Telco with Apache Kafka and Kubernetes
6K views · 2 years ago
Apache Kafka and Kubernetes in the Telco Industry - Cloud-Native 5G, MEC and OSS/BSS/OTT Real-World Use Cases. This session explores architectures and use cases for event streaming with the open-source framework Apache Kafka in the Telco industry. Telcos modernize their edge and hybrid cloud infrastructure with Kafka and Kubernetes to provide an elastic scalable real-time infrastructure for hig...
Panel Discussion about Kafka, Edge, Networks and 5G in Oil & Gas and Mining Industry
506 views · 2 years ago
The oil & gas and mining industries require edge computing for low latency and zero trust use cases. Most IT architectures are hybrid with big data analytics in the cloud and safety-critical data processing in disconnected and often air-gapped environments. This panel discussion explores the challenges, use cases, and hardware/software/network technologies to reduce cost and innovate. A key foc...
Apache Kafka in the Insurance Industry
972 views · 2 years ago
The rise of data in motion in the insurance industry is visible across all lines of business including life, healthcare, travel, vehicle, and others. Apache Kafka changes how enterprises rethink data. This talk explores use cases and architectures for insurance-related event streaming. Real-world examples from Generali, Centene, Humana, and Tesla show innovative data integration and stream...
Apache Kafka in the Retail Industry
1.6K views · 2 years ago
Use cases, architectures, and real-world deployments of Apache Kafka in edge, hybrid, and global retail deployments at companies such as Walmart and Target. The retail industry is completely changing these days. Consequently, traditional players have to disrupt their business to stay competitive. New business models, great customer experience, and automated real-time supply chain processes are ...
Apache Kafka in the Automotive Industry
2.1K views · 3 years ago
Data in Motion powered by Apache Kafka in the Automotive Industry (Connected Vehicles, Manufacturing 4.0, Mobility Services, Smart City). Slides for this talk: www.slideshare.net/KaiWaehner/apache-kafka-in-the-automotive-industry-connected-vehicles-manufacturing-40-mobility-services-smart-city Connect all the things: An intro to event streaming for the automotive industry including connected ca...
Apache Kafka for Oil & Gas, Smart Grid, and Energy Utilities
1.8K views · 3 years ago
Hybrid Cloud Replication with Apache Kafka between Edge and Cloud - Live Demo
1.4K views · 3 years ago
App Modernization and Hybrid Cloud Architectures with Apache Kafka
1.3K views · 3 years ago
Augmented Reality Demo with Apache Kafka
681 views · 3 years ago
Supply Chain Optimization with Apache Kafka
1.6K views · 3 years ago
Apache Kafka in Manufacturing and Industry 4.0
2.9K views · 3 years ago
Machine Learning with Apache Kafka without another Data Lake
1.1K views · 3 years ago
End to End Integration from IoT Edge to Serverless Apache Kafka in Confluent Cloud
1.7K views · 3 years ago
IoT Architectures + Use Cases for Apache Kafka - Consumer and Industrial IoT (IIoT / Industry 4.0)
7K views · 4 years ago
Mainframe Integration, Offloading and Replacement with Apache Kafka
3.9K views · 4 years ago
Apache Kafka and Machine Learning in Pharma and Life Sciences
1.3K views · 4 years ago
Apache Kafka and Machine Learning in Banking and Finance Industry
3.5K views · 4 years ago
IoT Architectures for a Digital Twin with Apache Kafka and Comparison to other IoT Platforms
24K views · 4 years ago
Event Streaming and Apache Kafka in the Telecommunications Industry (Whiteboard)
2.2K views · 4 years ago
I liked that. Quite informative.
This is a great demo. Thanks, Kai, for showing how the entire workflow produces the results.
Interesting and inspiring example. Thank you.
great
Can we stop saying things are easy to set up?
A completely biased comparison; not a single characteristic compares one broker purely with another. The real answer to this comparison is that it depends on the use case: if you don't mind some messages not being processed, go with streaming messaging, which performs awesomely. BUT if you need an enterprise-grade guarantee of delivery for all messages, a JMS architecture is unbeatable.
I don't disagree. If you just need enterprise messaging, then JMS is great (if the message broker can handle the scale you need and the licensing cost makes sense). Kafka and JMS usually serve different use cases, though you can also leverage Kafka for messaging (but not the other way round). A more detailed comparison based on 10 characteristics is here (though it takes a similar approach to the presentation): www.kai-waehner.de/blog/2022/05/12/comparison-jms-api-message-broker-mq-vs-apache-kafka/
Is all the software used in this video (JARs and plugins) to connect the MQTT broker to Kafka and on to the PostgreSQL database free to use for commercial purposes?
Frankly, I don't remember after 5 years. But there are definitely free open source components available for the entire pipeline! Search for open source Kafka Connect connectors.
🙃 Promo-SM
I was looking for an example like this. Thank you so much. Greetings from Brazil.
I am looking for 5G Core integration and configuration docs and files, such as those for integrating and configuring a new node in the 5G core (CSCF, MGC, IMF, UPF, SMF, ...). I would appreciate it if you could share these documents and integration files.
I don't think something like this exists. It is super complex, and every CSP does its own thing and keeps its solutions confidential.
Good content... thanks, bro. I am a new learner of Kafka.
What do you recommend, Kafka or an ESB, in the context of Integrated Manufacturing Operations Management Systems (MOMS) in the oil and gas industry (Refinery)?
It depends more on the use case and technical requirements, not so much on the industry. I have a dedicated article (and video) exploring the differences between ESB and Kafka: www.kai-waehner.de/blog/2019/03/07/apache-kafka-middleware-mq-etl-esb-comparison/
top content!!
Great architecture good keep it up
On the less controversial side... I came here to learn and get ideas before moving my tiny little MQTT Docker micro-service stacks into K8S from docker-compose. As Java is my day job, I use Python there :) However, the lure of Kafka in the same K8S cluster is appealing. Now you have got me started on AI! I use a bespoke, dynamic workflow model of "published interests" for heating control. Those published interests are based on fixed, static targets and boolean presence/scheduling logic. The idea is to train an AI model on the data to begin to better predict when the heating WILL be required, rather than relying on "rule engine based" triggers that fire after the heating is required. I have a long way to go to learn how to model that, but I figure I would start by replaying all the Grafana data I have for a few winters and look for instances when the retro-active present system was "late" in that the required temperature was not met for X amount of time. It would then have classifications of when the system did well and when it did not do well. It should then be pivot-able to answer, "should the heating be on now?". Based on its previous patterns of when the heating has been on but should have been on earlier, it (the AI portion) may then be allowed to overrule the rules and turn the heating on early. The data from it doing so can then be fed back as well. Trouble is, all that sounds good, but I have no idea how to do it :)
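The labelling step described in the comment above (replay the history and flag periods where the required temperature was not met for X amount of time) could be sketched roughly as follows. This is only a minimal sketch in plain Python: the `Reading` fields, the `label_late_periods` function, and the sample values are hypothetical and are not taken from the commenter's system.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Reading:
    minute: int         # minutes since the start of the replayed history (assumed unit)
    temperature: float  # measured room temperature in degrees C
    target: float       # target temperature in degrees C at that minute


def label_late_periods(readings: List[Reading], max_lag_minutes: int) -> List[Tuple[int, int]]:
    """Return (start, end) minute spans where the temperature stayed below
    the target for longer than max_lag_minutes."""
    late: List[Tuple[int, int]] = []
    start = None  # first minute of the current below-target run
    last = None   # most recent minute of the current below-target run
    for r in readings:
        if r.temperature < r.target:
            if start is None:
                start = r.minute
            last = r.minute
        else:
            if start is not None and last - start > max_lag_minutes:
                late.append((start, last))
            start = None
    # Close a run that is still open at the end of the history.
    if start is not None and last - start > max_lag_minutes:
        late.append((start, last))
    return late


# Hypothetical replayed history: the target rises to 21 at minute 10.
history = [
    Reading(0, 20.0, 18.0),
    Reading(10, 17.0, 21.0),
    Reading(20, 18.5, 21.0),
    Reading(30, 20.0, 21.0),
    Reading(40, 21.2, 21.0),
    Reading(50, 20.8, 21.0),
]

print(label_late_periods(history, max_lag_minutes=15))  # -> [(10, 30)]
```

The resulting spans could serve as the "did not do well" labels for a later classification model; how to train that model is a separate question that this sketch does not address.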
@louisrossmann would love this. It explains clearly the answer to his question, "What the hell is my car doing recording my sexual activity?". Unfortunately the real answer is that it's not the car, it's the cloud it sends every single event message to that records the periodic bouncing of the suspension, and the AI feeding off it identified it as sexual activity. If they don't put it in their privacy policy they get done for recording that information. It still makes any car with this facility something I will never own. Hopefully I can make it for another 30 years of motoring without having a "dumb car" which requires a permanent internet connection to the cloud to function properly. No thanks. If my car works today, I would like it to work tomorrow. I do not want my car to shut down mid-motorway because a Kafka queue has to repartition and rebalance and the cluster crashes.
This recording is a lot more than a simple comparison of technologies... thanks for all the real-world project insights!
I want to set up this project in Windows Subsystem for Linux. I am getting an error from the "confluent local start connect" command. Could you please help me?
Great explanation! Thanks
Thank you for your videos. Really helpful way to be introduced to those new (for me) approaches. I am a big fan of visual representations
Is Solace good for this hybrid use case?
As always, it depends (but the short answer is that messaging alone is usually not sufficient for a reliable and scalable data synchronization across hybrid and cloud environments). You might check the following video about "message broker vs. data streaming" to better understand the difference between a message broker like Solace and a data streaming platform like Confluent: czcams.com/video/VA3NR5s-AQQ/video.html
It's one of the best apple-to-orange comparisons I have ever witnessed.
I thoroughly enjoyed your 3 part series on data streaming and its role in building data warehouses. Thanks! Also, it would be great if you could attach in the description the links to the blogs/images that you used in the presentation.
I loved it. Thanks for explaining this in a very simple way.
Such a great point that other industries' systems need to work for decades, compared to the virtually fly-by-night operations of many of today's "apps". Software is just not made to work for the long run. With the exception of Java enterprise applications, in my opinion. :) It's like the saying "they don't make them like they used to". But maybe we've come full circle. Great video. Thank you
Thanks Kai, all in one, some slides include so much information!
awesome
Kai, thank you for this presentation. You did a good job of presenting the mainframe as it is. I'm surprised there's one organization that ran their tests on a Raspberry Pi. Thank you again. Keep up the good work.
Thanks for this Kai, very very useful to explain to a 175 year old company why they need to change their ways of working to enter the new Digital way of doing things.
Hi, great work. How would I connect MQTT to a Cassandra DB without Kafka?
If you don't want to use Kafka, then you need another integration framework. For instance, Apache Camel has broad adoption as an open source integration framework. Here is a good post to understand the differences from Kafka: www.kai-waehner.de/blog/2022/01/28/when-to-use-apache-camel-vs-apache-kafka-for-etl-application-integration-event-streaming/
It's amazing, Kai! Thanks for sharing. You read my mind and put it in this video. I'm considering using AWS MSK for this because it is a managed service and I can use connectors for Salesforce CDC and SAP ODP. I'll definitely follow the next videos. Congratulations again!
ksqlDB changed the game!
Make sure to evaluate the alternatives in detail. Many Kafka cloud vendors oversell a lot. For instance, Amazon MSK is not a truly fully managed service. It provisions brokers; you (have to) take over a lot of the operations. And connectors like Salesforce CDC are not available as part of MSK. Also, Amazon MSK excludes (!) Kafka support in their terms & conditions! Here is a comparison I wrote (but make sure to read different comparisons and make your own evaluation): www.kai-waehner.de/blog/2021/04/20/comparison-open-source-apache-kafka-vs-confluent-cloudera-red-hat-amazon-msk-cloud/
@ Thanks! I'll see it!
Thanks for part one. I want to see parts two and three, but they are hidden!
The second and third ones will be uploaded in the next two days… 👍🏼
Hi Kai, is the MQTT Proxy approach possible using Kafka on Confluent Cloud?
Yes, though it is not fully managed (yet). You need to deploy it yourself right now.
The Kafka consumer does not print the subscribed data. My connector status is running and, as you said, I'm using only mqtt and not mqtt.temperature. Can you help?
The demo is pretty old. I think the issue with "mqtt.temperature" was fixed a year later or so. Maybe try that.
@ Actually, it works with the latest version. One more question: if we self-host, should we also use the "confluent.license" key from Confluent, or is it not needed?
The license key is needed :-)
If we use Kafka Connect, then we don't need MQ?
It depends on what you connect to then. MQ is often the easiest integration point to mainframe applications. If you can access the file system, DB2, or COBOL apps (e.g. via REST/HTTP) directly, then you don't need MQ in the middle.
@ Thanks for responding. I have a requirement where I need to send DB2 data from the mainframe to DB2 on Linux for an OpenShift Java app on a regular basis, and vice versa. The table layouts are different, which means I need to do some transformation. Can you please advise?
In that case, the Kafka Connect database connector is perfect. For transformations, you can evaluate the pros and cons of Kafka Connect SMTs (single message transform embedded into the connector) vs separate streaming ETL with Kafka Streams or ksqlDB. No matter what you choose, the whole end-to-end workload stays within the Kafka ecosystem (scalable, reliable, guaranteed ordering, etc.).
@ thank you
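To make the transformation question in this thread more concrete: a stateless, per-record mapping between two table layouts is the kind of logic a single message transform could cover, while joins or aggregations would point towards separate stream processing. The sketch below illustrates only the per-record idea in plain Python. It does not use the Kafka Connect API, and the column names and the `transform_record` helper are hypothetical.

```python
from typing import Any, Dict

# Hypothetical mapping from source (mainframe) column names to target column names.
COLUMN_MAP = {
    "CUST_NO": "customer_id",
    "CUST_NM": "customer_name",
    "BAL_AMT": "balance",
}


def transform_record(source: Dict[str, Any]) -> Dict[str, Any]:
    """Rename columns and trim padded strings for one record.
    No state is kept between calls, so every record is handled independently."""
    target: Dict[str, Any] = {}
    for src_col, dst_col in COLUMN_MAP.items():
        value = source.get(src_col)
        if isinstance(value, str):
            value = value.strip()  # assumption: fixed-width fields may be space-padded
        target[dst_col] = value
    return target


row = {"CUST_NO": 42, "CUST_NM": "ACME   ", "BAL_AMT": "100.50", "FILLER": "x"}
print(transform_record(row))
```

Columns that are not listed in the mapping (such as `FILLER` here) are dropped, which is one simple way to handle differing layouts.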
Good discussion around Edge and Streaming.
Simple and effective ! Thanks
Great insights..thanks for sharing
Great explanation, thanks! I like especially database out with Kafka as a writer pipe and storage in different dbs (depending on the purpose).
What are the best Kafka configurations and setup for a 40mb/s load?
In general, Kafka configurations depend on the workloads, message size, latency requirements, network, etc. But for 40mb/s, the default setup with a three-broker cluster is sufficient and no fine-tuning is required.
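For a rough feel of what such a load means per broker, a back-of-the-envelope calculation can help. The sketch below is only arithmetic under stated assumptions that are not given in the thread: "mb" is read as megabytes, the average message size is 1 KB, the replication factor is 3, and the load is spread evenly across the three brokers. It does not measure or validate any Kafka setup.

```python
def cluster_write_load(ingress_mb_s: float, replication_factor: int,
                       brokers: int, avg_message_kb: float) -> dict:
    """Estimate message rate and replicated write volume for a given ingress rate."""
    messages_per_s = ingress_mb_s * 1024 / avg_message_kb
    # Every produced byte is written once per replica.
    total_write_mb_s = ingress_mb_s * replication_factor
    # Assumption: partitions and replicas are balanced evenly across brokers.
    per_broker_write_mb_s = total_write_mb_s / brokers
    return {
        "messages_per_s": messages_per_s,
        "total_write_mb_s": total_write_mb_s,
        "per_broker_write_mb_s": per_broker_write_mb_s,
    }


# All inputs except the 40 MB/s figure are assumed values for illustration.
estimate = cluster_write_load(ingress_mb_s=40.0, replication_factor=3,
                              brokers=3, avg_message_kb=1.0)
print(estimate)
```

Changing the assumed message size or replication factor shows how quickly the per-broker numbers move, which is why the answer above ties the configuration to the workload.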
Stunningly clear!
I appreciate the information that you provide in your videos. I believe this and PLC4X have huge potential for making a connected industry achievable.
Super! I have a middleware background and need to understand the concept of event integration and how it could be a frenemy of existing middleware.
Solid overview of IoT and IIoT.
Great presentation. I like the context you provide in advance of the tech.
Respect, bro
Thanks for the videos. But can you organise all the videos into a playlist so it's easy to start from the beginning, rather than jumping between different videos without knowing where to start?
subscribed!!
Nice video. This is what I come across as an EA at my job, there is already a sunk cost in legacy technology and it’s a struggle to get business to update their technology unless there is a true business or technical need.