From Zero to Hero with Kafka Connect
- added 7 Aug 2024
- Integrating Apache Kafka with other systems in a reliable and scalable way is often a key part of a streaming platform. Fortunately, Apache Kafka includes the Connect API that enables streaming integration both in and out of Kafka. Like any technology, understanding its architecture and deployment patterns is key to successful use, as is knowing where to go looking when things aren't working.
This talk covers:
* Key design concepts within Kafka Connect
* Deployment modes
* Live demo
* Diagnosing and resolving common issues encountered with Kafka Connect
* Single Message Transforms
* Deployment of Kafka Connect in containers
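As a taste of what the talk covers, here is a minimal sketch of a sink connector definition along the lines of the demo (streaming a topic into Elasticsearch). The connector name, topic, and URL are examples, not taken from the talk:

```properties
# Hypothetical sink connector: stream the "orders" topic into Elasticsearch.
# Name, topic, and connection URL are illustrative placeholders.
name=sink-elastic-orders
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
topics=orders
connection.url=http://elasticsearch:9200
# Derive the Elasticsearch document ID rather than using the record key,
# and let Elasticsearch infer the mapping instead of using the record schema
key.ignore=true
schema.ignore=true
```

In distributed mode you would express the same settings as JSON and POST them to the Connect REST API; in standalone mode you pass a properties file like this on the command line.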
📔 Slides: rmoff.dev/kafka-connect-zero-...
👾 Code: github.com/confluentinc/demo-...
⏱ Time codes:
00:00 What is Kafka Connect?
03:38 Demo streaming data from MySQL into Elasticsearch
11:43 Configuring Kafka Connect
12:33 👉 Connector plugins
13:33 👉 Converters
13:53 👉 Serialisation and Schemas (Avro, Protobuf, JSON Schema)
17:13 👉 Single Message Transforms
19:43 👉 Confluent Hub
19:51 Running Kafka Connect
20:24 👉 Connectors and Tasks
21:29 👉 Workers
21:56 👉 Standalone Worker
22:50 👉 Distributed Worker
23:10 👉 Scaling Kafka Connect
24:42 Kafka Connect on Docker
26:17 Troubleshooting Kafka Connect
27:56 👉 Dynamic Log levels in Kafka Connect
28:48 👉 Error handling and Dead Letter Queues
32:16 Monitoring Kafka Connect
32:59 Recap & Resources
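For the Single Message Transforms and dead letter queue sections of the talk, the connector-level configuration involved looks roughly like this. The transform alias, field names, and topic name are invented for illustration:

```properties
# Single Message Transform: add a static field to every record value
transforms=addSource
transforms.addSource.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.addSource.static.field=source_system
transforms.addSource.static.value=mysql-demo

# Error handling: tolerate bad records and route them to a dead letter queue
# instead of failing the task
errors.tolerance=all
errors.deadletterqueue.topic.name=dlq_orders
errors.deadletterqueue.context.headers.enable=true
```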
--
☁️ Confluent Cloud ☁️
Confluent Cloud is a managed Apache Kafka and Confluent Platform service. It scales to zero and lets you get started with Apache Kafka at the click of a mouse. You can sign up at confluent.cloud/signup?... and use code 60DEVADV for $60 towards your bill (small print: www.confluent.io/confluent-cl...)
--
💾Download Confluent Platform: www.confluent.io/download/?ut...
📺 Kafka Connect connector deep-dives: • Kafka Connect
✍️Kafka Connect documentation: docs.confluent.io/current/con...
🧩Confluent Hub: www.confluent.io/hub/?... - Science & Technology
Hi Robin,
I never write comments on YouTube videos, but I deeply want to thank you for all your work!
Thanks - glad it was useful!
Your examples are always very well chosen. Thanks.
Thanks - glad you've found it useful :)
Hi Robin,
I am a software engineer at a startup. Last year we built a pipeline to sync our Postgres data to Elasticsearch and Cassandra. It was all custom Java code with a lot of operational handling. Thank you for this video; I am planning to use Connect for those pipelines.
Hi Robin, I am a new subscriber and fan here
Thanks Robin - from your newest fan and subscriber :) I'm really loving all the information coming from Confluent - you're doing a top job. We are getting serious about implementing a solution centralized on Kafka (on a limited budget) - I guess there are just a lot of different ways and means. I'll post on the community forum a bit later, but just wondering, off the top of your head: if you were combining web logs from multiple websites of a similar nature (the DB schema is the same, although as per your suggestion I will look into Avro), would you combine all users into one topic (perhaps tagging where they originated) or set up a topic for each website? Ultimately queries are centred on username, so origination is just FYI. Somewhere I heard/read about creating a topic per user, but this didn't seem right (for tens of thousands of users).
Hi Mark, from what you describe I would definitely collate these into a single topic, since they sound like the same logical entity. One topic per user sounds…unusual.
Thank you Robin!
my pleasure, glad to help :)
@rmoff Can you share links showing use of Kafka Connect in production by companies? I need these examples to propose Connect in my organization.
@rum81 If you look at past talks from Kafka Summit (www.kafka-summit.org/past-events) you'll find lots of examples of companies using Kafka Connect in production.
Hi Robin,
Thanks for the amazing videos. We are implementing Kafka in our project, and whenever I get stuck your videos help a lot to clear up the concepts and issues.
I have a small conceptual doubt: do Kafka and Kafka Connect support ENUM data types? We are facing a type-cast error when syncing data from the source table to the sink table.
I'm so glad my videos have helped you out :)
I don't know the answer to your ENUM question - please ask at forum.confluent.io/ and someone should be able to help. Thanks.
Hi Robin, thanks for this video. I wonder whether 'mariadb-jdbc-connect' is available in this project. Thanks :)
Hi, if it has a JDBC driver then it's worth trying with the JDBC Source connector, sure.
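A sketch of what that might look like with the JDBC Source connector, assuming the MariaDB JDBC driver is on the worker's classpath. The URL, credentials, column, and prefix below are examples only:

```properties
# Hypothetical JDBC source reading from MariaDB.
# Connection details, column name, and topic prefix are placeholders.
name=source-mariadb
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:mariadb://mariadb:3306/demo
connection.user=connect_user
connection.password=secret
# Poll for new rows based on an auto-incrementing primary key
mode=incrementing
incrementing.column.name=id
# Each source table becomes a topic named <prefix><table>
topic.prefix=mariadb-
```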
Thanks Robin. I have a question on the plugin.path you gave while installing the connector. Where does that path come from? Can I give any path? Where can I find the path to put in the Dockerfile?
Hi, this path comes from wherever you put the JDBC connector when you installed it. This might help: rmoff.net/2020/06/19/how-to-install-connector-plugins-in-kafka-connect/
If you're still stuck then please go to forum.confluent.io/ and ask for further help there. Thanks.
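If you're building a Docker image, a minimal sketch looks like this. The image tag and connector version are examples; the `confluent-hub` tool installs into a directory that the Confluent Connect images already include on their plugin.path:

```dockerfile
FROM confluentinc/cp-kafka-connect-base:7.6.0
# Install the JDBC connector from Confluent Hub. It lands in
# /usr/share/confluent-hub-components, which this image's plugin.path
# already covers, so no extra configuration is needed.
RUN confluent-hub install --no-prompt confluentinc/kafka-connect-jdbc:10.7.6
```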
Hi Robin, is there a source connector for adobe or can we use a json connector as long as the streaming data is in json format?
The best place to ask is www.confluent.io/en-gb/community/ask-the-community/
Hi Robin, I'm facing an issue where a decimal data type is stored as bytes in the Kafka topic. Is there any way to solve that?
Hi Ankit, the best place to ask is confluent.io/community/ask-the-community/
In distributed mode, the Connect worker sometimes throws an error saying that the status.storage.topic's cleanup.policy should be set to compact. I'm wondering why it throws that error only occasionally. Would setting log.cleanup.policy to compact on the Kafka broker fix the issue?
Yes, they should be set to compact - see docs.confluent.io/kafka-connectors/self-managed/userguide.html#kconnect-internal-topics
Also head to confluent.io/community/ask-the-community if you have any more questions :)
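For reference, the distributed worker names its three internal topics in the worker properties, and all of them must use log compaction. If Connect auto-creates them it sets cleanup.policy=compact itself; seeing the error intermittently usually means the topics were pre-created (or recreated) with the broker default of delete. The topic names below are the common defaults, not mandated values:

```properties
# Distributed worker internal topics - each must have cleanup.policy=compact
# so that the latest config/offset/status for every key is retained forever
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
```

Fixing cleanup.policy on these specific topics is generally preferable to changing log.cleanup.policy broker-wide, since the broker-level setting would affect every topic that doesn't override it.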
Hey Robin, thanks for this video. But could you please guide us first on how to start Apache Kafka Connect, and how to check whether it is already running?
You can find good info on running Kafka Connect here: docs.confluent.io/platform/current/connect/userguide.html#connect-userguide-standalone-config
@rmoff I am trying to test the FileStreamSourceConnector (file-source, a preconfigured connector in Apache Kafka). The connector starts successfully and it fetches the data into the topic, but when I run a Kafka consumer it does not fetch any records. I am following this document: docs.confluent.io/platform/current/connect/quickstart.html
Also, I am unable to find such a connector under plugin.path, so how come the connector starts?
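One thing worth checking in that situation: a console consumer started after the connector has already run only sees new records unless you ask it to read from the beginning of the topic. A sketch of the standalone quickstart commands, assuming paths relative to an Apache Kafka download and the quickstart's default file names:

```shell
# Start a standalone worker with the FileStream source quickstart config
bin/connect-standalone.sh config/connect-standalone.properties \
    config/connect-file-source.properties

# Read the quickstart topic from the start of the log; without
# --from-beginning a newly started consumer only sees records
# produced after it connects
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic connect-test --from-beginning
```

As for the plugin.path question: the FileStream connectors ship inside Apache Kafka itself, so they are loaded from the worker's classpath even though nothing for them appears under plugin.path.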
Hello Robin, I connected Azure SQL with Kafka Connect by giving the table name, host name, and server name, but I'm not able to specify the DB schema name anywhere. Is there any way to specify the schema name? Without it, the connector creates a new table in the DB.
hi, please head over to forum.confluent.io/ and ask there :) thanks.
The key format 'AVRO' is not currently supported - when using FORMAT='AVRO' in ksqlDB.
You need to upgrade to a more recent version of ksqlDB.
I'm getting
ERROR 1049 (42000): Unknown database 'demo'
while trying to connect to MySQL...
Did you create the database first? If you're still stuck head to forum.confluent.io/ with full details of what you've run and where you're getting the error.
Can you share any documents on using MSK with sink connectors?
hi, the best place to get help is at www.confluent.io/en-gb/community/ask-the-community/ :)
I hope it isn't too late to thank you Robin
Glad it was useful :)
Hi Robin,
How can we include the JSON schema in the message when a field is an array of objects? I don't have the option to use Avro.
Hi, can you post this at forum.confluent.io/ and hopefully someone will be able to help there :)
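For anyone else hitting this: when `value.converter.schemas.enable=true`, Kafka Connect's JsonConverter expects each message to be an envelope with a `schema` and a `payload`, and an array-of-objects field is described with an `array` type whose `items` is itself a `struct` schema. Below is a sketch of that envelope built in Python; the field names (`items`, `sku`, `qty`) are invented for illustration:

```python
import json

# Sketch of the envelope JsonConverter reads when schemas.enable=true.
# "schema" describes the types; "payload" carries the actual values.
message = {
    "schema": {
        "type": "struct",
        "optional": False,
        "fields": [
            {
                "field": "items",
                "type": "array",
                "optional": False,
                # Each array element is itself a struct with its own schema
                "items": {
                    "type": "struct",
                    "optional": False,
                    "fields": [
                        {"field": "sku", "type": "string", "optional": False},
                        {"field": "qty", "type": "int32", "optional": False},
                    ],
                },
            }
        ],
    },
    "payload": {
        "items": [
            {"sku": "A-100", "qty": 2},
            {"sku": "B-200", "qty": 1},
        ]
    },
}

# The serialized string is what you would produce to the topic
print(json.dumps(message))
```

Producing messages in this shape lets schema-aware sink connectors work without Avro or Schema Registry, at the cost of shipping the schema inside every message.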