Kafka Tutorial - Fault Tolerance
- added Dec 1, 2016
- Spark Programming and Azure Databricks ILT Master Class by Prashant Kumar Pandey - Fill out the google form for Course inquiry.
forms.gle/Nxk8dQUPq4o4XsA47
-------------------------------------------------------------------
Data Engineering is one of the highest-paid jobs today.
It is likely to remain among the top IT skills for years to come.
Are you in database development, data warehousing, ETL tools, data analysis, SQL, or PL/SQL development?
I have a well-crafted success path for you.
I will help you get prepared for the data engineer and solution architect role depending on your profile and experience.
We created a course that takes you deep into core data engineering technology and helps you master it.
If you are a working professional:
1. aspiring to become a data engineer,
2. looking to change your career to data engineering,
3. wanting to grow your data engineering career,
4. preparing for the Databricks Spark Certification, or
5. preparing to crack Spark data engineering interviews,
ScholarNest is offering a one-stop integrated Learning Path.
The course is open for registration.
The course delivers an example-driven approach and project-based learning.
You will practice the skills using MCQs, coding exercises, and capstone projects.
The course comes with the following integrated services.
1. Technical support and Doubt Clarification
2. Live Project Discussion
3. Resume Building
4. Interview Preparation
5. Mock Interviews
Course Duration: 6 Months
Course Prerequisite: Programming and SQL Knowledge
Target Audience: Working Professionals
Batch start: Registration Started
Fill out the below form for more details and course inquiries.
forms.gle/Nxk8dQUPq4o4XsA47
--------------------------------------------------------------------------
Learn more at www.scholarnest.com/
Best place to learn Data Engineering, Big Data, Apache Spark, Databricks, Apache Kafka, Confluent Cloud, AWS Cloud Computing, Azure Cloud, Google Cloud - self-paced, instructor-led, certification courses, and practice tests.
========================================================
SPARK COURSES
-----------------------------
www.scholarnest.com/courses/s...
www.scholarnest.com/courses/s...
www.scholarnest.com/courses/s...
www.scholarnest.com/courses/s...
www.scholarnest.com/courses/d...
KAFKA COURSES
--------------------------------
www.scholarnest.com/courses/a...
www.scholarnest.com/courses/k...
www.scholarnest.com/courses/s...
AWS CLOUD
------------------------
www.scholarnest.com/courses/a...
www.scholarnest.com/courses/a...
PYTHON
------------------
www.scholarnest.com/courses/p...
========================================
We are also available on the Udemy Platform
Check out the below link for our Courses on Udemy
www.learningjournal.guru/cour...
=======================================
You can also find us on O'Reilly Learning
www.oreilly.com/library/view/...
www.oreilly.com/videos/apache...
www.oreilly.com/videos/kafka-...
www.oreilly.com/videos/spark-...
www.oreilly.com/videos/spark-...
www.oreilly.com/videos/apache...
www.oreilly.com/videos/real-t...
www.oreilly.com/videos/real-t...
=========================================
Follow us on Social Media
/ scholarnest
/ scholarnesttechnologies
/ scholarnest
/ scholarnest
github.com/ScholarNest
github.com/learningJournal/
========================================
Want to learn more Big Data technology courses? You can get lifetime access to our courses on the Udemy platform. Visit the link below for discounts and coupon codes.
www.learningjournal.guru/courses/
Your series of explanations on Kafka is by far the best I could find online as of today, thank you
Thank you very much.
Thank you for explaining replication factor! Loving your videos on Kafka! so well explained.
super well explained, good job !! thanks for your videos!!
One of the best Kafka training.. very clear and simple to understand all the series. A BIG thank you to you Sir for preparing this series !!!
absolutely marvellous explanation with proper hands-on. will never forget that replication factor is defined at topic level but leaders are created at partition level... thanks for these amazing videos
Crisp and clear. Even better than paid courses. Thank you @Learning Journal
Absolutely, the best session on Kafka so far. very nicely explained.
You have explained very well. U made it easy for all of us to understand Kafka. Thank you
You are a wonderful person and I really appreciate these sessions. Excellent detail and communication; you are giving exactly what is required to learn. I clicked the thumbs-up icon but wanted to give more, my friend. Appreciated.
Awesome tutorial :D Thank you so much for such wonderful explanation sir.
Fantastic explanation. You are making these tough topics so simple to understand.
Superbly clear, well presented and interesting.
Fantastic sir.................very clear.
excellent explanation - just as much as you need ( no more, no less) :)
Really Nicely explained, I don't think there could be more simplified explanation.
Hello sir, I went through all your Kafka videos. Very well explained. Thank you.
awesome good job, clearly explained
@learning journal , that's a very good video! Thanks!!
Simply explained and to the point✌️👍
Very nice tutorial sir, very clear explanation , keep posting ... sir
well explained . Crispy and clear
Good job in explaining !!
Nice explanation, I really appreciate it. Thanks for your videos!!
Great tutorial! Thanks
Excellent!
Fantastic Tutorial Sir
excellent explanation !!!!!
Could you please explain using GCP? I am not able to set up a multi-broker cluster there; I can't find the config.
Nice tutorial, very easy to understand the concept. Thanks for such an informative video. Please post videos on Spark also.
One of the best kafka demo. thank you.
Thanks Robin for sharing your feedback.
well explained ... thanxs for this...
Thanks sir. It's helpful.
very well explained.. can you add some video with kafka and spark streaming. with deployment strategy on cluster.
How can we see which partition is on which broker within the cluster for a topic? Is there any command for it, and where can I find all such commands? Also, is there any UI to manage Kafka brokers (to see all the topics, all the partitions, all the messages currently in the broker for a topic, etc.)?
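For reference, the partition-to-broker mapping can be inspected with the kafka-topics tool used in the video. A command sketch against a running cluster, assuming ZooKeeper on localhost:2181 and a topic named my-topic (both placeholders):

```shell
# List all topics known to the cluster
bin/kafka-topics.sh --list --zookeeper localhost:2181

# Show, for each partition of a topic: the leader broker,
# all replica brokers, and the in-sync replicas (ISR)
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-topic
```

Each row of the --describe output corresponds to one partition, which answers the "which partition is on which broker" part of the question.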
Thank you
Fantastic.. your delivery style is flawless.. no errr... no emmm....
excellent explanation.... expecting more topics on Kafka.. thks
Thanks, Vijendran.... Stay subscribed and you will get more videos for sure.
Can we use consumer groups on top of two Kafka clusters that are replicated using MirrorMaker?
If yes, will Kafka guarantee exactly-once delivery?
Amazing! Please keep it up for everyone's sake. Thank you! Will you also cover other components such as MapReduce, Spark, Pig, Hive? :)
Already started Hadoop and Map Reduce. Please check my Hadoop playlist. Others will also follow.
These tutorials are really awesome. Thanks a lot for these.
Can you please resolve my question: how can we stop the broker instances one by one? When I try, it says "No instance(s) available."
Best course on kafka. Can you make a course on Cassandra?
If I have 1 topic with 3 partitions on 3 different machines, and it also has replication factor 3, then ultimately each broker will be the leader for one partition and will also maintain copies of the 2 other partitions. This will lead to high space usage on each machine. So shouldn't the copies of the data be maintained on idle machines? Is there a way to determine which machines should maintain the copies without being leaders?
Your explanation made it very simple to understand Kafka. Thank You.
A question - If we start Brokers on multiple systems, do we need to increment the broker id's or '0' is good enough in all brokers?
Broker ID should be unique for each broker in the cluster.
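As a sketch of what "unique per broker" looks like in practice — the property names are standard Kafka broker settings, while the values and file names below are just example conventions from the tutorial setup:

```shell
# config/server-1.properties  (second broker)
broker.id=1                  # must be unique for every broker in the cluster
log.dirs=/tmp/kafka-logs-1   # also unique when brokers share one machine
```

So '0' is fine for exactly one broker; every additional broker needs its own ID, whether it runs on the same machine or a different one.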
Hi, this is a wonderful tutorial. But I have a doubt regarding setting the replication factor: is there any option to set the replication factor for dynamically created topics programmatically? Please reply.
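For reference (this is not covered in the video, so treat it as an assumption to verify against your Kafka version): broker-side defaults control the shape of topics that Kafka auto-creates. A config sketch:

```shell
# config/server.properties -- defaults applied to auto-created topics
auto.create.topics.enable=true
num.partitions=1
default.replication.factor=1   # raise this so auto-created topics are replicated
```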
Very nice tutorial. A lot of concepts are now clear.
A quick question though: if I start a single broker, keep num.partitions=1 in the server.properties file, and push a message to a non-existent topic, it will create a topic with one partition. Now if I start another broker on another machine with the same configuration, will the topic get replicated on the other server? (No replication factor set.)
Very nice explanation sir, thank you. One question: if ZooKeeper is down for some reason and a consumer is in the middle of reading, is there a way to continue from that offset and partition once it's back up?
What are the similarities/differences between Kafka replication factor and HDFS replication (default 3)? If HDFS is used then do we even need replication at Kafka level?
Replication is a backup copy to be used in case of failure. It has the same purpose everywhere. Kafka doesn't run on YARN; it stores data on the local filesystem instead of HDFS. So each of them needs to replicate its own data.
What are the options used these days to copy from Kafka to HDFS? Like Flume, the Kafka HDFS Connector (using Confluent Kafka?), any others?
Could you please tell me how to create multiple brokers on a Kafka VM on Google cloud
In the case of a multi-node cluster, do we need to set up SSH, or does Kafka have its own mechanism for communication between nodes? Nice videos, helping me understand.
You need to ensure TCP/IP connectivity. No need to set up SSH.
Hi Sir, you did a great job. I need your help with a doubt.
Replicas: 1,0,2 means Broker 1 is the leader and maintains the first copy, and Broker 0 and Broker 2 each maintain one copy as per the replication factor (3).
--partitions 2 means the 1st partition is on Broker 1 and the 2nd partition is on Broker 2. How can Broker 0 store a copy without having a partition?
We have 2 partitions (Broker 1 + Broker 2) and replication factor 3 (Broker 1 + Broker 0 + Broker 2). How is Broker 0 storing a copy without having a partition? I hope you understand my query. Waiting for your response. Thanks.
Hi Sir, this is very good. But it looks like the leader node is a single point of failure. How do you handle the case where node 1 fails, given that the producers and consumers are connecting to node 1, which is the leader?
Is there a possibility of not being able to produce an event because the broker is down? How would we handle that?
Hi, after changing the port number in the server-1.properties file, I am getting an error: "kafka.common.KafkaException: Socket server failed to bind to 0.0.0.0:9092: Address already in use: bind." Please help!
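A note on the error above: it still mentions port 9092, which usually means either another broker already holds that port or the edited file is not the one actually passed to kafka-server-start.sh. A config sketch for the second broker (9093 is just an example value):

```shell
# config/server-1.properties -- give the second broker its own port
# (older Kafka versions used `port=9093`; newer ones use `listeners`)
listeners=PLAINTEXT://:9093
```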
Isn't the leader a single point of failure? How is fault tolerance handled if the leader fails?
Is there a way to build gui tool to use the tools under /bin folder? Does Kafka offer any api for us to do such a thing?
That's a nice thing to do, but most of it has already been done by the Confluent team. I suggest you check the Confluent documentation before you decide to create something.
Nice tutorial, really easy to understand. I have one doubt: in this video we created 3 brokers, but in the --describe command output, why is it showing 2 rows of data and not 3?
Do the replicas occupy the same size as the original data, or are they compressed? Also, will replication be applied to all topics in a production system?
Your question indicates that you are concerned about storage space. Having three replicas is widely accepted in almost every distributed system. Storage is getting cheaper and cheaper. Compression is not an issue; you can implement it. But for Kafka, storage space is not a big concern because we clean up old messages that we don't need. You can configure the cleanup frequency.
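The cleanup frequency mentioned above is controlled by broker-side retention settings; a config sketch (the property names are standard, the values shown are the common defaults as far as I know -- verify against your version's docs):

```shell
# config/server.properties -- log cleanup settings
log.retention.hours=168                  # delete segments older than 7 days
log.retention.bytes=-1                   # no size-based limit
log.retention.check.interval.ms=300000   # how often the cleaner checks (5 min)
log.cleanup.policy=delete                # or `compact` for log compaction
```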
What is the localhost port given while creating a topic?
I have one question.
Example: I have 15 GB of data daily from 4 different sources. Can you please tell me how many producers, topics, brokers, partitions, and consumers I should use?
Can you make a video on Kafka controllers
Hello Sir, All your lectures are beautifully explained.
I have a couple of queries; can you please clarify them for me? You mentioned at 2:40 in the video that when a client wants to send data, it connects to the leader. Does the producer keep track of the partition and the partition leader?
Also, does the producer keep track of the partition offset?
After connecting to a broker, producer internally queries for the metadata from the broker. The metadata contains all those details.
@@ScholarNest - Thanks for replying.
When you say "the producer internally queries the broker for metadata", does this mean the producer queries the Kafka broker for the metadata details, and the broker in return fetches those details from ZooKeeper?
Yes, but all the metadata may not come from ZooKeeper, because with every newer version, Kafka is trying to reduce its dependency on ZooKeeper.
Sir, if the 3 brokers are on 3 different machines, do we need to configure the IP address as well, apart from the port number?
The default is localhost, which only works while everything runs on one machine. If you are using three different machines, you need to configure the IP address (or hostname) as well.
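A sketch of the properties involved when brokers run on separate machines; the property names match recent Kafka versions (older releases used `advertised.host.name` and `port` instead), and the hostnames are placeholders:

```shell
# config/server.properties on each machine
broker.id=0                                   # unique per broker
listeners=PLAINTEXT://0.0.0.0:9092            # interface and port the broker binds
advertised.listeners=PLAINTEXT://broker1.example.com:9092  # address clients are told to use
zookeeper.connect=zk1.example.com:2181        # same value on every broker
```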
Good video. Couldn't we get a port clash if we leave it as is? 7:10
You have created 2 partitions and 3 replicas, so I understand each broker will have 1 replica (no. of replicas = no. of brokers). But I am confused about where the partitions will be created: we have 2 partitions and 3 brokers, so which partition will go to which broker? Please clear my doubt.
When there are 2 or more partitions for a topic handled by different brokers, where do we specify which requests (from producer/consumer) are handled by which broker? How does Kafka handle this?
Hello Suhas, I have just started understanding Kafka and I also have the same query. Did you get an answer to it?
So if we want 2 partitions for a topic, should we create 2 Kafka brokers? (Or multiple, 1 for each partition?)
I think this question is answered in one of the videos. Finish the tutorial in sequence.
Very good explanation. I have a simple question: why do we need to provide ZooKeeper information while creating a topic, and not Kafka server information?
Topic metadata is kept in ZooKeeper. We need a common place where all brokers have access to some essential information; Kafka uses ZooKeeper for that purpose.
For those who have neither Linux nor access to any cloud: you can download Git Bash, use Visual Studio Code as your editor, and start working on Windows. 😎
Very well explained. How does Kafka know that the first broker (server.properties), the second broker (server-1.properties), and the third broker (server-2.properties) are in the same cluster?
When you start a broker, you specify its properties file as a parameter. All three files point to the same zookeeper.connect address, and that shared ZooKeeper is what makes them one cluster.
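The multi-broker startup from the video can be sketched as follows (file names follow the tutorial's convention; the shared ZooKeeper entry is the piece that ties the brokers into one cluster):

```shell
# Start three brokers, each with its own properties file
bin/kafka-server-start.sh config/server.properties &
bin/kafka-server-start.sh config/server-1.properties &
bin/kafka-server-start.sh config/server-2.properties &

# All three files contain the same line, which makes them one cluster:
#   zookeeper.connect=localhost:2181
```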
Hi, it's a very good explanation. How do we specify the leader? Where is it specified?
I had the same question. Turns out that Kafka (actually the ZooKeeper underneath) uses the concept of ISR (In-Sync Replicas, explained in the video) to elect the leader. The ISR is persisted with ZooKeeper, so any change in the ISR is reflected by ZooKeeper. In layman's terms, if ZooKeeper knows that 3 replicas of a partition are to be maintained for a particular topic and all 3 ISRs are in perfect order, it just picks one of them (I think randomly), as all are legitimate leader candidates. In this manner it also handles fault tolerance, as all it needs to do is make sure the remaining ISRs are okay.
For a more detailed explanation:
community.hortonworks.com/questions/64905/kafka-leader-election.html
kafka.apache.org/documentation/#design_replicatedlog
So if the number of partitions is X and the replication factor is Y, is the total number of partition replicas at any moment X times Y?
Yes. In your case, assuming X=5 and Y=3, you will have 5 partitions, each with 3 copies, so 15 in total.
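The arithmetic above as a trivial check, using the numbers from the example:

```shell
partitions=5
replication_factor=3
# Total partition replicas stored across the cluster
echo $(( partitions * replication_factor ))   # prints 15
```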
Hello Sir, is it possible to read messages stored in Kafka from the beginning using Logstash without changing the group ID?
You can get data from the beginning as long as it has not been cleaned or compacted in Kafka. However, when you work in a group, you don't read everything. I suggest you rethink what you are trying to achieve.
@ScholarNest thank you sir for the reply. I achieved it using the Kafka command option --reset-offsets --to-earliest. It resets the messages' offsets to the beginning, and Logstash reads them again.
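The option the commenter mentions belongs to the kafka-consumer-groups tool; a command sketch (group and topic names are placeholders, and the group's consumers must be stopped while resetting):

```shell
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --group logstash-group --topic my-topic \
  --reset-offsets --to-earliest --execute
```

Without --execute the command only prints the planned offsets (a dry run), which is a safe way to preview the reset first.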
A question about the committed partition message offset and fault tolerance: does the partition leader replicate the committed offset to the followers?
The partition leader stores messages, and the followers replicate the messages. Offsets are generally stored in a topic, so they are globally available.
A topic is a meaningful category for a bunch of messages. Could you explain what you mean by "offsets are generally stored in a topic"?
Kafka needs to store offsets somewhere. In earlier versions, they were stored in ZooKeeper. In newer versions, Kafka creates an internal topic and stores them there.
Adding two more brokers on the existing broker node makes sense because we modified the required information in the configuration files. However, we have not modified any configuration for ZooKeeper. So the question is: how does ZooKeeper get to know that the two brokers are newly added to the cluster?
Normally you copy a broker configuration file and edit some of the properties. Since ZooKeeper is the same for the new broker as well, we don't change that, but the ZooKeeper details are there in the configuration file, so each new broker registers itself with that same ZooKeeper when it starts.
Thank you... Understood and following other videos and if we get any doubts will post them... awesome videos
How can one integrate Kafka with a traditional data warehouse?
A data warehouse is just a database. Do you see any problem with that?
Sir, how do the 3 brokers come into the same cluster? Are we mentioning it anywhere?
That's a good question, but the answer is not that straightforward. Cluster members are managed in ZooKeeper. I have explained it in my Kafka Streams training. You might want to get it at Udemy :-)
What is a listener in Kafka, and how do I change the listener port number?
A listener is just the address and port on which the broker accepts connections. You connect to the cluster by supplying a broker address and port; start the broker on a different port if you want to change it.
Please tell me how leader copies all incoming messages to followers?
The leader doesn't copy it to followers. It is the other way around. Followers copy it from the leader.
Is there a way to handle failure of the producer?
Producers are independent applications, and we treat them like any other independent application. There is no Kafka-specific method to handle those failures. Whatever you would do to handle failures of any other application applies to a Kafka producer as well.
Learning Journal thank you for your reply..
Course is awesome
@Learning Journal
Question: is it possible that all partitions of a particular topic have the same leader?
Example: topic name PDF
Number of partitions: 3
Number of nodes: 3 (N1, N2, N3)
When we run describe topic PDF, is this possible:
Partition 0 Leader: N1
Partition 1 Leader: N1
Partition 2 Leader: N1
(Reference: video position 10:06)
Thank you, Sir. Very clean and straightforward explanations!
Normally this doesn't happen unless the other two brokers are down while you create the topic. But it is possible on a single-node cluster.
So does the producer know the public IP address of all 3 brokers? How does the producer know which one is the leader, and what makes it switch to the new leader if it fails? That's what I don't understand from the video. Ta.
The producer connects using the IP address provided and then queries metadata from the broker to get detailed information about the other brokers and the leader of the topic.
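A sketch of what the producer is given versus what it discovers; `bootstrap.servers` is Kafka's standard client property, while the hostnames are placeholders:

```shell
# Producer client configuration (e.g. producer.properties)
# Only a starting list is needed -- not every broker in the cluster:
bootstrap.servers=broker1.example.com:9092,broker2.example.com:9092

# From any one of these brokers, the producer fetches cluster metadata:
# the full broker list, each topic's partitions, and the current
# leader of every partition. On a leader change, refreshed metadata
# points the producer at the new leader.
```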
What if we have two brokers and have replication factor as 3?
+Venkatesh Kamthane good question. Try it and let me know the answer.
@ScholarNest Topic creation will fail with => Replication factor: 3 larger than available brokers
The tutorial was good. Experimenting with it, I came up with a question. Let's say I have a topic XYZ and three brokers, B1, B2, and B3. I set partitions to 2 and replicas to 1. After checking the details of the topic, B1 and B3 have the topic but not B2.
Now I pointed the producer at B2, where the topic is not present, and consumers connected to that topic through all three brokers.
I was successful in sending data as well as receiving it via all three brokers. How is that happening? I mean, the topic has no partition on B2, but I am able to send data through it, and even B2 is able to take the data.
Producer pulls metadata after connecting to the broker. The metadata contains the list of all brokers in the cluster. That's how the producer knows about the all brokers in the cluster.
Because of the sound at the beginning of the video, you may lose subscribers.
Thank you