Distributed Systems 5.2: Quorums
Vložit
- čas přidán 27. 07. 2024
- Accompanying lecture notes: www.cl.cam.ac.uk/teaching/212...
Full lecture series: • Distributed Systems le...
This video is part of an 8-lecture series on distributed systems, given as part of the undergraduate computer science course at the University of Cambridge. It is preceded by an 8-lecture course on concurrent systems for which videos are not publicly available, but slides can be found on the course web page: www.cl.cam.ac.uk/teaching/212...
absolutely gold, got to understand the whole quorum thing in less than 10 minutes ! thanks Martin!
This is the author of the famous book Designing Data-Intensive Applications (DDIA)!
7:18 that diagram is so simple yet helped me understand the r + w > n formula *way* better. 👏
Probability of faults
Read-after-write consistency [2:06]
Quorum (2 out of 3) [3:58]
Read and write quorums [5:41]
r + w > n
Read repair [8:19]
Your are such a legend, thank you so much for this video. I went through my lecture notes continuously, but none of it even remotely came close to how well you explained it. Honestly, couldn't thank you enough and that example was brilliant.
Absolutely pleasure watching it. Nice way of presenting it. Thank you Martin !!
Much appreciated. Thank you! You made this incredibly simple to understand
Pleasantly clear and simple explanation. Thank you!
Beautiful explanation, thank you so much!
Beautiful and simple explanation!
Perfect! We love you Martin!
thanks a lot, Martin for the clear explanation.
Excellent explanation
very good and well defined explanation. Thanks!
The explanation is so good!
Thank you so much for the video, I found it very helpful in understanding the Sloppy Quorum!
Very good explanation! 🙌
great explanation thank you
Thank you. Very good explained!
Thank you, Professor
Vielen Dank fuer die tolle Erklaerung!
Great explanation
Thanks Martin!! 🙏🏼
Well done !
great content !! thanks a lot man!!
bro you rocked it
Good job Martin, thanks for shedding light on this topic!
thank you
Great
at 4:30 what if one of the acknowledgement also fails. We could have another node read the responses and use it, but the initial write doesn't know it's successful, would it write again? it could potentially overwrite the other nodes' work if it does that
dude i died, but thnx for explaining i might pass my exams now
Question: what if when the quorum write 1 out of 3 succeeded, but 2 failed to write, how can we rollback the one that write ok? Because we already return the client that write if failed.
good question, can someone ans this question?
In the simple case, there is no rollback - you end up with an incomplete write. If a write succeeds, you know that the value is stored on a quorum of replicas, but if a write fails, you don't know how many replicas it's stored on (it could be one or even more). If this is unacceptable, and you want to ensure that a write is either written to several replicas or not written at all, you need a more sophisticated protocol such as 2-phase commit, which is discussed in lecture 7.1 of this series.
@@kleppmann If there is no rollback and for example I have 3 replicas, when reading from 2 replicas I can get an erroneous version (the one that did not reach the write quorum) and assume it is correct. In the case of using 2PC, is it possible to use a version without a coordinator to keep the system interacting directly between clients and replicas?
Thank you.
- 10 egyptian procrastinators the night before the exam
How following problems are solved in replication?
Question 1. Suppose there are 3 nodes.
Variable 'x' has initial value of 0 on all the nodes.
Client triggers operation 'x = 1'.
The operation is successfull on node 1 but fails on nodes 2 and 3 (due to unrelated issue e.g. broken network).
Hence the update operation as a whole has failed but the data on node 1 has changed.
This will lead to inconsist reads - If the quorum includes node 1 then the read(x) will return 1, otherwise read(x) will return 0.
Question 2. In case of multiple variables that form an invariant, one variable might be updated successfully while the other one encounters an issue.
Answer to Question 1:
In leaderless replications - just use Quorums for reading and writing.
formula: r + w > n
r - denotes the minimum number of replicas needed when a client makes a “read request” from n replicas(parallel)
and waits for at least r replicas successfully responds.
w - denotes the minimum number of replicas needed when a client makes a “write request” to n replicas(parallel)
and waits for at least w replicas successfully responds.
Let's take r=w=2
n = 3
w = 2
r = 2
“Client triggers operation 'x = 1'.
The operation is successful on node 1 but fails on nodes 2 and 3 (due to unrelated issues e.g. broken network).”
From the above statement, the successful write request count is 1 (on node 1), so 1 is not greater or equal to w(which is 2).
According to the write quorum number(w), at least 2 nodes where the write operation is successfully executed are required. So the client's write request will return an error.
General rule:
'If fewer than the required w or r nodes are available, writes or reads return an error. A node could be unavailable for many reasons: because the node is down (crashed, powered down), due to an error executing the operation (can’t write because the disk is full), due to a network interruption between the client and the node, or for any number of other reasons. We only care whether the node returned a successful response and don’t need to distinguish between different kinds of fault.'
Are there any real world examples of such data replication exist? It multiplies number of your datastores that you should pay for :c
Cassandra, for example. My guess is that any distributed database with leaderless replication
@@4am4i exactly, Cassandra is a typical example. Why did you call it leaderless replication? By that you mean multiple-leader distributed database?
@@TruongHoang-du9if Leaderless means write can be done on any replica
Quorum is Trustful therefore incompatible with public blockchain right.
I came up with that conclusion open to discussions.
Yes if you have a bad actor as a node in these examples it can take out the entire system, by generating bad timestamps for example.
Some blockchains are built on the idea of quorums. In order to tolerate some nodes being malicious, they typically use bigger quorums - for example, they might require acknowledgements from more than two thirds of replicas, rather than more than half.
thank you