Distributed Systems 6.1: Consensus

  • Published 9 Jul 2024
  • Accompanying lecture notes: www.cl.cam.ac.uk/teaching/212...
    Full lecture series: • Distributed Systems le...
    This video is part of an 8-lecture series on distributed systems, given as part of the undergraduate computer science course at the University of Cambridge. It is preceded by an 8-lecture course on concurrent systems for which videos are not publicly available, but slides can be found on the course web page: www.cl.cam.ac.uk/teaching/212...

Comments • 19

  • @daratha83
    @daratha83 3 years ago +75

    I accidentally came across this channel and noticed that you are the famous Martin Kleppmann, author of 'Designing Data-Intensive Applications'. Nice to meet you Martin, I really enjoyed reading your book.

    • @RandomShowerThoughts
      @RandomShowerThoughts 1 year ago +2

      oh wow, I didn't even realize. I was looking at the book and wanted a video companion

  • @lalitha7777
    @lalitha7777 3 years ago

    Very informative and interesting.

  • @zhou7yuan
    @zhou7yuan 2 years ago +9

    Fault-tolerant total order broadcast [0:15]
    Manual failover [1:15]
    Consensus and total order broadcast [3:07]
    Common consensus algorithms [5:03]
    Paxos: single-value consensus
    Multi-Paxos: generalisation to total order broadcast
    Raft, Viewstamped Replication, Zab: total order broadcast by default
    Consensus system models [5:54]
    partially synchronous, crash-recovery
    Leader election [10:16]
    Ensure at most 1 leader per term:
    term++ on each new leader election
    each node votes at most once per term
    candidate needs votes from a quorum of nodes
    Can we guarantee there is only one leader? [14:07]
    unique leader per term only
    cannot rule out leaders from different terms coexisting
    Checking if a leader has been voted out [16:20] (see code sketch below)
    Shall I be your leader in term t? → yes ← yes
    Can we deliver message m next in term t? → ok ← ok
    Right, now deliver the message
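
    A minimal sketch of the two quorum rounds in these notes, in Python (the names Node, handle_vote_request, can_deliver and is_quorum are illustrative, not from the lecture): each node votes at most once per term, and a leader must re-confirm a quorum before each delivery, which is how a deposed leader finds out it has been voted out.

        from dataclasses import dataclass

        @dataclass
        class Node:
            node_id: int
            current_term: int = 0
            voted_for: int | None = None   # whom we voted for in current_term

            def handle_vote_request(self, candidate_id: int, term: int) -> bool:
                # Round 1: "Shall I be your leader in term t?"
                if term > self.current_term:   # a newer term resets our vote
                    self.current_term = term
                    self.voted_for = None
                # Grant at most one vote per term.
                if term == self.current_term and self.voted_for in (None, candidate_id):
                    self.voted_for = candidate_id
                    return True
                return False

            def can_deliver(self, leader_id: int, term: int) -> bool:
                # Round 2: "Can we deliver message m next in term t?"
                # Answering "no" tells a stale leader it has been voted out.
                return term == self.current_term and self.voted_for == leader_id

        def is_quorum(acks: int, cluster_size: int) -> bool:
            # Strict majority: any two quorums overlap in at least one node,
            # so two candidates cannot both win the same term.
            return acks > cluster_size // 2

    Because any two majorities intersect, electing two leaders in the same term would require some node in the overlap to vote twice in that term, which handle_vote_request forbids.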

  • @giuliosalierno3817
    @giuliosalierno3817 1 year ago +1

    Thank you Prof. Kleppmann for this lecture. I have just one question: Is the second quorum intended to prevent multiple leaders from making the same request of delivering a message? Is this concept applied in the Paxos algorithm through the promise messages sent by acceptor nodes? Thank you

  • @hainguyen9148
    @hainguyen9148 1 year ago +2

    I clicked the like button before watching the video

  • @za406
    @za406 10 months ago

    Question re: the second quorum, where the master asks for permission to send message M in term T. From what you said, it seems like this double-check is designed to ensure total order even if there are two active leaders at a given time. However, if this second quorum passes but the leader crashes prior to actually delivering M, couldn't we end up in a split-brain state again?

  • @bibop224
    @bibop224 2 years ago +2

    Is the two generals problem an instance of the consensus problem? We have two "nodes" that want to agree on a value (the date of the attack). Or is there something that makes the two generals problem different from the consensus problem as defined in this video?

    • @kleppmann
      @kleppmann  2 years ago +18

      There is a difference, but it's very subtle. The way consensus is formally defined, we say (slightly simplified) that consensus is achieved if all non-faulty nodes eventually reach a decision (liveness), and no two nodes make different decisions (safety). The liveness property is predicated on the assumption that any network partition is eventually repaired, and so any two non-faulty nodes will eventually be able to communicate. However, it's acceptable for there to be a period of time during which one node has made a decision and another node has not yet made a decision, and it's okay for this period to last for a long time.
      On the other hand, if one node makes a decision and the other node remains undecided for a long time, that is not considered a valid solution to the two generals problem, because if the undecided node waits until after the attack date to make its decision, that would be equivalent to having decided the wrong date. The requirement that the two generals reach agreement within some time bound makes the two generals problem harder than the consensus problem.

    • @bibop224
      @bibop224 2 years ago

      @@kleppmann That makes sense, thank you!

    • @rutvijravi6199
      @rutvijravi6199 1 year ago +1

      @@kleppmann I see another difference in addition to what you have stated. The 2 generals problem also requires common knowledge to be established between the generals on the decided value (time to attack). Whereas in consensus, the leader has to know whether a quorum decided, but the nodes in the quorum (that have decided) don't necessarily need to know (that the leader knows that they decided) before the leader ACKs the client.
      Is my understanding correct?

    • @luismrguimaraes
      @luismrguimaraes 1 year ago

      @@rutvijravi6199 This does seem correct.

  • @khacdoi1995
    @khacdoi1995 3 years ago

    niceeee

  • @allyourcode
    @allyourcode 2 years ago +4

    @14:36 A few seconds later, I got video buffering. Sus.

  • @LinhNguyen-bp9hd
    @LinhNguyen-bp9hd 2 years ago +1

    After watching the series so many times, I still cannot understand how the order of concurrent messages is decided.
    Total order broadcast makes sure the message order is the same on all nodes, but how is the order of concurrent messages decided in the first place?

    • @vhscampos1
      @vhscampos1 2 years ago +2

      My understanding is that, in the case of Raft at least, since all messages are first sent to the leader, which does the sequencing, the order between concurrent messages is defined by the order in which they arrive at the leader. All replicas will deliver them in the same order because the leader sends messages via FIFO broadcast.
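
      To make this concrete, here is a minimal sketch of that sequencing idea in Python (Leader and Replica are illustrative names, not Raft's actual types): the leader stamps messages in arrival order, and each replica delivers them strictly in sequence-number order, buffering any that arrive early.

          class Leader:
              def __init__(self):
                  self.next_seq = 0

              def sequence(self, msg):
                  # Arrival order at the leader fixes the total order.
                  stamped = (self.next_seq, msg)
                  self.next_seq += 1
                  return stamped          # broadcast this pair to all replicas

          class Replica:
              def __init__(self):
                  self.expected_seq = 0
                  self.buffer = {}        # early messages, keyed by sequence number

              def receive(self, seq, msg):
                  self.buffer[seq] = msg
                  delivered = []
                  # Deliver contiguously: hold back gaps so every replica
                  # delivers the same sequence even if the network reorders.
                  while self.expected_seq in self.buffer:
                      delivered.append(self.buffer.pop(self.expected_seq))
                      self.expected_seq += 1
                  return delivered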

    • @chaz-e
      @chaz-e 2 years ago +1

      Broadcast messages reach the leader, and the order is then decided/maintained based on the order in which the leader receives them.

    • @tianwenchu1663
      @tianwenchu1663 1 year ago +2

      Firstly, I think in lecture 4.3 total order broadcast is actually presented in a leaderless database design: a client can send a read/write request to some node, and that node then uses a gossip protocol to send the read/write as a message to the other replica nodes. To ensure the execution order is the same on all nodes (if the logic requires it, i.e. the operations are not commutative), they can rely on FIFO total order broadcast to stay in sync; such a broadcast does not need a leader, and still uses a gossip protocol to spread the messages.
      With no leader, the replicas need some kind of timestamp to sort the messages, which is where the Lamport counter or vector clock comes in. With a Lamport counter, the order of every causally related pair of messages is preserved, but we face lost writes among concurrent writes: because a Lamport timestamp cannot detect that two messages are concurrent, concurrent writes are simply ordered last-writer-wins, and the resulting overwrite may not be what the application wants. If that is not acceptable, we can use vector clocks to preserve all values from concurrent writes and let the application layer resolve them: say we have 5 concurrent writes to a single key, we return the application a list of 5 values for that key, and when the application next sends a write it has to pick one. This approach needs a version vector on each key/row/document, recording the state of that single object. Please check the last section of DDIA chapter 5.
      Now to lecture 6.1, which uses a single-leader design: all writes flow only to the leader, and we rely on the leader node to come up with the sequence/order and sync it to all replica nodes. The leader can treat concurrent writes in similar ways (a Lamport-clock sketch follows below):
      1. Transactions with locks, so concurrent writes execute in a nondeterministic order. This can also suffer lost writes, because the last overriding write may not be the one the application really wants. (An example could be 5 writes setting the value from x to i.)
      2. Transactions with actual serial execution: each transaction runs as a whole, one at a time, in the order it lands on the leader node; the last write still overrides.
      3. The same version-vector idea: store all values from the concurrent writes and let the application decide.
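
      A minimal sketch of the Lamport-timestamp ordering mentioned above, in Python (LamportClock and last_writer_wins are illustrative names): breaking ties by node id makes the order total, so concurrent writes are ordered arbitrarily and the "loser" is overwritten, which is exactly the lost-write behaviour this comment describes.

          class LamportClock:
              def __init__(self, node_id: int):
                  self.time = 0
                  self.node_id = node_id

              def tick(self) -> tuple[int, int]:
                  # Before a local event or send: advance and return a
                  # globally unique (time, node_id) timestamp.
                  self.time += 1
                  return (self.time, self.node_id)

              def on_receive(self, sender_time: int) -> None:
                  # Merge rule: jump ahead of every timestamp we have seen,
                  # so causally later events get larger timestamps.
                  self.time = max(self.time, sender_time) + 1

          def last_writer_wins(a, b):
              # a and b are (time, node_id, value) triples. (time, node_id)
              # totally orders all writes, but between truly concurrent writes
              # one value silently wins and the other is lost.
              return a if (a[0], a[1]) > (b[0], b[1]) else b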

  • @abdelaziz2788
    @abdelaziz2788 1 year ago

    17:15
    = Can I put it in?
    - Yeah
    = Can I take it out?
    - Yeah
    = Can I put it in?
    - Yeah
    = Can I take it out?
    - Yeah
    = Can I put it in?
    - Yeah