Lindsey Kuper
Adventures in Building Reliable Distributed Systems with Liquid Haskell (FLOPS 2022 keynote talk)
Keynote talk presented at FLOPS 2022 on May 10, 2022.
Today’s most important computer systems are distributed systems: those that consist of multiple components that communicate by sending messages over a network, and where individual components or network connections may fail independently. Programming such systems is hard due to messages being reordered or delayed and the ever-present possibility of failure. Many liveness guarantees are impossible in such a setting, and even safety properties (such as “received messages arrive in the order they were sent”) can be challenging to prove. Protocols meant to ensure, say, a given message delivery order or a given data consistency policy are widely used in distributed systems, but verification of the correctness of those protocols is less common - much less machine-checked proofs about executable implementations.
Language-integrated verification techniques promise to bridge the gap between protocol specifications and executable implementations, letting programmers carry out verification directly in the same executable programming language used for implementation. One such language-level verification approach centers around refinement types: data types that let programmers specify logical predicates that restrict, or refine, the set of values that can inhabit the type, and that can be checked at compile time by an off-the-shelf SMT solver. Refinement types are beginning to make their way into general-purpose, industrial-strength programming languages through tools such as Liquid Haskell, which adds support for refinement types to the Haskell language. Liquid Haskell goes beyond last decade’s refinement types: its powerful reflection capabilities let you prove theorems in an extrinsic style, by defining Haskell functions (with help from the underlying solver). Furthermore, its integration with an existing programming language lets programmers work with pre-existing code and add richer specifications as they go. But is it up to the task of verifying interesting correctness properties of practically deployable distributed systems?
In this talk, I’ll report on my research group’s recent and ongoing efforts to answer that question in the affirmative. For example, in a messaging system consisting of several peer nodes, we can use refinement types to express properties about the order in which broadcast messages are delivered at a node, and we can use Liquid Haskell to prove those properties extrinsically. Likewise, in a replicated storage system we can express and prove properties about the convergence of replicated data structures. I’ll recount the pleasures and pitfalls of our journey so far, and discuss where we hope to go next.
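A minimal illustrative Liquid Haskell sketch of the refinement-type mechanism described in the abstract (the module and function names are invented for illustration, and it shows only basic refinement types, not the extrinsic proofs the talk covers):

    module RefinementSketch where

    -- A refinement type: Nat is the set of Ints restricted by the predicate 0 <= v.
    {-@ type Nat = {v:Int | 0 <= v} @-}

    -- The refinement on the result is checked at compile time by the SMT solver:
    -- both branches must provably return a non-negative Int.
    {-@ absVal :: Int -> Nat @-}
    absVal :: Int -> Int
    absVal n = if n < 0 then 0 - n else n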
0:00 Introduction
2:59 The speed of light is slow.
14:46 The distributed consistency model zoo: "In which ways may replicas disagree?"
18:40 Refinement types
22:19 Refinement reflection (Vazou et al., POPL 2018)
28:22 Verified strong convergence of CRDTs [OOPSLA 2020]
38:03 Verified causal message delivery
43:05 What's next?
(Note: you can find my research blog at decomposition.al, not composition.al as shown on the last slide!)
1,761 views

Video

CSE138 (Distributed Systems) L18: ask me anything
1.4K views · 3 years ago
UC Santa Cruz CSE138 (Distributed Systems) Lecture 18: ask me anything Recorded June 3, 2021 Professor Lindsey Kuper users.soe.ucsc.edu/~lkuper/ Course website: decomposition.al/CSE138-2021-03 Schedule of topics: decomposition.al/CSE138-2021-03/schedule.html
CSE138 (Distributed Systems) L17: MapReduce wrap-up; the math behind replica conflict resolution
1.4K views · 3 years ago
UC Santa Cruz CSE138 (Distributed Systems) Lecture 17: MapReduce wrap-up; the math behind replica conflict resolution: recap of strong convergence; recap of partial orders; upper bounds, least upper bounds, join-semilattices Recorded June 1, 2021 Professor Lindsey Kuper users.soe.ucsc.edu/~lkuper/ Course website: decomposition.al/CSE138-2021-03 Schedule of topics: decomposition.al/CSE138-2021-0...
CSE138 (Distributed Systems) guest lecture from Cyrus Hall: "Heterogeneous Distributed Systems"
1.5K views · 3 years ago
UC Santa Cruz CSE138 (Distributed Systems) guest lecture from Cyrus Hall: "Heterogeneous Distributed Systems" Recorded May 25, 2021 Course website: decomposition.al/CSE138-2021-03 Schedule of topics: decomposition.al/CSE138-2021-03/schedule.html
CSE138 (Distributed Systems) L16: MapReduce
2.2K views · 3 years ago
UC Santa Cruz CSE138 (Distributed Systems) Lecture 16: online systems vs. offline systems, raw data vs. derived data; intro to MapReduce; MapReduce examples: forward index to inverted index, word count Recorded May 27, 2021 Professor Lindsey Kuper users.soe.ucsc.edu/~lkuper/ Course website: decomposition.al/CSE138-2021-03 Schedule of topics: decomposition.al/CSE138-2021-03/schedule.html
CSE138 (Distributed Systems) L15: introduction to sharding; consistent hashing
3.3K views · 3 years ago
UC Santa Cruz CSE138 (Distributed Systems) Lecture 15: introduction to sharding; consistent hashing Recorded May 20, 2021 Professor Lindsey Kuper users.soe.ucsc.edu/~lkuper/ Course website: decomposition.al/CSE138-2021-03 Schedule of topics: decomposition.al/CSE138-2021-03/schedule.html
CSE138 (Distributed Systems) L14: Dynamo: Merkle trees, quorum consistency, tail latency
3K views · 3 years ago
UC Santa Cruz CSE138 (Distributed Systems) Lecture 14: Dynamo: review of old ideas (availability, network partitions, eventual consistency, application-specific conflict resolution); intro to: anti-entropy with Merkle trees, gossip, quorum consistency, tail latency Recorded May 18, 2021 Professor Lindsey Kuper users.soe.ucsc.edu/~lkuper/ Course website: decomposition.al/CSE138-2021-03 Schedule ...
CSE138 (Distributed Systems) L13: eventual consistency, availability, conflict resolution
2.4K views · 3 years ago
UC Santa Cruz CSE138 (Distributed Systems) Lecture 13: eventual consistency; strong convergence and strong eventual consistency; intro to application-specific conflict resolution; network partitions; availability; the consistency/availability trade-off; testing distributed systems Recorded May 13, 2021 Professor Lindsey Kuper users.soe.ucsc.edu/~lkuper/ Course website: decomposition.al/CSE138-2...
CSE138 (Distributed Systems) L12: Paxos and consensus wrap-up, passive/active replication
3K views · 3 years ago
UC Santa Cruz CSE138 (Distributed Systems) Lecture 12: Paxos wrap-up: nontermination, Multi-Paxos, fault tolerance; other consensus protocols: Viewstamped Replication, Zab, Raft; passive vs. active (state machine) replication; implementing read-your-writes and causal consistency Recorded May 11, 2021 Professor Lindsey Kuper users.soe.ucsc.edu/~lkuper/ Course website: decomposition.al/CSE138-202...
CSE138 (Distributed Systems) L11: more on consensus, the FLP result, the Paxos protocol
4.1K views · 3 years ago
UC Santa Cruz CSE138 (Distributed Systems) Lecture 11: more on consensus; the FLP result; the Paxos protocol Recorded May 6, 2021 My pedagogical approach to Paxos in this lecture is inspired by this excellent video by Luis Quesada Torres: czcams.com/video/d7nAGI_NZPk/video.html Professor Lindsey Kuper users.soe.ucsc.edu/~lkuper/ Course website: decomposition.al/CSE138-2021-03 Schedule of topics...
CSE138 (Distributed Systems) L10: determinism, consistency models, intro to consensus
3.3K views · 3 years ago
UC Santa Cruz CSE138 (Distributed Systems) Lecture 10: exam review; total order vs. determinism; consistency models (read-your-writes, FIFO, causal, strong), dealing with failure in strongly consistent replication protocols; intro to consensus Recorded May 4, 2021 Professor Lindsey Kuper users.soe.ucsc.edu/~lkuper/ Course website: decomposition.al/CSE138-2021-03 Schedule of topics: decompositio...
CSE138 (Distributed Systems) L9: primary-backup replication, chain replication, latency & throughput
3.9K views · 3 years ago
UC Santa Cruz CSE138 (Distributed Systems) Lecture 9: reasons to do replication; strong consistency (informally); primary-backup replication; chain replication; latency and throughput; midterm review Recorded April 27, 2021 Professor Lindsey Kuper users.soe.ucsc.edu/~lkuper/ Course website: decomposition.al/CSE138-2021-03 Schedule of topics: decomposition.al/CSE138-2021-03/schedule.html
CSE138 (Distributed Systems) L8: forms of fault tolerance, reliable delivery, reliable broadcast
3.4K views · 3 years ago
UC Santa Cruz CSE138 (Distributed Systems) Lecture 8: fault classification recap; forms of fault tolerance; implementing reliable delivery; idempotence; so-called "exactly-once" delivery; reliable broadcast; implementing reliable broadcast; intro to replication Recorded April 22, 2021 Professor Lindsey Kuper users.soe.ucsc.edu/~lkuper/ Course website: decomposition.al/CSE138-2021-03 Schedule of...
CSE138 (Distributed Systems) L7: safety and liveness, fault models, two generals problem
3.8K views · 3 years ago
UC Santa Cruz CSE138 (Distributed Systems) Lecture 7: Chandy-Lamport wrap-up; safety and liveness; reliable delivery; fault classification and fault models; two generals problem; common knowledge Recorded April 20, 2021 Professor Lindsey Kuper users.soe.ucsc.edu/~lkuper/ Course website: decomposition.al/CSE138-2021-03 Schedule of topics: decomposition.al/CSE138-2021-03/schedule.html
CSE138 (Distributed Systems) L6: Chandy-Lamport snapshot algorithm
10K views · 3 years ago
UC Santa Cruz CSE138 (Distributed Systems) Lecture 6: Chandy-Lamport snapshot algorithm; Chandy-Lamport assumptions and properties; centralized vs. decentralized algorithms Recorded April 15, 2021 Professor Lindsey Kuper users.soe.ucsc.edu/~lkuper/ Course website: decomposition.al/CSE138-2021-03 Schedule of topics: decomposition.al/CSE138-2021-03/schedule.html
CSE138 (Distributed Systems) L5: recap of FIFO/causal/TO delivery, implementing causal broadcast
7K views · 3 years ago
CSE138 (Distributed Systems) L5: recap of FIFO/causal/TO delivery, implementing causal broadcast
CSE138 (Distributed Systems) L4: vector clocks, FIFO/causal/totally-ordered delivery
13K views · 3 years ago
CSE138 (Distributed Systems) L4: vector clocks, FIFO/causal/totally-ordered delivery
CSE138 (Distributed Systems) L3: partial orders, total orders, Lamport clocks, vector clocks
9K views · 3 years ago
CSE138 (Distributed Systems) L3: partial orders, total orders, Lamport clocks, vector clocks
CSE138 (Distributed Systems) L2: time and clocks, causality and happens-before, network models
8K views · 3 years ago
CSE138 (Distributed Systems) L2: time and clocks, causality and happens-before, network models
CSE138 (Distributed Systems) L1: logistics/administrivia; distributed systems: what and why?
26K views · 3 years ago
CSE138 (Distributed Systems) L1: logistics/administrivia; distributed systems: what and why?
CSE138 (Distributed Systems) final exam review and AMA
1.6K views · 4 years ago
CSE138 (Distributed Systems) final exam review and AMA
CSE138 (Distributed Systems) guest speaker Karissa McKelvey
872 views · 4 years ago
CSE138 (Distributed Systems) guest speaker Karissa McKelvey
CSE138 (Distributed Systems) L23: some highlights from the distributed systems literature, 1975-2020
1.5K views · 4 years ago
CSE138 (Distributed Systems) L23: some highlights from the distributed systems literature, 1975-2020
CSE138 (Distributed Systems) L22: the math behind replica conflict resolution
1K views · 4 years ago
CSE138 (Distributed Systems) L22: the math behind replica conflict resolution
CSE138 (Distributed Systems) guest speaker Chris Colohan, "Blockchain Consensus"
1.4K views · 4 years ago
CSE138 (Distributed Systems) guest speaker Chris Colohan, "Blockchain Consensus"
CSE138 (Distributed Systems) L21: MapReduce: types, approach to fault tolerance, ...
1.1K views · 4 years ago
CSE138 (Distributed Systems) L21: MapReduce: types, approach to fault tolerance, ...
CSE138 (Distributed Systems) L20: online systems vs. offline systems, raw data vs. derived data; ...
1.1K views · 4 years ago
CSE138 (Distributed Systems) L20: online systems vs. offline systems, raw data vs. derived data; ...
CSE138 (Distributed Systems) recent grads panel: Emma Gomish, Lawrence Lawson, Pete Wilcox
1.2K views · 4 years ago
CSE138 (Distributed Systems) recent grads panel: Emma Gomish, Lawrence Lawson, Pete Wilcox
CSE138 (Distributed Systems) L19: more about quorum consistency; introduction to sharding; ...
1.6K views · 4 years ago
CSE138 (Distributed Systems) L19: more about quorum consistency; introduction to sharding; ...
CSE138 (Distributed Systems) L18: Dynamo
2.6K views · 4 years ago
CSE138 (Distributed Systems) L18: Dynamo

Comments

  • @bermick · 2 days ago

    Hi, I think it can be seen in the graphs that chain has higher throughput at around 2 to 20%, not only 10-15%.

  • @koncorsium87 · 12 days ago

    Sorry, a bit of a suggestion here: maybe if the problem were explained first, we could think about some of the questions you ask. Anyway, thank you so much for making this great course open for us 🙏

  • @bermick · 18 days ago

    Great content! Regarding the internal events on crash faults: I think they do matter in case the process later recovers and we want to perform a Chandy-Lamport snapshot?

  • @bermick · 25 days ago

    really good material, I am a professional re-studying distributed systems stuff and this is of great value, thanks a lot!

  • @AnshSindhwani-pm6np · 27 days ago

    What a wonderful lecture series! I wish my professors were like this.

  • @thilaksalian5419 · 2 months ago

    Thanks a lot. This helped clear a lot of my doubts.

  • @professional2094 · 2 months ago

    This is so good. I came here after watching Martin Kleppmann's videos. Your coverage of this topic is much more detailed. Very well explained, thanks a lot.

  • @johnshepard3197 · 2 months ago

    20:13

  • @felixsebastian1911 · 2 months ago

    This was the only explanation that made sense to me, thank you 🙏

  • @ravikiran6646 · 2 months ago

    Thanks a lot for this wonderful lecture 😇

  • @hossamhazem5898 · 2 months ago

    How do the processes know that the snapshot is done and that they can start taking a new snapshot/recording their state again? Love the course btw, watching out of curiosity, thanks!

  • @DebojyotiMajumder · 2 months ago

    @lindseykuperwithasharpie Many thanks for your lectures. Can you please share some recommended reading on research papers as a follow-up on this topic?

    • @lindseykuperwithasharpie · 2 months ago

      For vector clocks, Mattern's original paper "Virtual Time and Global States of Distributed Systems" (vs.inf.ethz.ch/publ/papers/VirtTimeGlobStates.pdf) is a good read. Even better, in my opinion, is Schwarz and Mattern's "Detecting Causal Relationships in Distributed Computations: In Search of the Holy Grail", from a few years later (vs.inf.ethz.ch/publ/papers/holygrail.pdf).

  • @globetrotter373 · 3 months ago

    Hi Dr, How do you account for node failures or additions in vector clock implementations?

  • @spoomer94 · 3 months ago

    The Chandy-Lamport algorithm starts at 7:20.

  • @aakashPotter · 3 months ago

    You are a great teacher. I watched the distributed systems lecture series by Martin Kleppmann but found it too abstract. Your usage of examples really makes the concepts easier to understand.

    • @lindseykuperwithasharpie · 3 months ago

      Thanks for watching! Martin Kleppmann's videos are great; I hope you'll give them another try after watching mine!

  • @zz-oq2dt · 3 months ago

    Keep going, very informative and helpful presentation!

  • @meharjeetsingh5256 · 3 months ago

    I wish you taught at my university, SFU.

  • @dibll · 3 months ago

    Thanks a lot for such valuable lessons! I heard about state machine replication but I'm not sure what it is and how it relates to replication. Do you mind explaining it a bit? Thanks in advance!!

    • @lindseykuperwithasharpie · 3 months ago

      I discuss state machine replication a few lectures later in the course: czcams.com/video/wHpB44jtS4g/video.html

  • @meetme2meat · 4 months ago

    Why are we not talking about the gossip protocol?

  • @chrisho6836 · 4 months ago

    For anyone wondering about the clarification at 8:08 - a student notices it too. The question re: vector clocks at 16:31 was asked because the first vector clocks for R1 and R2 were copied incorrectly from the lecture notes, i.e., for R1 [1,0,0,1] should have been [1,1,0,0] (R2 has a similar mistake). It is corrected at the receipt of the second set of messages, but it may make it confusing to follow.

  • @saurabhman · 4 months ago

    +1 for talking about how to attempt to evenly distribute nodes across the ring by using the concept of v-nodes

  • @meetme2meat · 4 months ago

    If TCP gives a delivery guarantee to the sender, does that ensure "common knowledge" between Alice and Bob in the Two Generals problem?

  • @thachnnguyen · 4 months ago

    I don't get the story behind a virtual node being dead. Aren't all virtual nodes just that, virtual, with the data actually directed to the real nodes? Then won't that issue reduce to the real physical node being taken off? As long as the algorithm creating virtual nodes is consistent, replacing the physical M2 will reestablish the same virtual nodes for M2. Or just put in M6 at or near the old position of M2. If we update the preference lists to point to physical nodes, then it's the same as in the case of no virtual nodes.

    • @lindseykuperwithasharpie · 4 months ago

      The issue is that, if a physical node dies, and that physical node is mapped to many points on the ring represented by virtual nodes, then *many* points on the ring are affected. We no longer have the property that the immediate neighbor takes over the downed node's entire key range. Instead, if there are, say, 80 virtual nodes, then they're all gone at once and then 80 neighbors are affected. Of course, those 80 neighbors can be virtual, too, and map to various physical nodes. The Dynamo paper (in section 4.2) actually describes this as a feature of using virtual nodes, not a bug. If a physical node dies or is removed, the responsibility for its part of the key range is then spread out over the remaining physical nodes, instead of falling to just one neighbor. Likewise, if a physical node is added, instead of just taking a large burden off one neighbor, the relief will be spread out over the system. I think that's a great point, but it does make it a little harder to reason about exactly which nodes will be affected by node addition/removal, whereas if virtual nodes aren't in the picture, it's clear that one neighbor will be affected and other nodes won't be.
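      A small illustrative sketch of the virtual-node idea discussed here (the toy hash, ring size, and names are made up for illustration and are not Dynamo's or the lecture's actual code; the ring is assumed non-empty):

        import qualified Data.Map.Strict as Map
        import Data.Char (ord)

        -- A toy hash onto a ring of 2^16 positions (illustrative only).
        toyHash :: String -> Int
        toyHash = foldl (\h c -> (h * 31 + ord c) `mod` 65536) 17

        -- Place each physical node at `vnodes` positions on the ring ("virtual nodes").
        buildRing :: Int -> [String] -> Map.Map Int String
        buildRing vnodes nodes =
          Map.fromList [ (toyHash (node ++ "#" ++ show i), node)
                       | node <- nodes, i <- [1 .. vnodes] ]

        -- A key is owned by the first node clockwise from its hash.
        lookupNode :: Map.Map Int String -> String -> String
        lookupNode ring key =
          case Map.lookupGE (toyHash key) ring of
            Just (_, node) -> node
            Nothing        -> snd (Map.findMin ring)   -- wrap around the ring

        -- Removing a physical node removes *all* of its ring positions at once,
        -- which is why many neighbors share the reassigned key ranges.
        removeNode :: String -> Map.Map Int String -> Map.Map Int String
        removeNode dead = Map.filter (/= dead)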

  • @thachnnguyen · 4 months ago

    It's very unsettling that the concurrent write scenario (x=3 and x=4) is not satisfactorily addressed. What is the resolution of what x is? LWW based on what? If R1 goes by x=3, x=4 sequence, it records x=4, whereas the other node records x=3. How is that resolved? Merkle tree tells a node the diff, then what? Which node should make the change so that the system is consistent (though not necessarily correct). How does the banking system resolve this issue?

    • @lindseykuperwithasharpie · 4 months ago

      If the updates are concurrent, there are a few options that Dynamo-like systems support. You could store both concurrent updates and let the client sort it out. Or you could break the tie with physical timestamps, meaning that someone will lose their write. If neither of these options is workable for a given application, then a Dynamo-like system may be the wrong choice for that application.

    • @chrisho6836 · 3 months ago

      The dynamo paper mentions the utilization of vector clocks and maintaining all values that are causally unrelated - to be resolved manually. For the DynamoDB offering: "If applications update the same item in different Regions at about the same time, conflicts can arise. To help ensure eventual consistency, DynamoDB global tables use a last writer wins reconciliation between concurrent updates, in which DynamoDB makes a best effort to determine the last writer. This is performed at the item level. With this conflict resolution mechanism, all the replicas will agree on the latest update and converge toward a state in which they all have identical data." Note that DynamoDB, undoubtedly named after the original Dynamo service, is not the exact same service - but I thought it would be helpful to include how DynamoDB (what the public gets to use) resolves this.
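      A standalone sketch of the two options described in these replies (an illustrative simplification, not the lecture's or Dynamo's actual code; the types and field names are invented, and real systems track causality with per-key version vectors):

        type VC = [Int]
        data Write = Write { vc :: VC, wallClock :: Int, val :: String } deriving Show

        -- a `happensBefore` b iff a's clock is pointwise <= b's and strictly < somewhere.
        happensBefore :: VC -> VC -> Bool
        happensBefore a b = and (zipWith (<=) a b) && or (zipWith (<) a b)

        -- Option 1: keep the causally-later write; if the writes are concurrent,
        -- keep *both* as siblings for the client to resolve (as in the Dynamo paper).
        reconcile :: Write -> Write -> [Write]
        reconcile a b
          | happensBefore (vc a) (vc b) = [b]
          | happensBefore (vc b) (vc a) = [a]
          | otherwise                   = [a, b]   -- concurrent writes: siblings

        -- Option 2: last-writer-wins -- break the tie with wall-clock time,
        -- accepting that one of the concurrent writes is silently discarded.
        lww :: Write -> Write -> Write
        lww a b = if wallClock a >= wallClock b then a else b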

  • @changxuren1200 · 4 months ago

    Thank you so much for your lectures!! I really appreciate that you made these available on CZcams. I had a question at 1:03:37. I wonder why we do not include events C and D on Process 1? To my understanding, P1 keeps recording on channel C21 until the marker message from P2 is delivered, which happens after event D. By "recording on a specific incoming channel", shouldn't it include messages from that channel? However, I can also see why we do not include event D in our snapshot. The algorithm says P1 should record its state when P1 initiates the process, which means including A and B in the snapshot, but not C. If we include D in the snapshot it will violate consistency. In summary, by "recording on a specific channel", do we mean we are only waiting for the marker message that comes back from that channel?

    • @lindseykuperwithasharpie · 4 months ago

      You're right: when we record on a channel, we're recording messages received on that channel. However, the recorded *channel* state (which is a sequence of messages) is separate from the recorded *process* state (which is a sequence of events). I go into some more detail about this scenario in the last part of this blog post: decomposition.al/blog/2019/04/26/an-example-run-of-the-chandy-lamport-snapshot-algorithm/#discussion

  • @norliegh · 4 months ago

    Hi Prof. Lindsey! I had a few questions for 44:18. Consider this distributed system:
    - It consists of n processes running on n different machines, where each process owns an individual vector clock.
    - An individual vector clock only counts a message "send" from process A to process B as an event.
    - The system does multicast & unicast communication (a process excludes itself when sending a message to other processes).
    1. Can I say it is always true that the "next value" of a vector clock of a process at any given moment belongs to a finite set of size n - 1? (Minus 1 because the process itself is excluded.)
    2. If the set of all possible "next values" of a vector clock is finite, can I pre-compute them in advance, before a message is "received" or "delivered", by adding 1 to each index of the current value of the vector clock (pointwise addition, perhaps)?
    I have an example to paint my questions above. Imagine a system with 4 processes [P1,P2,P3,P4]. Process 3 (P3) starts with a vector clock of [0,0,0,0]. This vector clock & system only count "message sends" as events. I think I can pre-compute all possible "next values" from the "current value" of Process 3's vector clock if I add 1 to each index except for Process 3. The set will be { [1,0,0,0], [0,1,0,0], [0,0,0,1] } (question 2). Here the set size is finite, following n - 1 (question 1), or 4 - 1 = 3.
    - Am I correct, Prof. Lindsey?
    - Would this help answer the question of when to keep a message in reserve and when to deliver it? Say Process 2 receives a message from any other process. It computes all possible values from its current state. If the received message's vector clock does not match one of the computed values, the message is put in reserve. If a match is found, the message is delivered. Thank you!

    • @lindseykuperwithasharpie · 4 months ago

      What message delivery property are you trying to enforce? It's a good idea to get clear on that before you start designing mechanisms to enforce the property. But let's suppose the property you want is causal message delivery. In that case, if you're in a setting where unicast messages are allowed, then it's *not* the case that a deliverable message is one that would only increment the recipient's clock by 1 in a single position (which sounds like it's what you're proposing, unless I've misunderstood). For instance, suppose we have Alice, Bob, and Carol. Alice sends a message m1 to Bob with VC [1,0,0]. Bob delivers m1, then sends m2 to Carol with VC [1,1,0]. Both m1 and m2 are unicast messages here, and m2 *should* be deliverable at Carol -- but Carol's VC is currently [0,0,0].

    • @norliegh · 4 months ago

      @@lindseykuperwithasharpie Thank you, Prof! I believe that an ideal system would enforce all three properties: totally ordered delivery, causal message delivery, and FIFO message delivery. The example of unicast messages between Alice, Bob & Carol, where deliverable messages will *not* increment by 1 in a single position, totally makes sense. If I may, Prof., can I also say that "broadcast only" systems ("self" messages are excluded) also do not follow "n+1", just like "unicast included" systems? I'm unsure of the above because I can consider a system with Alice, Bob & Carol. Alice (now 0,0,0) sends a message m1 to both Bob (now 0,0,0) & Carol (now 0,0,0). Then Bob (now 1,0,0) sends a message m2 to both Alice (now 1,0,0) and Carol (now 1,0,0). Let's say this system didn't behave as expected & Carol receives message m2 from Bob before message m1 from Alice. Message m2 from Bob holds the vector clock (1,1,0) & message m1 from Alice holds (1,0,0). Here Carol can compute all possible next values from her own vector clock (0,0,0), which would be the set {(1,0,0), (0,1,0)}. This way, she can know in advance which message she would accept next from her current vector clock, regardless of whether she has received a message or not. If a received message isn't within the set, she can put it into a buffer & wait until another message is received. Would this work, Prof.?

    • @lindseykuperwithasharpie · 4 months ago

      As for "all three properties": causal message delivery gives you FIFO message delivery too, so if you have causal delivery, you don't have to separately do anything to enforce FIFO delivery. Totally-ordered delivery, though, is another can of worms, and can't be enforced using vector clocks. For causal delivery specifically, and for the broadcast case specifically, it sounds like you're getting closer to proposing the causal broadcast protocol I described in the lecture (which, if you're curious, is the CBCAST protocol described in this Birman et al. paper: dl.acm.org/doi/10.1145/128738.128742 ). Any message that has a VC that (a) is one bigger than the receiver's VC in the sender's position, and (b) is less than or equal to the receiver's VC in every other position, is deliverable. And yes, if you wanted to, you could pre-compute the set of all VCs that meet those two criteria and then check whether any arriving message has a VC that's in the set. (It could be expensive to search the set, though, especially as it gets large.)

    • @norliegh · 4 months ago

      @@lindseykuperwithasharpie That is very cool & it clears up my confusion. I realised the inefficiency of the "pre-compute" approach later in the lecture, Professor, when you explained your approach to "how to decide if one process should deliver a message or put it in a reserve buffer for later delivery". I still find it quite interesting. I'll definitely give that paper a read. Thank you so much, Prof. Lindsey! 💙

  • @thachnnguyen · 5 months ago

    Wrt to the token example, why isn't the token part of the state? i.e., suppose we say A or C are states, and either A or C needs the token to access a lock. System crashed, we restore. We get the picture as drawn. If the token is part of a state, then either A or C has it. Assume by your example, A has the token and C doesn't. Then even without m1, the code should probably tell you P1 should be passing that token to P2. Thus, we can reconstruct D. On the other hand, suppose the snapshot says A doesn't have the token. Still the same conclusion: P1 must have passed it to P2.

    • @lindseykuperwithasharpie · 4 months ago

      I'm not sure I understand your question, but there's no assumption here that the application code is something that you can actually access or inspect. All you have is the single execution that's being snapshot.

  • @thachnnguyen · 5 months ago

    2 aspects left hanging, very unsettling. At least something should have been mentioned. 1. What to do with that left over m1 in C21? Throw it away? Frame it? What? and 2. How are these (including the current snapshot) relevant to the next snapshot?

    • @lindseykuperwithasharpie · 4 months ago

      The "leftover" message is part of the global snapshot, because it's part of the recorded channel state for channel C_21, and the global snapshot consists of all the recorded process states and all the recorded channel states. What to do with any particular part of a snapshot is naturally going to depend on your reason for having taken the snapshot.

  • @thachnnguyen · 5 months ago

    Q1: still wasn't clear _what_ is in the marker message. Is it just some label? If so, all the marker messages sent by P2 and P3 have the exact same label in their own marker messages when they sent them out? I assume so otherwise it's not clear what it means by a process "has seen" the marker message.

    • @lindseykuperwithasharpie · 4 months ago

      By "has seen" here, I mean "has sent or received". A marker message has no interesting payload; it only says, "Hi, I'm a marker message."

  • @thachnnguyen · 5 months ago

    No one asked this question: is it a delayed queue or an array? A queue implies you pop out the head, whereas the array implies you go through the whole collection. Why? In the case where there are many processes, the one blocking the queue may not be the one you need to pop. What if a supposedly even later message blocks the queue? You can never get to the (earlier) ones behind the front!

    • @lindseykuperwithasharpie · 4 months ago

      Great question! Yes, at any given time the delay queue may have multiple delayed messages in it, and when a message gets delivered, it could unblock multiple queued messages. Importantly, though, messages in the delay queue are in vector clock order (or, more specifically, in a total order that's consistent with the vector clock order), with the smallest first. So, if a message getting delivered unblocks *any* other message(s), it'll unblock the first one in the queue (and possibly the one after that, and so on). The name "delay queue" comes from the paper "Lightweight causal and atomic group multicast" by Birman, Schiper, and Stephenson ( dl.acm.org/doi/10.1145/128738.128742 ).
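      A tiny sketch (an illustration, not from the paper) of one pass over such a sorted delay queue; the deliverability check and the clock update are passed in as parameters, in the spirit of the CBCAST check described above:

        -- One pass over the delay queue, which is kept sorted consistently with
        -- the causal (vector clock) order, smallest first. Delivering the head
        -- may unblock the new head, and so on; the first blocked message ends
        -- the pass and everything after it stays queued.
        drainOnce :: (clock -> msg -> Bool)    -- is this message deliverable now?
                  -> (clock -> msg -> clock)   -- effect of delivering on the clock
                  -> clock                     -- receiver's current clock
                  -> [msg]                     -- delay queue, smallest first
                  -> (clock, [msg])            -- updated clock, still-blocked tail
        drainOnce ok deliver clock (m : rest)
          | ok clock m = drainOnce ok deliver (deliver clock m) rest
        drainOnce _ _ clock queue = (clock, queue)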

  • @norliegh · 5 months ago

    Thank you, Prof!! I hadn't had the chance to go to a uni & this helps a ton. ❤

  • @Yash-zq6bg · 5 months ago

    At 17:46, when we stop recording on C21, shouldn't [C->D] be in the channel?

  • @erenyeager4452 · 5 months ago

    Hello, I am looking to follow this course on YouTube. I know Python and DSA. I come from a non-CS background. Do you think I can take this course and get through it?

  • @ainaa_ · 6 months ago

    1:24:21. Why didn't we send Accept(6, foo) to A1, but sent it to A2 and A3, even though A2 had already accepted (5, foo)?

    • @lindseykuperwithasharpie · 5 months ago

      We could have sent Accept(6, foo) to A1 also. Paxos only requires that we send it to a majority of acceptors. It's interesting that the algorithm still works even if we don't communicate with all the acceptors.

  • @thestroke82 · 6 months ago

    From Italy, studying for an open competitive employment exam, here just to leave a big thank you.

  • @stanleyjohn4670 · 6 months ago

    Hi Ms. Lindsey, at 1:08:05 you mentioned 2d+r as the timeout formula. Can you explain how it's derived? Thanks

    • @lindseykuperwithasharpie · 5 months ago

      The idea is that d is the time it takes a message to get from one node to another, and r is the time it takes for the message to be processed on the remote end. So, it's the time it takes for a request to go across the network to a remote node (d), be processed there (r), and a response to come back (d again). This is all a bit silly, though, since in an asynchronous network, there is no fixed d, since a sent message can take arbitrarily long to be received.

  • @freemanwright3553 · 6 months ago

    Dear Professor: I think there may be another reason for that curve: in chain replication, all the writes eventually go to the tail as well. Therefore, when the write percentage is low, the write requests are "buffered" in the earlier processes until they reach the tail process. It's kind of like the "peak-cut" effect of a message queue. However, as the write percentage gets higher and higher, the "peak-cut" effect is diminished, and the tail process actually has to deal with write and read requests at the same time, which is equivalent to primary-backup replication.

  • @kamleshbhatt1 · 7 months ago

    Great lecture series, Professor!! Could you please also share the (programming) assignments? I would like to try them out. Please consider launching your Distributed Systems course on MOOC platforms like Coursera so that more and more learners can benefit.

    • @lindseykuperwithasharpie · 7 months ago

      I don't make the assignments for this course public, but as an alternative, I recommend checking out the excellent DSLabs: github.com/emichael/dslabs

  • @kim99343 · 7 months ago

    Thank you for this lesson, really appreciate it.

  • @austinoquinn815 · 8 months ago

    Isn't FIFO weaker than RYW consistency? For example, you could have a process, obeying FIFO, write 50 to server A and then, in FIFO order, get the balance from server B. This could obey FIFO but still return an incorrect value. RYW also seems to guarantee FIFO. Thanks in advance!

    • @lindseykuperwithasharpie · 8 months ago

      I think the confusion here comes from the fact that RYW is what's known as a "client-centric" consistency model, whereas FIFO (and all the other consistency models I discussed in this lecture) are "data-centric". The data-centric consistency models put restrictions on how data is updated on individual replicas. They have to do with how those replicas are in sync (or not) with each other. The client-centric consistency models, on the other hand, take into account the history of interactions between a client and the replicas. Client-centric consistency models include RYW and the other "session guarantees" identified by Terry et al. ( ieeexplore.ieee.org/document/331722 ). I think it was a mistake for me to try to put a client-centric consistency model into a hierarchy with data-centric ones, since that's confusing.

      That said, Brzezinski et al. do have a paper ( link.springer.com/chapter/10.1007/978-3-540-24669-5_1 ) where they establish that FIFO is stronger than RYW. In particular, Theorem 1 of that paper says: "At the server side, PRAM consistency is fulfilled if and only if RYW and MW conditions are preserved." (Here, PRAM is a synonym for FIFO, and MW stands for "monotonic writes", which is another one of Terry et al.'s session guarantees.) In other words, we have FIFO if we have RYW *and* something else (MW), and we have RYW *and* something else (MW) if we have FIFO. Hence, FIFO is stronger than RYW.

      But the Brzezinski et al. paper is quite technical, and I have no intuitive explanation of why this is the case, sorry. (I came across this result secondhand, in Viotti and Vukolić's big scary survey of consistency models ( arxiv.org/abs/1512.00168 ) -- notice that they too put RYW below PRAM (FIFO) in their hierarchy.)

  • @user-sk3yl6ib3m · 8 months ago

    legends are watching this video day before exams

  • @ChrisRodier76 · 9 months ago

    Around minute 40, possibly the most exciting CS lecture, or most dramatic minute I can recall! Poor Alice! 😂. Amazing lectures, thank you!!!

  • @kevinpatel6157 · 9 months ago

    Your lecture series is a really good distributed systems starter. I went through a lot of different content available on the internet, only to give it up eventually. When I started this series, I did not expect I'd come this far. Many thanks for putting these here.

  • @patrickaimekankeu2916 · 9 months ago

    Thank you for the explanation.

  • @bharathram3977 · 9 months ago

    Are the slides available for this lecture? The video stream is cutting off the text.

    • @lindseykuperwithasharpie · 9 months ago

      I'm not sure if the slides for this specific talk are available, but Chris's slides for his own distributed systems course are available at www.distributedsystemscourse.com, and some of the lectures are very similar in content to this talk.

  • @SaimonThapa · 9 months ago

    lol a wild "fuck u" appears in a distributed systems lecture! never expected that 😆. Great lecture btw! Thank you.

  • @bharathram3977 · 9 months ago

    @41:50 (Not to nitpick the prof, just adding it for fellow viewers.) The actual MapReduce paper mentions that the intermediate key-value pairs are buffered in memory but are periodically written to files stored on local disk on the map workers.

    • @bharathram3977 · 9 months ago

      Oh, never mind, at a later point it's mentioned in the lecture that the map workers' data is stored in files.

  • @bharathram3977 · 9 months ago

    @1:18:11 When m2 fails, the range of keys it was responsible for would be replicated to the next N nodes down the circle. Assuming N=3, m3 & m4 would already have that data, so it would be m5 that would have to store a copy of m2's range of keys, right? So m5 is the one that gets "affected" (i.e., an increase in storage, slightly higher resource utilization during the time replication happens, etc.).

  • @bharathram3977 · 9 months ago

    @1:15:55 You mentioned that if N=W and there are failures (of nodes, networks, etc.), then writes can never happen. Just wanted to point out that this might not always be the case, due to the idea of a "sloppy quorum" and "preference list". The number of nodes in the preference list is > N, and from this list the "top N healthy" nodes are chosen for read/write operations. Then, they use the idea of hinted handoff to replicate data back once nodes have recovered.

  • @tarunpahuja3443 · 10 months ago

    Hi, what book would you suggest going through along with these lectures? Could you please tell me where I could find a proof of the causal broadcast algorithm? Great lectures, thanks!

    • @lindseykuperwithasharpie · 9 months ago

      The causal broadcast algorithm I show here is from the classic paper "Lightweight Causal and Atomic Group Multicast" by Birman, Schiper, and Stephenson ( dl.acm.org/doi/10.1145/128738.128742 ). That paper gives informal safety and liveness proofs for the algorithm. Recently, my students and I mechanically verified the safety property ( dl.acm.org/doi/10.1145/3587216.3587222 ) using Liquid Haskell. Here's a blog post about a talk I gave describing that work: decomposition.al/blog/2022/09/07/verified-causal-broadcast-with-liquid-haskell/

    • @tarunpahuja3443 · 9 months ago

      @@lindseykuperwithasharpie thanks a lot 🙏 Amazing lectures.