CSE138 (Distributed Systems) L6: Chandy-Lamport snapshot algorithm

Sdílet
Vložit
  • čas přidán 7. 09. 2024
  • UC Santa Cruz CSE138 (Distributed Systems) Lecture 6: Chandy-Lamport snapshot algorithm; Chandy-Lamport assumptions and properties; centralized vs. decentralized algorithms
    Recorded April 15, 2021
    Professor Lindsey Kuper users.soe.ucsc...
    Course website: decomposition.a...
    Schedule of topics: decomposition.a...

Komentáře • 57

  • @thsmanasou2
    @thsmanasou2 Před 2 lety +2

    Thank you very much for your video lectures. Your lectures helped me obtain a deeper understanding of several topics with the most helpful being the one about Paxos. Plus, I aced my Distributed finals exam!

  • @jiachengxu6336
    @jiachengxu6336 Před 3 lety +4

    This is such a wonderful course !!

  • @hossamhazem5898
    @hossamhazem5898 Před 4 měsíci

    how do the processes know that the snapshot is done and they can starting taking a new snapshot/recording their state again? Love the course btw, watching out of curiosity, thanks!

  • @changxuren1200
    @changxuren1200 Před 6 měsíci

    Thank you so much for your lectures!! I really appreciate that you made these available on CZcams.
    I had a question at 1:03:37. I wonder why we do not include events C and D on Process 1?
    To my understanding, P1 keeps recording on channel C21 until the marker message from P2 is delivered, which happens after event D. By "Recording on a specific incoming channel", shouldn't it includes messages from that channel?
    However, I can also see that why we do not include event D in our snapshot. The algorithm says P1 should record its state when P1 initiates the process, which means including A and B in the snapshot, but not C. If we include D in the snapshot it will violates the consistency.
    In summary, by "recording on a specific channel", does it mean we are only waiting for the marker message that sends back from that channel?

    • @lindseykuperwithasharpie
      @lindseykuperwithasharpie  Před 6 měsíci +1

      You're right: when we record on a channel, we're recording messages received on that channel. However, the recorded *channel* state (which is a sequence of messages) is separate from the recorded *process* state (which is a sequence of events). I go into some more detail about this scenario in the last part of this blog post: decomposition.al/blog/2019/04/26/an-example-run-of-the-chandy-lamport-snapshot-algorithm/#discussion

  • @valentinussofa4135
    @valentinussofa4135 Před rokem

    Great course. Many thanks from Indonesia🙏

  • @Yash-zq6bg
    @Yash-zq6bg Před 6 měsíci

    At 17:46, when we stop recording on C21, shouldn't [C->D] be in the channel?

  • @manokumar89
    @manokumar89 Před 2 lety +2

    @Lindsey Kuper Professor, in chandy-lampert snap shot algo, the processes start recording the incoming messages, but it was shown to be used. The reason behind why we have to record the incoming channels for messages and what we have to do with it remains unclear. It will be of great help if you can clarify it. maybe just to see the marker and close the channels to know that we are done with this process so that we can terminate the snapshot algo? thank you

    • @lindseykuperwithasharpie
      @lindseykuperwithasharpie  Před 2 lety +1

      Check out my blog post -- there's a discussion at the end about why we record channel states: decomposition.al/blog/2019/04/26/an-example-run-of-the-chandy-lamport-snapshot-algorithm/

    • @angelmotta
      @angelmotta Před rokem

      @@lindseykuperwithasharpie is there a new link for the blog post? this link doesn't work on my end. Thank you for these classes!!

    • @lindseykuperwithasharpie
      @lindseykuperwithasharpie  Před rokem

      Just fixed the broken link -- thanks for the heads up.

  • @SherubThakur
    @SherubThakur Před rokem

    Hey Lindsey. Firstly thanks for making this course available online these are a great help.
    I wanted to understand if the time it takes to snapshot the state on one process has any impact on the functioning of this algorithm. We discussed as if the snapshoting on individual process is instantaneous from what I can tell.
    From what I understand if the messages are sent after the snapshot is taken then the time taken for creating a shapshot on one process should not matter and the algorithm will work as intended.
    Just want to confirm that this is right.

    • @lindseykuperwithasharpie
      @lindseykuperwithasharpie  Před rokem

      In this model, we're thinking of processes as single-threaded: events happen on a process one at a time and atomically. "Snapshot your own state" is one of those atomic events. Other events (like message receives or sends) can't happen *during* a "snapshot your own state" event -- otherwise it wouldn't be obvious whether those events should be included in the snapshot or not. So we do want it to be fast. But since it's a computation that takes place entirely locally without the need for any communication, we can think of it as happening relatively fast.

    • @SherubThakur
      @SherubThakur Před rokem

      @@lindseykuperwithasharpie Thanks for the explanation.

    • @tarunpahuja3443
      @tarunpahuja3443 Před 11 měsíci

      @@lindseykuperwithasharpie Its bit confusing. At one hand we are saying algorithm should run concurrently with the application flow and here we are assuming algorithm will pause flow of the process for the time being.

    • @lindseykuperwithasharpie
      @lindseykuperwithasharpie  Před 11 měsíci

      ​@@tarunpahuja3443Think of it this way: we don't have to wait for the snapshot algorithm to run to completion before the process is allowed to do other things. In other words, while a process is sitting around waiting for marker messages, it can be sending and receiving application messages. The time that it takes to carry out any local action (like snapshot one's local state) doesn't really count for the purposes of this kind of reasoning -- it should be considered negligible compared to the time it takes to do anything involving remote communication.

    • @tarunpahuja3443
      @tarunpahuja3443 Před 11 měsíci

      @@lindseykuperwithasharpie Thanks for the explanation

  • @mouayadmardini8285
    @mouayadmardini8285 Před 2 lety +1

    This is very helpful, Thank you very much!.

  • @sumeetkumar5080
    @sumeetkumar5080 Před rokem +1

    For the example described in the video at time 18:27, shouldn't event 'D' be also part of the snapshot. i.e. while Process P1 is recording on its incoming channel (C21), shouldn't the event corresponding to the messages delivered during this period be also part of the snapshot

    • @lindseykuperwithasharpie
      @lindseykuperwithasharpie  Před rokem +1

      A snapshot consists of both process state recordings and channel state recordings. Event D wouldn't be part of P1's recorded *process* state. However, the message whose receive event is D would indeed be part of C_21's recorded channel state. Channel state recordings are sequences of messages, not events.

    • @milkylearns
      @milkylearns Před 7 měsíci

      ​@@lindseykuperwithasharpie So a process state only records "send" events? Because the channel state would be recording "receive events"? Also, how would this algorithm combine with the difference between receiving and delivering messages? Would the process state record deliver events and channel state record receive events?
      Sorry for the multi question reply haha
      Thanks for the video, appreciate you crediting and explaining how you found the example from other courses!

    • @lindseykuperwithasharpie
      @lindseykuperwithasharpie  Před 6 měsíci

      @@milkylearns No, process state recordings include all kinds of events, and channel state recordings include messages. Messages and events are different things.

  • @thachnnguyen
    @thachnnguyen Před 6 měsíci

    Q1: still wasn't clear _what_ is in the marker message. Is it just some label? If so, all the marker messages sent by P2 and P3 have the exact same label in their own marker messages when they sent them out? I assume so otherwise it's not clear what it means by a process "has seen" the marker message.

    • @lindseykuperwithasharpie
      @lindseykuperwithasharpie  Před 6 měsíci

      By "has seen" here, I mean "has sent or received". A marker message has no interesting payload; it only says, "Hi, I'm a marker message."

  • @sayan8138
    @sayan8138 Před rokem

    Can two nodes initiate the Chandi Lamport algorithm in a distributed system? If yes then how will it affect the outcome , if not then why?

    • @lindseykuperwithasharpie
      @lindseykuperwithasharpie  Před rokem +3

      Did you watch the video?

    • @sayan8138
      @sayan8138 Před rokem

      Yeah, I got it actually I didn't saw till the end I only watched till the algorithm explaination part.

  • @giridhar510
    @giridhar510 Před 3 lety

    @Lindsey Kuper Professor, i went through original paper "Distributed Snapshots: Determining Global States of Distributed Systems".
    Not even once happens-before was mentioned in the paper. (even though this paper is written by Leslie Lamport)
    You used happens-before to define what a consistent snapshot is.
    Wondering if your presentation digresses a bit from what the paper calls "meaningful" snapshot ?
    Property of recorded snapshot as per paper is that in an equivalent computation seq', recorded snapshot captures global state after all prerecordings events and before all postrecording events.
    Happens-before is much easier to follow.
    Wondering if consistent snapshot and meaningful snapshot are equivalent or not.

    • @giridhar510
      @giridhar510 Před 3 lety

      read "An introduction to snapshot algorithms in distributed computing" paper by Ajay D Kshemkalyani et.al
      So, the recorded snapshot by the algorithm is both consistent and meaningful.
      It is amazing that recorded global snapshot corresponds to a point-in-time global snapshot that could have occurred if sequence of events is seq'.

    • @lindseykuperwithasharpie
      @lindseykuperwithasharpie  Před 3 lety +2

      Great questions. Often the *first* paper about a concept (in this case, the original Chandy-Lamport paper from 1985) is not the paper with the best explanation of the concept. The authors may not have fully understood the relationship between happens-before and meaningful snapshots at the time they wrote the paper. The Kshemkalyani et al. survey paper you found is a really good one! I also recommend Mattern's "Virtual Time and Global States of Distributed Systems" from 1989, which Kshemkalyani et al. cite.

  • @zhjwpku
    @zhjwpku Před 2 lety

    I’m wandering if there are more than one initiators, will there be a single snapshot or each initiator got a snapshot? Is that related to the marker id? Can two initiators send the same marker?

    • @lindseykuperwithasharpie
      @lindseykuperwithasharpie  Před 2 lety

      It's still just one snapshot. If two initiators both send marker messages to a third process, one of the marker messages will get there first, and that's when the process state will be recorded. Two distinct initiators cannot send the same marker to a process; for one thing, the markers will be arriving on distinct channels.

  • @nischayranjan3728
    @nischayranjan3728 Před rokem

    Can you explain how snapshots are used for deadlock detection? I am not able to understand that point @ 1:33:52

    • @lindseykuperwithasharpie
      @lindseykuperwithasharpie  Před rokem

      A property like "the system is deadlocked" is what's known as a stable property, meaning it's a property that, once it becomes true of a system, remains true (unless we do something like kill a process and reset things). If a property is stable, that means we can snapshot the system and check the snapshot to see if the property is true of it; in this case, we can check the snapshot for deadlocks. If we detect a deadlock in the snapshot, then the deadlock is still present in the running system. For more on this topic, check out chapter 19 of Nancy Lynch's book "Distributed Algorithms".

  • @klindatv
    @klindatv Před 2 lety

    Hello what happens when a process recive the third (or more) marks ? As I understand when the process recieve the first mark, start recording, when it recieve the second mark it stops recording! thanks!

    • @lindseykuperwithasharpie
      @lindseykuperwithasharpie  Před 2 lety

      Recordings are channel-specific. When a process receives its first marker message, it starts recording on all its incoming channels (except the one on which that first marker message arrived). After that, when it receives a marker message on an incoming channel, it stops recording on that particular channel. It should eventually get one marker message on each of its incoming channels.

    • @klindatv
      @klindatv Před 2 lety +1

      @@lindseykuperwithasharpie Oh now I understand, thanks!

  • @mikeyang2696
    @mikeyang2696 Před 2 lety

    Hi, I have a question that is there any way that the algorithm does not need to record the channel state?

    • @lindseykuperwithasharpie
      @lindseykuperwithasharpie  Před 2 lety

      Not sure I understand the question. Are you asking "Under what circumstances does the Chandy-Lamport algorithm not need to record channel states?" Or are you asking for an alternative snapshot algorithm?

  • @darthvadar2915
    @darthvadar2915 Před 2 lety

    I would love to know if this is a graduate level course or an undergraduate level course. Thank you for the great lecture

    • @lindseykuperwithasharpie
      @lindseykuperwithasharpie  Před 2 lety +1

      It's an undergrad course (as explained on the course website, which is linked from the video description).

  • @thachnnguyen
    @thachnnguyen Před 6 měsíci

    2 aspects left hanging, very unsettling. At least something should have been mentioned. 1. What to do with that left over m1 in C21? Throw it away? Frame it? What? and 2. How are these (including the current snapshot) relevant to the next snapshot?

    • @lindseykuperwithasharpie
      @lindseykuperwithasharpie  Před 5 měsíci

      The "leftover" message is part of the global snapshot, because it's part of the recorded channel state for channel C_21, and the global snapshot consists of all the recorded process states and all the recorded channel states. What to do with any particular part of a snapshot is naturally going to depend on your reason for having taken the snapshot.

  • @sumeetkumar5080
    @sumeetkumar5080 Před 2 lety +1

    Hi, What is meant by marking a channel as empty ?

    • @lindseykuperwithasharpie
      @lindseykuperwithasharpie  Před 2 lety

      A process is responsible for recording the states of its incoming channels. When you mark a channel as empty, you're saying that the recorded sequence of messages on that channel is empty. Messages might still be received on that channel, but they won't be part of the snapshot.

    • @sumeetkumar5080
      @sumeetkumar5080 Před 2 lety

      @@lindseykuperwithasharpie Thanks. So just to reiterate on what you said - The process continues to record messages on the incoming channel till it is marked as empty i.e "marking an incoming channel as empty" is equivalent to "stop recording on incoming channel" ?

    • @lindseykuperwithasharpie
      @lindseykuperwithasharpie  Před 2 lety +1

      No. When you stop recording on a channel, the recorded channel state is the sequence of messages that you received on it during the time you were recording (which may or may not be empty).
      A process that sees a marker message for the first time marks the state of the channel the marker came in on as empty because it isn't going to bother recording on that channel at all. It's not stopping recording on that channel, because it never started.

    • @yoshitoki23
      @yoshitoki23 Před 2 lety

      @@lindseykuperwithasharpie I'm finding this confusing. If I'm understanding you right, you're saying "marking the incoming channel as empty" is equivalent to "don't record messages on this channel for the duration of the snapshot". Please correct me if I'm wrong. What's confusing is that the language "marking the channel as empty" suggests a totally different operation - at least in my mind, it sounds like it means "purge the channel's queue".
      Also, could you please help explain why, when a machine receives its first marker message, it doesn't record on the channel of the marker's sender, but it records on all other incoming channels? Why not record on all incoming channels? Thank you!!

    • @lindseykuperwithasharpie
      @lindseykuperwithasharpie  Před 2 lety +3

      Good point about the terminology being confusing! When we say that the snapshot algorithm marks a channel as empty, we only mean *in the recording that the snapshot is making*, which is the only place that the snapshot algorithm has the power to make state changes! Indeed, snapshot algorithms are careful not to interfere with the state of the processes that they are snapshotting. It would be bad if they did.
      When a process receives its first marker message, it doesn't record on the channel of the marker's sender because the sender already recorded its own state (otherwise it wouldn't be sending a marker message). So, any subsequent messages from the sender shouldn't be recorded in the snapshot.

  • @harshitshah3435
    @harshitshah3435 Před 3 lety

    Hey, when's the 7th lec coming on CZcams?