Distributed Systems 8.1: Collaboration software

Sdílet
Vložit
  • čas přidán 27. 07. 2024
  • Accompanying lecture notes: www.cl.cam.ac.uk/teaching/212...
    Full lecture series: • Distributed Systems le...
    This video is part of an 8-lecture series on distributed systems, given as part of the undergraduate computer science course at the University of Cambridge. It is preceded by an 8-lecture course on concurrent systems for which videos are not publicly available, but slides can be found on the course web page: www.cl.cam.ac.uk/teaching/212...

Komentáře • 31

  • @zhou7yuan
    @zhou7yuan Před 2 lety +19

    Collaboration and conflict resolution [0:20]
    Families of algorithms [2:07]
    Conflict-free Replicated Data Types (CRDTs)
    Operational Transformation (OT)
    Conflicts due to concurrent updates [2:31]
    Operation-based map CRDT (algorithm) [3:35]
    Operation-based map CRDTs [8:08]
    Recall strong eventual consistency [8:32]
    CRDT algorithm implements this [9:00]
    State-based map CRDT (algorithm) [10:13]
    State-based CRDTs [13:54]
    Merge operator satisfy: communtative, associate, idenpotent
    State vs operation [14:30]
    op: smaller messages
    st: tolerate message loss/duplication
    Not necessarily uses broadcast [16:13]
    (google docs demo) [17:17]
    (disconnected) [17:48]
    Collaborative text editing: the problem [19:08]
    Operational transformation [22:00]
    Text editing CRDT [25:33]
    (character@float position)
    Operation-based text CRDT (1/2) [29:01]
    Operation-based text CRDT (2/2) [32:47]
    causal broadcast on insertion / deletion of a character

  • @davidespinosa1910
    @davidespinosa1910 Před 11 měsíci +2

    Great explanation of OT: it transforms operations past (already executed) operations that are concurrent with them. That's the whole algorithm ! However, the whole point of OT is to use causal order instead of total order. If we require total order, OT is unnecessary.

  • @programming6881
    @programming6881 Před rokem

    Awesome @Marin Klepmann, this is really good. Your explanations and the choice of examples are excellent. Thank you again. I am reading about distributed systems in the last few weeks, the lecture series increased my love for distributed systems.

  • @good-vibes-inc
    @good-vibes-inc Před 3 lety +7

    Maaan, I almost made it. Thanks for the lectures!

  • @reubenkuhnert6870
    @reubenkuhnert6870 Před 2 lety

    Excellent explanation, let me watch the whole series

  • @exoticcoder5365
    @exoticcoder5365 Před 3 lety

    Text editing CRDT is very useful, thank you 🙏🏻

  • @chon-850
    @chon-850 Před 3 lety

    love your book so much!

  • @1729abhishek
    @1729abhishek Před 5 měsíci

    Say users 1 and 2 insert 'A' and 'B' at position 0. After broadcast and delivery, chars will converge to {(0, null, ⊢), (0.5, 1, 'A'), (0.5, 2, 'B'), (1, null, ⊣)} and 'A' will be visible on both devices. However, if either user attempts to delete 'A' they will see 'B' on their device because the request to delete the character at position 0 will only remove (0.5, 1, 'A') from chars. I found the lecture to be extremely helpful though.

  • @yuanshaoqian
    @yuanshaoqian Před 7 měsíci

    Nice lecture, awesome. Thanks for the work

  • @mmfStudent
    @mmfStudent Před 2 lety +1

    Hey, do you have an example of practical usage of this algorithm with fractional indexes?

  • @fatihsenturk649
    @fatihsenturk649 Před 3 lety +2

    well, what would happen if both two nodes insert a value at 0.875? for example, userA wanted to put F in between C and end-symbol, also concurrently userB wanted to put E at the same place. 28:57

    • @AlexeyPirogov
      @AlexeyPirogov Před 3 lety +3

      My understanding - this is a conflict and it can't be resolved 100% correct without human interaction.
      But this algorithm just uses nodeId in this case - sorts all characters by insert position and nodeId.

  • @kirubhakaranmohan9811
    @kirubhakaranmohan9811 Před rokem +1

    Hi,
    How complicated it would be with CRDT for Google Sheets ?
    Could you please post a video on that ?

  • @lidu007
    @lidu007 Před 3 lety +1

    The statement at 25:06 was inaccurate. OT does not require total order broadcast. Reliable or even best-effort broadcast is enough. OT algorithms have an embedded lightweight mechanism to ensure causal delivery, e.g., by delivering an operation only after all causally preceding operations have been delivered according to their vector timestamps. Other than that, concurrent operations can be delivered in arbitrary order and are transformed against operations in the log to ensure convergence. That is, OT does not depend on a total order of delivery. OT achieves eventual consistency (convergence) after all operations are applied on all replicas, as long as causal relation is preserved.

    • @2tce
      @2tce Před 2 lety

      Total order broadcast is still eventually consistent right?

    • @1729abhishek
      @1729abhishek Před 5 měsíci

      In the example itself, the delivery was not total order since the two inserts were delivered in different orders on the two nodes. However, the speaker qualified his statement with "generally". So, although weaker broadcasts such as causal may work with some OT data and operation models, there are also cases where total broadcast is needed. I am not sure which cases he is referring to but this is what I gathered from his statement.

  • @mykytapiskarov7291
    @mykytapiskarov7291 Před 7 měsíci

    Do I understand correctly that Operation-based map CRDTs works only if update entirely overwrites previous value?
    Because if value not entirely overwritten, e.g. value is string, and update operations is concatenation, then depend on order of updates - we'll get different result.

  • @yangwang1577
    @yangwang1577 Před 2 lety

    For the calendar use-case, don't we usually have a centralized storage for the calendar data? I always thought the sync would be between individual clients and the service and not between two clients (devices)...

    • @AndreiStreltsov
      @AndreiStreltsov Před 2 lety

      I think this does not change the algorithm. In addition to being a regular node, the server node is also tasked with broadcasting the events to each client. So essentially the server implements a mechanism for reliable broadcast.

    • @BharCode09
      @BharCode09 Před rokem

      For the calendar use case, when the devices are in offline state, the changes are first saved locally (offline, in device storage). Now when the device comes online, the sync has to happen between devices with their locally applied changes and with centralized storage as well. When mutliple devices try to sync with Centralized storage, then there may be conflicts. That's what its explained here.

  • @kd6613
    @kd6613 Před rokem

    Very few software that actually use Lamport or vector clocks?

  • @sahibgrover3998
    @sahibgrover3998 Před rokem

    Why does operation based crdt only need commutativity butstate based also need associativity

  • @Dake1989
    @Dake1989 Před 2 lety

    I have a question about operation-based text CRDT (the one with the floating number positions). If the node id is used to break ties when merging, does that mean if userA inserts a sentence between some two initial characters and userB inserts a diffeent sentence between the same two initial characters, when the network heals, the merge process will interleave their edits character by character?
    XHello!X + XBye!X == XXHBeylel!o!X ??

    • @mikaylamaki4689
      @mikaylamaki4689 Před 2 lety

      From what I understand, that’s exactly right! Martin’s other talk on CRDTs covers this and many other hairy edge cases: czcams.com/video/x7drE24geUw/video.html

    • @1729abhishek
      @1729abhishek Před 5 měsíci

      Let's pick a smaller example. Say that a user types 'AB' on node 1 and another user types 'CD' on node 2, and now we must figure out how to merge the two strings. On node 1, insert(0,A) leads to chars = {(0, null, ⊢), (0.5, 1, A), (1, null, ⊣)}. Then, insert(1,B) leads to chars = {(0, null, ⊢), (0.5, 1, A), (0.75, 1, B), (1, null, ⊣)}. On node 2, the two insertions yield chars = {(0, null, ⊢), (0.5, 2, C), (0.75, 2, D), (1, null, ⊣)}. After broadcast and delivery, the chars on both nodes will converge to chars = {(0, null, ⊢), (0.5, 1, A), (0.5, 2, C), (0.75, 1, B), (0.75, 2, D), (1, null, ⊣)}. Since ElementAt gives precedence to node 1, the string `AB` will show for both users.

  • @user-oq8ph9ee7k
    @user-oq8ph9ee7k Před rokem

    What is the nodeId ? How to generate it ?

  • @AndreiStreltsov
    @AndreiStreltsov Před 2 lety

    I think the operation-based CRDT algorithm described in the video would not actually be suitable for a real calendar application.
    It uses Lamport clocks to impose ordering on the edit events, which means that any events that are concurrent with respect to the system itself are going to be essentially arbitrarily ordered, which may be inconsistent with causality from the user's perspective.
    For example say a user modifies the same key on both nodes in between syncs. From the system's perspective the modifications are concurrent, but for the user they are not, as the user known which edit happened before the other.
    I can think of only two options to overcome this. Either we have to use physical clocks, or hybrid vector+physical clock. When using a physical clock only, clock skew will lead to the loss of changes that originated on the lagging node even though logically many of the events could have been ordered correctly. By using a hybrid approach I think we could reduce the impact of clock skew by preferring logical timestamps when the events are not concurrent and falling back to physical timestamps only to order the truly concurrent events. Only vector clocks can be used for this, since Lamport timestamps don't differentiate between concurrent and non-concurrent events.

    • @2tce
      @2tce Před 2 lety

      I agree. This is similar to another comment I made elsewhere. See Google's Spanner. They use atomic clocks (physical)

    • @Ynno2
      @Ynno2 Před 3 měsíci

      I was very confused by this. If you had two devices in airplane mode which were synced to the same point and then made conflicting changes on the same field and used only Lamport clocks because we don't trust the wall clock time, I can't see how the results could be correct.
      I understand that because you could have some node ID that the Lamport clock timestamps would be unique and one could be guaranteed to clobber the other in some deterministic way and so in that sense is commutative and maybe "theoretically" correct. But for a real world perspective this would be a very bad way to resolve the conflict. It's LWW for some very perverse definition of "last". Or did I totally misunderstand this concept?