An introduction to Conflict-Free Replicated Data Types (CRDTs)

Sdílet
Vložit
  • čas přidán 6. 09. 2024
  • The concept of collaborative applications is well known. Popular examples of collaborative applications are Google docs and Figma, which are applications that allow people to collaborate, concurrently or asynchronously, on a common task, such as writing a document or creating a visual design.
    In the above examples, the actors that collaborate are people, but people aren’t the only possible type of actor. In a collaborative application, it is now possible, with Conflict-Free Replicated Data Types (CRDTs), to write collaborative applications where people and machines can be the actors.
    Conflict-Free Replicated Data Types are a collection of data types that can be used to write decentralized, distributed applications that are inherently collaborative.
    Want to follow our journey at Mycelial?
    mycelial.com/#...
    / mycelial

Komentáře • 11

  • @HyperFocusMarshmallow
    @HyperFocusMarshmallow Před rokem +2

    Seems like one of the main ideas roughly is to actually model numbers, specifically cardinality numbers - the counts of things using sets. Like the invariant in set theory. A cardinal number is a different concept than an ordinal number. Though the two concepts are closely related.
    The other main idea is unique identifiers of counter-events, that probably makes use of some hash-function.
    The actual event list would probably be modeled similarly to a HashSet.
    Since just like when you’re counting apples in your apple bowl, you are allowed to count apples in any order you want, the individual events must be able to be counted in any order. That’s a property that makes sense for cardinal numbers but not really for ordinals. 2 always comes before 3 in ordinals. But for cardinality they’re both just a one and the aggregating operation commutes.

  • @winspyre
    @winspyre Před rokem +2

    excellent explanation. great content.

  • @dnmurphy48
    @dnmurphy48 Před 2 lety

    quite fascinating, but I will have to watch this again and look at what they thnk they are delivering to fully understand it and its possible value.

  • @arpanghoshal2579
    @arpanghoshal2579 Před 3 měsíci

    This was good, but a counter is a very simple example.
    What we have a string data type like imagine collaborative editing like in google docs
    initially eh the string was "apple"
    User A changed this string from "apple" to "orange"
    User B changed this string from "apple" to "mango"
    Now I imagine there will be two diffs like { "user-A": "orange"} and {"user-B": "mango" }
    How do we resolve this conflict do we just take the last updated diff?

    • @arpanghoshal2579
      @arpanghoshal2579 Před 3 měsíci

      Okay got a nice answer from chat gpt, sharing below
      How It Works
      Let's consider how an RGA CRDT might handle the example:
      Initial State: Each character in "apple" has a unique identifier.
      a1, p2, p3, l4, e5
      User A's change ("orange"):
      Deletes all characters: a1, p2, p3, l4, e5
      Inserts new characters with new identifiers:
      o6, r7, a8, n9, g10, e11
      User B's change ("mango"):
      Deletes all characters: a1, p2, p3, l4, e5
      Inserts new characters with new identifiers:
      m12, a13, n14, g15, o16
      Merging Changes
      In CRDTs, changes from all users are merged in a way that ensures consistency:
      RGA Merging:
      When User A and User B's changes are merged, the CRDT ensures both sets of operations are applied.
      Depending on the strategy (e.g., causal ordering), one user's operations might be applied before or after the other.
      Example Resolution
      Let's consider a simple resolution where both operations are applied in the order they were received:
      Initial State:
      a1, p2, p3, l4, e5
      Apply User A's changes:
      Deletes: a1, p2, p3, l4, e5
      Inserts: o6, r7, a8, n9, g10, e11
      Result: orange (o6, r7, a8, n9, g10, e11)
      Apply User B's changes:
      Deletes: a1, p2, p3, l4, e5 (no effect since already deleted)
      Inserts: m12, a13, n14, g15, o16
      Result: orangemango (o6, r7, a8, n9, g10, e11, m12, a13, n14, g15, o16)
      Final State
      After merging:
      The final string could be a concatenation like orangemango, or based on the chosen strategy, it could interleave characters or even prefer one user's changes if there is a deterministic rule in place.

  • @lizard450
    @lizard450 Před rokem +1

    You could just publish an "increment" of -1 and all the values would be eventually consistent.

    • @lizard450
      @lizard450 Před rokem +1

      To clarify this. In your example you seem to be having each "node" have a uuid associated with it. In my solution whenever a node wants to mutate the data it would publish an event with a uuid (what you're calling a map).
      For simplicity sake let's just say you have 5 nodes. For a node to mutate the data it would publish event {a1ec...: 1} If that same node then wanted to increment again it would publish {b23f... : 1} each map or event created has it's own uuid and value. So if node 3 wanted to publish an event {c356... : -1} ... after all the nodes received the event your end result would be 1 shared across all nodes. When a node receives an event for the first time process the event and ack back to the sender. The sender will then known not to send that even to that node again. When a node receives an event it already has it will see that the event is already in the hashset and not process the event, but still respond back ack to the sender.
      When a new node comes online it will receive all the events and play the hashset so to speak and get to the correct value.

    • @lizard450
      @lizard450 Před rokem

      Now an advantage to your solution is that it requires less data and it's more efficient.
      That being said... it does come with some compromises. First each node has to know and be able to reach every other node. So if node 1 can't talk to node 5 then node 5 will not have accurate data. You can fix that with a distributed distribution but then you're losing the efficiency.

    • @shreyas.kulkarni
      @shreyas.kulkarni Před měsícem

      layman view: max(3,-1) = 3. only one value per replica in the map

  • @siyaram2855
    @siyaram2855 Před 2 lety

    WE beg you to make courses 🙏🙏 Please come back to teaching