CRDTs and the Quest for Distributed Consistency

  • Added 9 Jul 2024
  • InfoQ Dev Summit Boston, a two-day conference of actionable advice from senior software developers hosted by InfoQ, will take place on June 24-25, 2024, in Boston, Massachusetts.
    Deep-dive into 20+ talks from senior software developers over 2 days with parallel breakout sessions. Clarify your immediate dev priorities and get practical advice to make development decisions easier and less risky.
    Register now: bit.ly/47tNEWv
    --------------------------------------------------------------------------------------------------------------------------------------
    Download the slides & audio at InfoQ: bit.ly/2P1IGJe
    Martin Kleppmann explores how to ensure data consistency in distributed systems, especially in systems that don't have an authoritative leader. He explains how to sync data between a phone and a laptop without sending it via a remote server. He also explores algorithms that allow several people to collaborate on a shared document, communicating via a peer-to-peer network.
    This presentation was recorded at QCon London 2018: bit.ly/2hxsoN1
    #SoftwareArchitecture #DistributedSystems #CRDT #Consistency #InfoQ #QCon #QConLondon
  • Science & Technology

Comments • 45

  • @MarioGonzalez-lo4jk 4 years ago +44

    Excellent speaker. No snarkiness, no cheesy clip-art. Slowly eases more deeply into the subject, in a way where you never feel like they skipped a step and are now lost

  • @ajaymenon0 1 year ago +2

    This has to be the most concise talk I've seen in a long time.
    Probably hit a new benchmark for how crisp talks can be without being overbearing.

  • @uybabayun 5 years ago +34

    That book is just awesome! Highly recommended if anyone is interested...

  • @bdwizard 5 years ago +27

    Superb speaker! An excellent author as well.

  • @xianggu8898 2 years ago +10

    My understanding of the automerge algorithm:
    1. Treat the whole document as a list of characters and give each character a unique identifier that is totally ordered (think: timestamp each event with (event's logical clock, process id) gives a total order among all events)
    2. Record each edit and send them to the other concurrently editing user (e.g. Msg={insert character "a" with id 4a after existing character 2a}).
    3. Each user applies all the local and incoming/external edits. When a conflict arises, rely on the total order to make a "sensible" resolution (e.g. in case of inserting, "smaller" inserts take precedence over "larger" inserts, so that both users end up with the same string).
    It seems to me this algorithm is a special case of a CRDT (on the `insert` method of a `list` data structure, so that concurrent `insert`s are guaranteed to produce identical `list` state on both replicas). In general, we can do this to any method on any data structure, and we want the implementations to be *commutative*, which conveniently ensures "ending up in the same state after applying all operations, even if ops are applied in different order".
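
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Automerge's actual implementation: the names `Replica`, `Elem`, `local_insert` and `apply` are hypothetical, IDs are `(counter, replica name)` tuples, and the tie-break used here is the RGA-style rule in which the insert with the larger ID lands closer to the reference character (the opposite convention would work equally well, as long as every replica uses the same one). It also assumes operations are delivered in causal order.

```python
from dataclasses import dataclass


@dataclass
class Elem:
    id: tuple   # (Lamport counter, replica name); tuples compare lexicographically
    char: str


class Replica:
    def __init__(self, name):
        self.name = name
        self.counter = 0
        self.elems = []

    def local_insert(self, after, char):
        """Create an insert op locally, apply it, and return it for broadcast."""
        self.counter += 1
        op = ((self.counter, self.name), after, char)
        self.apply(op)
        return op

    def apply(self, op):
        """Apply a local or remote insert op: (new_id, id_of_predecessor, char)."""
        new_id, after, char = op
        # Keep the local counter ahead of every ID seen so far.
        self.counter = max(self.counter, new_id[0])
        i = 0 if after is None else self._index_of(after) + 1
        # Skip over concurrent inserts at the same position that have a larger ID.
        while i < len(self.elems) and self.elems[i].id > new_id:
            i += 1
        self.elems.insert(i, Elem(new_id, char))

    def _index_of(self, elem_id):
        return next(i for i, e in enumerate(self.elems) if e.id == elem_id)

    def text(self):
        return "".join(e.char for e in self.elems)


a, b = Replica("a"), Replica("b")
for op in (a.local_insert(None, "H"), a.local_insert((1, "a"), "i")):
    b.apply(op)

# Both users now insert concurrently after the same character, id (2, "a").
op_a = a.local_insert((2, "a"), "!")   # gets id (3, "a")
op_b = b.local_insert((2, "a"), "?")   # gets id (3, "b")
a.apply(op_b)
b.apply(op_a)
print(a.text(), b.text())  # Hi?! Hi?!
```

Each replica receives the two concurrent operations in a different order, yet both converge on the same string, which is the commutativity property described in the comment.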

  • @KalyanSP 2 years ago +3

    This dude is a freaking legend. The book is phenomenal. Complex topics explained so well

  • @snowy0110 4 years ago +11

    Such a breath of fresh air after spending tons of time watching marketing-related talks about "new technology XYZ". Thanks! That's awesome!

  • @franciscolopezsancho 4 years ago +7

    What a gem the content, and the delivery. Enlightening. Thanks!!

  • @ShortGiant1 5 years ago +17

    What a great talk! Such an interesting topic explained in an easy to understand manner. Also, great slides!

  • @DanPiponi 1 year ago +2

    The punch line is 40:28 to 42:28. Very clever.

  • @paveltyk 2 years ago +4

    He talks faster than I can think! Great explanation. The book is also brilliant. Had to read it multiple times to take in all the details :)

    • @2tce 1 year ago

      I thought I was the only one. 😄

  • @TJ-hs1qm 1 year ago +1

    I can see how language models could be used to resolve conflicts in a "natural way". "Hi mom dad!" would become "Hey mom and dad!", or in "Hey everyone folks" it would understand the redundancy and automatically suggest sensible alternatives.

  • @anokhkishore 3 years ago +3

    Mr Kleppmann is awesome

  • @jehan60188 1 year ago

    thanks for explaining things clearly! Especially "push is the JS operation for updating an array" (around 28:00). Makes it easier for people unfamiliar with a particular language to understand!

  • @user-fs5mc4tl3r 1 year ago

    Excellent talk from an excellent speaker.

  • @poe84it 5 years ago +4

    Really nice talk!

  • @draakisback 1 year ago

    Pretty nice rundown of CRDTs. I've been using them in non-document-based systems, for example an order book. In an order book, you have very specific operations, and they map very nicely onto the CRDT ops and types. My system also runs a hybrid consensus/collaborative algorithm, because there are cases where the book wants to reject certain changes, but it scales incredibly well.

  • @AmitKumar-we8dm 3 years ago

    Great.. Thanks !

  • @aleksg2925 4 months ago

    Fantastic book, and brilliant man 👌

  • @angad2364 2 years ago

    Excellent !!

  • @k.k.gayansanjeewa7432 7 months ago

    simply thanks

  • @ibgib 2 years ago

    I have greatly enjoyed this speaker's previous talks on append-only logs and others. This talk was fabulous in that it was thought-provoking and charismatic, and he's out there on stage doing his thing... However, I have a few issues with many of the points claimed. Probably the biggest point is CRDTs vs consensus. If one is talking about _bitcoin_ and its hard-coded consensus, then yes, they are only superficially similar. But consensus in general is an algorithm for choosing the next state (not just the next block). The last section about automerge basically delineates their naive (meant literally, not pejoratively) conflict resolution for edge cases. But their algorithm is essentially another hard-coded consensus, with implied limitations on state shapes and the transformations allowed as a tradeoff for the mathematical backing of the expected deterministic outcomes.
    Still a very good talk that is helping me get where I need to go, so a big Thank You!

  • @ArchonLicht 9 months ago

    I was waiting for an example of "User One changes property X of entity A, while user Two deletes entity A completely" - but alas...

  • @dewijones92 2 years ago

    brilliant speaker

  • @sobanya_228 5 years ago

    Is there any kind of immutable api for Automerge?

  • @walterlol 3 years ago +4

    Is this Tom Scott dev version?

    • @walterlol 3 years ago +1

      Also, this dude seems cool to work with.

  • @kokizzu 5 years ago

    What happens if the current data on both replicas is A=1,
    User1 deletes/expires the record with key A,
    and User2 updates/upserts the record with key A to value 2?

    • @kokizzu 5 years ago +1

      ah ic, 41:00 by assigning priority of the writer
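
The delete-versus-update scenario in this thread can be made concrete with a small sketch. It assumes one possible policy, not necessarily what Automerge does: every write, including a delete, is tagged with a `(counter, writer)` pair, a delete is stored as a tombstone, and on merge the entry with the greatest tag wins. The names `merge`, `TOMBSTONE`, `user1` and `user2` are hypothetical.

```python
TOMBSTONE = None  # a delete is recorded as a write of this marker


def merge(left, right):
    """Merge two replica states of the form {key: ((counter, writer), value)}.

    For each key, keep the entry whose (counter, writer) tag is greatest.
    """
    merged = dict(left)
    for key, entry in right.items():
        if key not in merged or entry[0] > merged[key][0]:
            merged[key] = entry
    return merged


# Both replicas start from A=1 and then diverge concurrently.
replica1 = {"A": ((2, "user1"), TOMBSTONE)}  # User1 deletes key A
replica2 = {"A": ((2, "user2"), 2)}          # User2 sets A to 2

# The counters tie, so the writer name decides; "user2" > "user1".
print(merge(replica1, replica2))  # {'A': ((2, 'user2'), 2)}
print(merge(replica2, replica1))  # {'A': ((2, 'user2'), 2)}
```

Both merge orders give the same result, so the replicas converge. Whether the update or the delete should win is a policy decision; this sketch only shows that any deterministic rule applied identically on every replica is enough for convergence.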

  • @lidu007 3 years ago

    The comments from 10:00 to 12:00 about operational transformation (OT) are out of date. The correctness problem of OT was solved around 2006, which was published in a 2010 JCSCW paper by Li&Li. A family of OT algorithms, called ABT*, were developed by Shao & Li from 2009 to 2011. More details about this line of work are provided in a short essay written in 2011, which can be found on my LinkedIn profile.

    • @TyzFix 2 years ago +1

      can you please share your Linkedin profile?

  • @bengraham3707 3 years ago

    So, in a conflict situation the data from the higher node ID always goes first. I propose that the ID should be a GUID to enable nodes to join without the need of any centralized server.
    For your consideration,
    Ben (Node: ffffffff-ffff-ffff-ffff-ffffffffffff)

  • @bryanhaakman 4 years ago

    At 32:42: how about using "time" as a way to decide which change to pick? So the change that was made later gets preferred?

    • @tanders12 4 years ago +3

      But according to whose clock?

    • @DaraulHarris 4 years ago

      @tanders12 why not Unix time?

    • @snowy0110 4 years ago

      @DaraulHarris I believe that is a subject of debate, as the speaker said. Sure thing, you can choose the latest for conflict resolution, but it wouldn't be the perfect strategy for all cases. You never know what's better because it is up to the business rules to decide what is expected behavior. What if the discarded early change was really important? Simply taking the latest wouldn't be the way to go.

    • @totallyupdowns 4 years ago +4

      @DaraulHarris How do you make sure that everyone's clocks are correct? Clock skew is a major problem in distributed systems.

    • @tibs7095 4 years ago +1

      While this may sound like a good idea, it would introduce an unnecessary dependence and, with that, edge cases that could mess everything up. The main benefit of CRDTs is that (finally!) there's no need to worry about such things.
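
A common way to order changes without trusting anyone's wall clock is a Lamport (logical) clock, which is also what the `(counter, node id)` identifiers discussed elsewhere in these comments amount to. The sketch below is a minimal illustration; the class name `LamportClock` and the node names `laptop` and `phone` are assumptions made for the example.

```python
class LamportClock:
    """Logical clock: orders events without reading any wall-clock time."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counter = 0

    def tick(self):
        """Timestamp a new local event."""
        self.counter += 1
        return (self.counter, self.node_id)

    def observe(self, timestamp):
        """Call when a remote event arrives, so later local events sort after it."""
        self.counter = max(self.counter, timestamp[0])


laptop, phone = LamportClock("laptop"), LamportClock("phone")

t1 = laptop.tick()   # (1, "laptop")
phone.observe(t1)    # the phone has now seen t1
t2 = phone.tick()    # (2, "phone"): causally after t1, so it sorts after t1
t3 = laptop.tick()   # (2, "laptop"): concurrent with t2, tie broken by node id

print(sorted([t3, t2, t1]))  # [(1, 'laptop'), (2, 'laptop'), (2, 'phone')]
```

The resulting order respects causality (t1 before t2) and is identical on every node, regardless of clock skew. For concurrent events such as t2 and t3 the order is arbitrary but deterministic, which is why "latest wins" under this scheme does not necessarily mean latest in real time.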

  • @hermannschmidt9788 4 years ago +3

    Actually, you could store the change log in a blockchain if you want it to be censorship resistant and guaranteed immutable.

    • @rallokkcaz 2 years ago

      Immutable, but not consistent across each user's input. Imagine typing a document with a 15s-15m lag: not possible. It's a good idea only if you don't realize how slow it would actually be. If you know a bit about blockchain, you'll also have heard him say that side channels are NOT allowed.

    • @hermannschmidt9788 2 years ago

      @rallokkcaz Someone is replying to my 2y old comment. I am delighted :D I've changed my view on BCs fundamentally in the meantime. They are not good for anything but money.

  • @sammyjankis2783 6 days ago

    The punch line is 40:28 to 42:28. Very clever.