Automerge: a new foundation for collaboration software
- Added 27 Nov 2021
- Local-first software is an effort to make collaboration software less dependent on cloud services, and Automerge is an open-source library for realising local-first software. In this talk I explain our motivation for creating Automerge, and map out 7 years' worth of research projects that are feeding into this project.
Recording of a talk given at the University of Cambridge SRG Seminars on 25 Nov 2021.
www.cl.cam.ac.uk/research/srg... - Science & Technology
Great talk, I really enjoyed it!
These are my personal notes about the talk, in case you want to skip around in the video:
- (0:02:44) SPA architectures are very complex and have many layers of
abstraction
- (0:04:40) Read/write latencies are very high because of network/system
roundtrips
- (0:05:30) Optimistic UIs break in case of network errors
- (0:06:42) Proposal: Local-First Software
- (0:07:35) In local-first software, the primary storage is on the devices;
servers only relay communication and store backups
- (0:08:33) Data is replicated in the background, non-blockingly
- (0:09:04) This requires only a few layers of abstraction
- (0:11:25) Your data is lost when the cloud service is shut down or if you get
in trouble with the provider
- (0:12:40) Sync services for local-first software can be generic and
interchangeable
- (0:14:04) Long-term preservation of data is only feasible with local-first
software
- (0:14:35) Working offline works by default in local-first software but is very
hard to do in cloud software
- (0:15:01) In cloud software servers are trusted with unencrypted sensitive
data, in local-first software data is end-to-end encrypted during sync
- (0:16:03) In cloud software the server is trusted with data integrity, in
local-first software data integrity can be cryptographically ensured in the
sync protocol
- (0:16:39) In cloud software, users are at the mercy of the service provider
- In local-first software users have ownership, control, agency and autonomy
over their data
- (0:17:03) Big challenge of local-first software: Merging concurrent edits
correctly
- (0:19:20) Because local-first software has to solve the merging problem, it is
straightforward to implement version control (including branches)
- This could bring version control to many areas beyond software development
(Git) where it would be very valuable
- (0:22:29) Introducing Automerge (“Git for your app's data”)
- Automerge operates directly on the data model
- (0:25:05) Automerge preserves all changes, guarantees eventual consistency
(makes concurrent operations commutative) and can merge branches
- (0:26:38) Automerge is a CRDT
- (0:26:57) Timeline of Automerge research projects
- (0:28:23) JSON CRDT
- (0:29:02) CRDTs in Isabelle
- (0:29:18) Automerge project
- (0:30:05) Automerge is used in production by the Washington Post homepage
editors
- (0:30:29) Move operation for CRDT trees
- (0:31:29) Local-first principles (Onward! paper)
- (0:31:54) Authenticated snapshots
- (0:32:46) Byzantine eventual consistency
- (0:33:57) End-to-end encryption for CRDTs
- Decentralized authentication, works without a central server
- (0:34:59) Metadata privacy with anonymity networks
- (0:36:17) CRDT for rich-text data (Peritext)
- (0:37:23) WIP stuff
- More asynchronous collaboration workflows
- Cut + paste
- Interleaving freedom
- Access control
- User discovery
- (0:38:34) Philosophical question from the audience about blockchains
- Operations are not totally, but partially ordered with regard to causality
- (0:43:39) Automerge records an operation log
- (0:44:26) Each operation is given an ID and the causality of operations is
tracked by an “overwrites” field
- (0:45:44) In case of conflicts one change is arbitrarily picked over the other
by default, but the complete conflict can also be retrieved
- (0:46:35) Automerge can represent JSON, sets/tables, text, counters, date/time
and cursors
- (0:47:31) Some skipped slides
- Collaboration latency
- Functional reactive programming
- Network Topologies
- (0:47:40) There are JavaScript and Rust implementations of Automerge; the Rust
implementation is the basis for other language bindings: WebAssembly, Python,
Swift
- Automerge itself is just a data structure library and implements no disk
persistence or networking
- (0:48:50) In real-time collaboration the log contains many small operations
that can be compressed into snapshots to reduce log size
- Automerge employs sophisticated compression that can arrive at 0.8 Bytes per
operation on a real-world document editing history recording every single
keystroke
- (0:50:31) This is achieved with ideas from columnar databases
- (0:53:51) Skipped over conclusion slide
- (0:55:39) Building on top of Matrix.org as a backend has come up as an idea
- (0:56:06) Question: What's next?
- Testing, improving and finalizing the Automerge (API) for 1.0
- Research: many ideas and projects are available to work on
- There are also many security aspects that can be worked on
- (0:58:00) Another blockchain buzzword discussion
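The operation-log items in the notes above (operation IDs, an “overwrites” field, and one concurrent change being picked while the full conflict stays retrievable) can be sketched roughly as follows. This is a toy model under stated assumptions, not Automerge's actual API or storage format: the names `Op`, `Doc`, `conflicts` and the `(counter, actor)` ID layout are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical ID layout: (counter, actor). Tuples compare lexicographically,
# which gives every replica the same deterministic tie-break.
OpId = tuple[int, str]

@dataclass(frozen=True)
class Op:
    id: OpId
    key: str
    value: Any
    overwrites: frozenset = frozenset()  # IDs of the ops this one supersedes

class Doc:
    def __init__(self) -> None:
        self.log: dict[OpId, Op] = {}  # the operation log, keyed by op ID

    def apply(self, op: Op) -> None:
        # Idempotent: applying the same op twice changes nothing.
        self.log[op.id] = op

    def merge(self, other: "Doc") -> None:
        # Merging is a set union of logs, so the order of merges does not matter.
        for op in other.log.values():
            self.apply(op)

    def conflicts(self, key: str) -> dict[OpId, Any]:
        # An op is still "live" if no other op lists it in its overwrites field.
        ops = [op for op in self.log.values() if op.key == key]
        overwritten = set().union(*(op.overwrites for op in ops))
        return {op.id: op.value for op in ops if op.id not in overwritten}

    def get(self, key: str) -> Any:
        # Arbitrary but deterministic choice: the live op with the highest ID wins.
        live = self.conflicts(key)
        return live[max(live)] if live else None

    def set(self, actor: str, key: str, value: Any) -> Op:
        counter = 1 + max((c for c, _ in self.log), default=0)
        op = Op((counter, actor), key, value, frozenset(self.conflicts(key)))
        self.apply(op)
        return op

alice, bob = Doc(), Doc()
alice.set("alice", "title", "Draft")
bob.merge(alice)                      # both replicas now hold the first op
alice.set("alice", "title", "Final")  # concurrent edit on Alice's device
bob.set("bob", "title", "Notes")      # concurrent edit on Bob's device
alice.merge(bob)
bob.merge(alice)
print(alice.get("title"), bob.get("title"))  # same winner on both replicas
print(alice.conflicts("title"))              # both concurrent values are kept
```

Because each replica ends up with the same set of operations and applies the same tie-break, both converge on one value without coordinating, while the losing value remains available through `conflicts`.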
I wish I could tip comments; this is very helpful.
Thank you.
I should do this with other videos so I could benefit myself and other people
Great talk. I do want to note that blockchain tends towards centralization, not for the reason that Martin suggested, but because:
1) Most transactions happen through an exchange, marketplace, or other service which people rely on for discovery and facilitation of transactions.
2) Power tends to concentrate in a few individuals (i.e. 80% of bitcoins are owned by 10% of users), often allowing them greater influence over which transactions are added to the chain (there are likely exceptions to this, but broadly that is the trend that I am seeing).
I wish the "director's cut" of the talk would be available online.
Very clear, thank you!
I'm kind of disappointed that the guy butted in to correct the speaker about crypto, while the one woman audience member was trying to ask a question. Crypto is inherently uninteresting in this space, I was wondering what she was going to say, and she didn't actually get a chance.
My biggest concern is the proliferation of CGNAT that still makes relay/central servers extremely necessary. Intuitively, it feels like finding a solution around CGNAT that's not relay/central server dependent would do a lot to make software less dependent on cloud services
A networked application, by definition, needs another node to be online to exchange information. Peer-to-peer networking means either that both clients of interest are online together (phone call), or, in a store-and-forward architecture, that the application can send the information to any number of nodes for eventual delivery to the target node (email). It doesn't seem like a significant technical hurdle to replace the server with a peer node in a local-first design. Skype, I believe, went from peer-to-peer to client/server to improve quality of service. If the server is generic, as the author was speculating, then the Automerge service would become a commodity if successful (55m02s).
Very interesting. Luckily audio got better after half a minute.
Was there a mention of YJS? I must have missed it, if not that’s odd considering it’s a leader in this space.
YJS is an implementation of the ideas above, the same as Automerge and a couple of others, all of which are based on work around CRDTs.
Does anyone have a name for the speaker at 1:00:46?
How did he make those hand-drawn slides?
Isn't this how email clients work? They still have an outbox that stores unsent emails.
I think that's just one aspect of offline-first. This solution goes further, towards multiple users asynchronously editing a single email/doc. It focuses on merging the changes made by multiple users and less on how the changes are communicated to all users, which I think is what email as a spec focuses on.
did you have to pay the dinosaur from the '70s to beg for his mainframe back? :)
So, basically, everything new is something long forgotten, repeated at a new technological level. Quite a bit of software indeed does not benefit much from being purely cloud-based, and the only point of the internet for it is to back up or refresh the data. The problem is that none of the existing corporations would ever give up users' data, the opportunity to collect and sell it, or the chance to show you some ads.
Hey Martin. Can automerge support full text search over a very large database (like Wikipedia) without the user having to locally cache the entirety of data?