12: Design Google Docs/Real Time Text Editor | Systems Design Interview Questions With Ex-Google SWE
- Published 16 Feb 2024
- I swear Kate Upton and Megan Fox wrote I was handsome and sexy, you guys just didn't use two phase commit for your document snapshots and version vectors so you never received those writes on your local copy (since your version vector was more up to date than the document snapshot)!
- Science & Technology
Hey Jordan, just wanted to drop a huge thank you for your system design videos! They were crucial in helping me land an E4 offer at Facebook Singapore (I did product architecture instead of system design). Really appreciate the knowledge and insights you've shared. Cheers!
That's amazing man! Congrats, your hard work paid off!
Are you able to give me some guidance on what to expect and the aspects that you felt were important to cover for the Product Architecture interview? I have one coming up and I'm at a complete loss as to where they'll steer the conversations.
@@hl4113 1. Contact your recruiter for a detailed outline of the interview's structure, focusing on timelines and key areas.
2. Use Jordan's channel and Grokking the API design course for preparation
All the best!
this is production level detail - definitely requires a second sweep to memorize better!
Excellent detailed coverage of online text editor. And you made it easy to understand the concepts.
dang this guy is really good! Thanks for making the video!
writing an ot has operationally transformed my free time into wasted free time
Sounds like a solid operation to me!
Thank you for this video! Pretty cool
Jordan ! Great video as always 🎉.
I have a question: have you considered expanding into dissecting an open source product in a video, explaining why certain design decisions were made and discussing how you would alternatively try to solve them? Once again love all the work you put in, this is GOLD. Thanks!
That's an interesting idea! To tell you the truth, while I'm curious about doing this, the amount of time that I'd probably have to put into looking into those codebases would be pretty wild haha.
Not to mention that the guys working on open source software are a lot more talented than me!
🙇 interesting concepts covered. Thank you
Hands down the most in-depth coverage of the topic!
One question that I had - is MySQL a good choice for the write DB considering that it will be write-heavy?
Well, maybe not, just since I wonder how good of a write throughput we can get with an ACID database using B-trees. That being said, I'm sure it's fine realistically.
Thank you amazing video 🎉
Huge help in landing L4 at Netflix. Much thanks!
Legend!! Congrats :)
Beautiful!!!
Got an offer from LinkedIn. Your videos were great help in system design interview ❤.
Legend!! Congratulations!
The level of detail in this video makes me want to burn all those stupid superficial bs i have been reading all these years. Imma name my 3rd kid after your channel dude ;).... the 2nd one is gotta be martin tho
"Jordan has no life" is a great name for a kid, I endorse
great video, and the line at 07:54 ("Fortunately there are engineers who have no life..." 😂😂) added a practical touch
Hi Jordan! Just watching the CRDT part of the video where you mention giving fractional ids to the characters, between 0 and 1. I was wondering how/at what point these ids are assigned. For instance, if you create a blank document and start typing, what would it look like? And if you then add a few paragraphs at the end, how would these new indexes be assigned? The example you gave (and that I've seen in other places) treat it as an already existing document with already assigned indexes and you just inserting stuff in between.
I was thinking it might be a session thing - i.e. the first user that opens a connection to the file gets these assigned and stores them in memory or something, but I watched another video where you mention it being indexed in a database. I'd love to know!
I think I understood in the end, maybe? Indexes 0 and 1 don't actually exist - your first character will be around 0.5, second character around 0.75, and so on... you're only going to get indexes < 0.5 if you go back in the text and add characters before the first character you added. If you write without stopping or going back, you'll get 0.5, 0.75, 0.875, 0.9375 and so on?
Hey! I think this is probably implementation dependent, but I imagine the idea here is that there's some frontend logic to batch quick keystrokes together so that they're all assigned similar indices as opposed to constantly bisecting the outer characters (see the BIRD and CAT) example.
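The fractional-index assignment discussed above can be sketched in a few lines. This is a hedged illustration, not the video's actual implementation; `index_between` is a hypothetical helper, and real CRDTs typically use arbitrary-precision identifiers rather than floats to avoid running out of precision.

```python
# Sketch: assigning fractional CRDT indices between virtual bounds 0 and 1.

def index_between(left: float, right: float) -> float:
    """Midpoint of two neighbouring indices; the new character sorts between them."""
    return (left + right) / 2

# Typing "abc" into an empty document, always appending at the end
# (so the right neighbour is the virtual bound 1.0):
doc = []   # list of (index, char), kept sorted by index
prev = 0.0
for ch in "abc":
    idx = index_between(prev, 1.0)
    doc.append((idx, ch))
    prev = idx

print(doc)  # [(0.5, 'a'), (0.75, 'b'), (0.875, 'c')]
```

This reproduces the 0.5, 0.75, 0.875, ... sequence from the comment above; inserting *before* an existing character would instead bisect downward (e.g. `index_between(0.0, 0.5)` gives 0.25).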
Hey Jordan, is there any way you can make some content regarding how to tackle the product architecture interview? I have one from Meta coming up and couldn't find many sources of examples for content more focused on API design, client-server interactions, extensibility, etc... There are no examples I can find related to this on YouTube. Thank you for all your content!
Hey! I've never done this interview myself so perhaps I'm not the most qualified. But considering that I've had multiple people on here say that they've passed meta interviews, I imagine it's pretty similar to systems design.
Hey Jordan, first of all, thnx for the great video! I have a question: can we use event-sourcing design approach instead of CDC? Meaning that using Kafka topics as the main source of truth instead of the writes' DB. We can consume from Kafka and build snapshots DB, and also users can consume from the needed Kafka partition to get the latest document changes. Thus we automatically get an order for writes inside any single partition and have persistence for writes. WDYT?
Absolutely! Keep in mind though that this implies that the single Kafka queue becomes a point that all writes need to go through, which we want to avoid. If we do event sourcing with multiple Kafka queues and assign IDs to each write based on the queue ID and the position in the queue, then use the vector resolution logic that I discuss, I think that this would be better!
@@jordanhasnolife5163 Thanks, of course I have in mind using separate Kafka partitions for each document (or set of documents), and storing topic offsets for use with snapshots. I'm not sure, though, whether we can use only one topic with multiple partitions for all writes, because having too many partitions for one topic can increase latency. Maybe it's better to somehow split the incoming data across many topics to avoid this problem.
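The write-ID scheme Jordan describes (queue ID plus position in the queue) maps naturally onto Kafka's (partition, offset) pair. A minimal sketch, assuming one partition per document or document shard - the class name `WriteId` is illustrative, not from the video:

```python
# Sketch: identifying each write by (partition, offset). Within a single
# Kafka partition, offsets give a total order for free, so two writes to
# the same document (same partition) are ordered unambiguously.

from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class WriteId:
    partition: int  # which Kafka partition (the "queue id")
    offset: int     # position within that partition

w1 = WriteId(partition=3, offset=41)
w2 = WriteId(partition=3, offset=42)
assert w1 < w2  # same partition, so the lower offset happened first
```

Ordering across *different* partitions is where this breaks down, which is exactly why the reply above falls back to the version-vector resolution logic for concurrent writes.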
Hey Jordan!
You did a great job with this one, thanks for your hard work!
After watching this video and looking at the final design I didn't quite get to which place would a reader connect to receive updates about new changes in the document?
I see that there are arrows to cache, to vectors db, to snapshots db and to write db, but don't see any WS server or something
Could you clarify please?
The reader first gets a snapshot with a version vector from the vectors and snapshot DB, and from there subscribes to changes on document servers, applying any subsequent updates.
Much appreciate it
I guess Cassandra is a good choice for Snapshot DB since we can use the character position as the clustering key. WDYT?
I think it's an interesting idea, though my thinking was we really want a single leader here so that snapshots are consistent with the entry in the version vector DB
@@jordanhasnolife5163 Would you also use something like s3 to store big docs' snapshots in your system?
Easy peasy
Great video Jordan.
Two questions on final design screen:
1. Write DB sharding: What is the difference between sharding by DocId vs DocId+ServerId?
2. Document Snapshot DB: We are sharding by docID and indexing by docId+character position, is this correct?
1) With just DocId we become bottlenecked by a single database node. If we shard by doc and server ID, each server can write to a nearby database.
2) Yep!
Great video! just curious what might be different if this was for a google sheets like product, rather than a document.
Frankly I think you'd have fewer collisions, which probably means you can get away with using a single leader node and not be that bottlenecked. If for some reason you did need to do this, you'd basically need a way of combining writes to the same cells, which doesn't really make much sense intuitively. I'd say if you want to do multi-leader you should probably at least incorporate a distributed lock so that if two people decide to edit cells at the same time, we conclusively know which one came first.
@@jordanhasnolife5163 Was thinking the same thing, have them write to the same leader, and let the leader's own concurrent write detection decide.
Dayum boi
I’m 30 minutes in. Got the sense each client just gets all these messages from other clients and applies them using some merge function that guarantees the result of applying messages in the order received makes sense - with a little bit greater consistency (via version vectors) for writes from the same client. But I’m wondering - is there any sync point at which all users are guaranteed to see the same version of the document? Because if not clients could just diverge more and more over time…
Yep - no, there is not any sync point. If we wanted to, we could occasionally poll the DB on an interval to ensure we don't get too out of whack.
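The version-vector comparison behind this divergence detection can be sketched briefly. This is a generic illustration of the technique, with hypothetical helper names, not the video's exact scheme:

```python
# Sketch: comparing version vectors (maps of client id -> last seen counter).

def dominates(a: dict, b: dict) -> bool:
    """True if vector a has seen every write that vector b has."""
    return all(a.get(client, 0) >= count for client, count in b.items())

def concurrent(a: dict, b: dict) -> bool:
    """Neither side has seen all of the other's writes: they've diverged."""
    return not dominates(a, b) and not dominates(b, a)

local_copy = {"client1": 3, "client2": 1}
server_doc = {"client1": 2, "client2": 2}
print(concurrent(local_copy, server_doc))  # True - neither dominates
```

Polling the DB on an interval, as suggested above, amounts to checking whether the server's vector dominates the client's and pulling down whatever is missing when it doesn't.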
I wonder why you would use another db plus two phase commit for the version vector table, instead of using the same db and use transactions instead.
If I have to partition the data for a big document over multiple tables I need a distributed transaction
If we assume all documents can fit on a single database totally agree that's a much better approach
The version vector for a document can exist on the same partition as the documents partition. If we assume a document can only reach megabytes and not gigabytes it's safe to assume a single document can exist on a single partition. Even if a single document has to be chunked, then we can still colocate the version vector for that chunk.
@@joshg7097 Hey Josh, you can co-locate it, but it still becomes a distributed transaction which needs to use 2pc. Also, ideally, we don't have to be too rack aware in our write storing I feel like, because if we were to use something like AWS we don't necessarily have those controls.
I agree with your point though, probably 99.9% of the time a document won't span multiple partitions and in such an event you should store the version vector local to its partition and don't need 2pc.
@@jordanhasnolife5163 I accepted an L5 Meta offer a few months ago. I watched every single one of your videos, huge thanks to the gigachad 😁
19:15 the result of interleaving of "cat" and "bird" should be "bciartd", right?
Ah yeah maybe a typo on my end
@@jordanhasnolife5163 Yeh right. No worries. Great video, thanks man
Thanks for the video Jordan. At czcams.com/video/YCjVIDv0zQY/video.html How does the new client that has no content fetched so far get the content from Snapshot DB directly? What does it ask the Write DB or Document DB at this point?
You go to the snapshot DB, get a snapshot at some time T, and then poll the writes db for writes beyond that snapshot until you're caught up (e.g. incoming writes to the document are the next ones on top of what you already have).
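That snapshot-then-catch-up read path can be sketched in miniature. All the names here (the in-memory "DBs", `load_document`) are illustrative assumptions, not the video's actual API; the point is only the flow: read state as of version T, then replay writes with version > T.

```python
# Sketch: a new reader bootstraps from a snapshot, then replays newer writes.

snapshot_db = {"doc1": ("Hello", 2)}             # doc_id -> (text, snapshot version T)
write_db = {"doc1": [(2, "Hello"), (3, " world"), (4, "!")]}  # (version, delta)

def load_document(doc_id: str) -> str:
    text, version = snapshot_db[doc_id]          # state as of version T
    for v, delta in write_db[doc_id]:
        if v > version:                          # only writes beyond the snapshot
            text += delta
    return text

print(load_document("doc1"))  # Hello world!
```

Real writes would of course be positioned edits merged via OT/CRDT logic rather than plain appends, but the bootstrap sequence is the same.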