12: Design Google Docs/Real Time Text Editor | Systems Design Interview Questions With Ex-Google SWE

  • Published 16 Feb 2024
  • I swear Kate Upton and Megan Fox wrote I was handsome and sexy, you guys just didn't use two phase commit for your document snapshots and version vectors so you never received those writes on your local copy (since your version vector was more up to date than the document snapshot)!
  • Science & Technology

Comments • 58

  • @jiangnan1909
    @jiangnan1909 5 months ago +13

    Hey Jordan, just wanted to drop a huge thank you for your system design videos! They were crucial in helping me land an E4 offer at Facebook Singapore (I did product architecture instead of system design). Really appreciate the knowledge and insights you've shared. Cheers!

    • @jordanhasnolife5163
      @jordanhasnolife5163 5 months ago

      That's amazing man! Congrats, your hard work paid off!

    • @hl4113
      @hl4113 5 months ago

      Are you able to give me some guidance on what to expect and the aspects that you felt were important to cover for the Product Architecture interview? I have one coming up and I'm at a complete loss as to where they'll steer the conversations.

    • @jiangnan1909
      @jiangnan1909 5 months ago

      ​@@hl4113 1. Contact your recruiter for a detailed outline of the interview's structure, focusing on timelines and key areas.
      2. Use Jordan's channel and Grokking the API design course for preparation
      All the best!

  • @AP-eh6gr
    @AP-eh6gr 4 months ago +7

    this is production level detail - definitely requires a second sweep to memorize better!

  • @venkatadriganesan475
    @venkatadriganesan475 1 month ago +1

    Excellent detailed coverage of online text editor. And you made it easy to understand the concepts.

  • @renzheng7845
    @renzheng7845 5 months ago +3

    dang this guy is really good! Thanks for making the video!

  • @gangsterism
    @gangsterism 5 months ago +4

    writing an ot has operationally transformed my free time into wasted free time

  • @user-vz3zp2qg9q
    @user-vz3zp2qg9q 3 months ago +1

    Thank you for this video! Pretty cool

  • @DevGP
    @DevGP 5 months ago +1

    Jordan ! Great video as always 🎉.
    I have a question: have you considered expanding into maybe dissecting an open-source product in a video, explaining why certain design decisions were made & discussing how you would alternatively try to solve them? Once again, love all the work you put in, this is GOLD. Thanks!

    • @jordanhasnolife5163
      @jordanhasnolife5163 5 months ago

      That's an interesting idea! To tell you the truth, while I'm curious about doing this, the amount of time that I'd probably have to put into looking into those codebases would be pretty wild haha.
      Not to mention that the guys working on open source software are a lot more talented than me!

  • @user-wj1wy6ph5q
    @user-wj1wy6ph5q 5 months ago +1

    🙇 interesting concepts covered. Thank you

  • @soumik76
    @soumik76 5 days ago +1

    Hands down the most in-depth coverage of the topic!
    One question that I had - is MYSQL a good choice for write db considering that they will be write-heavy?

    • @jordanhasnolife5163
      @jordanhasnolife5163 5 days ago

      Well, maybe not, since I wonder how good a write throughput we can get with an ACID database using B-trees. That being said, I'm sure it's fine realistically.

  • @sarvagya5817
    @sarvagya5817 5 months ago +1

    Thank you amazing video 🎉

  • @Crunchymg
    @Crunchymg 5 months ago +2

    Huge help in landing L4 at Netflix. Much thanks!

  • @LeiGao-im7ii
    @LeiGao-im7ii 3 months ago +1

    Beautiful!!!

  • @nowonderwhy200
    @nowonderwhy200 5 months ago +2

    Got an offer from LinkedIn. Your videos were great help in system design interview ❤.

  • @fluffymattress5242
    @fluffymattress5242 2 months ago +1

    The level of detail in this video makes me want to burn all that stupid superficial bs I have been reading all these years. Imma name my 3rd kid after your channel dude ;).... the 2nd one has gotta be Martin tho

  • @khushalsingh576
    @khushalsingh576 1 month ago +1

    great video, and the comment at 07:54 ("Fortunately there are engineers who have no life ... 😂😂") added a practical touch

  • @user-id1sf2ib3s
    @user-id1sf2ib3s 5 months ago +1

    Hi Jordan! Just watching the CRDT part of the video where you mention giving fractional ids to the characters, between 0 and 1. I was wondering how/at what point these ids are assigned. For instance, if you create a blank document and start typing, what would it look like? And if you then add a few paragraphs at the end, how would these new indexes be assigned? The example you gave (and that I've seen in other places) treats it as an already existing document with already assigned indexes, with you just inserting stuff in between.
    I was thinking it might be a session thing - i.e. the first user that opens a connection to the file gets these assigned and stores in memory or something, but I watched another video where you mention it being indexed in a database. I'd love to know!

    • @user-id1sf2ib3s
      @user-id1sf2ib3s 5 months ago +2

      I think I understood in the end, maybe? Indexes 0 and 1 don't actually exist - your first character will be around 0.5, second character around 0.75, and so on... you're only going to get indexes < 0.5 if you go back in the text and add characters before the first character you added. If you write without stopping or going back, you'll get 0.5, 0.75, 0.875, 0.9375 and so on?

    • @jordanhasnolife5163
      @jordanhasnolife5163 5 months ago

      Hey! I think this is probably implementation dependent, but I imagine the idea here is that there's some frontend logic to batch quick keystrokes together so that they're all assigned similar indices, as opposed to constantly bisecting the outer characters (see the BIRD and CAT example).
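
The fractional-indexing scheme being discussed can be sketched in a few lines. This is a hypothetical illustration (the `index_between` helper is made up for this sketch, and real implementations use arbitrary-precision identifiers rather than floats):

```python
def index_between(left: float, right: float) -> float:
    """Pick an index strictly between two neighboring characters' indices."""
    return (left + right) / 2

# Typing "cat" left to right into an empty document, with virtual bounds 0 and 1:
doc = []  # sorted list of (index, char)
for ch in "cat":
    left = doc[-1][0] if doc else 0.0
    doc.append((index_between(left, 1.0), ch))

print(doc)  # [(0.5, 'c'), (0.75, 'a'), (0.875, 't')]
```

This matches the self-answer in the thread: the bounds 0 and 1 are never assigned themselves, and uninterrupted typing halves the remaining right-hand gap each time, giving 0.5, 0.75, 0.875, 0.9375, ...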

  • @hl4113
    @hl4113 5 months ago +1

    Hey Jordan, is there any way you can make some content on how to tackle the product architecture interview? I have one from Meta coming up and couldn't find many examples of content more focused on API design, client-server interactions, extensibility, etc. There are no examples I can find related to this on youtube. Thank you for all your content!

    • @jordanhasnolife5163
      @jordanhasnolife5163 5 months ago

      Hey! I've never done this interview myself so perhaps I'm not the most qualified. But considering that I've had multiple people on here say that they've passed meta interviews, I imagine it's pretty similar to systems design.

  • @asyavorobyova2960
    @asyavorobyova2960 4 months ago +1

    Hey Jordan, first of all, thnx for the great video! I have a question: can we use event-sourcing design approach instead of CDC? Meaning that using Kafka topics as the main source of truth instead of the writes' DB. We can consume from Kafka and build snapshots DB, and also users can consume from the needed Kafka partition to get the latest document changes. Thus we automatically get an order for writes inside any single partition and have persistence for writes. WDYT?

    • @jordanhasnolife5163
      @jordanhasnolife5163 4 months ago

      Absolutely! Keep in mind though that this implies that the single Kafka queue becomes a point that all writes need to go through, which we want to avoid. If we do event sourcing with multiple Kafka queues and assign ids to each write based on the queue id and the position in the queue, then use the vector resolution logic that I discuss, I think that this would be better!

    • @asyavorobyova2960
      @asyavorobyova2960 4 months ago

      Thanks, of course I have in mind using separate Kafka partitions for each document (or set of documents), and storing topic offsets for use with snapshots. I'm not sure, though, whether we can use just one topic with multiple partitions for all writes, because if we have too many partitions for one topic it can increase latency. Maybe it's better to somehow split the incoming data across many topics to avoid this problem. @@jordanhasnolife5163
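
The "vector resolution logic" referenced in this thread can be sketched as a version-vector comparison: each write carries a map of queue id to position, and two writes conflict exactly when neither vector dominates the other. The function name and dict representation are assumptions for illustration:

```python
def compare(a: dict, b: dict) -> str:
    """Compare two version vectors (queue id -> position in that queue)."""
    keys = set(a) | set(b)
    a_ahead = any(a.get(k, 0) > b.get(k, 0) for k in keys)
    b_ahead = any(b.get(k, 0) > a.get(k, 0) for k in keys)
    if a_ahead and b_ahead:
        return "concurrent"  # neither dominates: needs merge/conflict resolution
    if a_ahead:
        return "a_after_b"
    if b_ahead:
        return "b_after_a"
    return "equal"

print(compare({"q1": 3, "q2": 1}, {"q1": 2, "q2": 2}))  # concurrent
```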

  • @antonvityazev4823
    @antonvityazev4823 2 months ago +1

    Hey Jordan!
    You did a great job with this one, thanks for your hard work!
    After watching this video and looking at the final design, I didn't quite get which place a reader would connect to in order to receive updates about new changes in the document.
    I see that there are arrows to the cache, the vectors DB, the snapshots DB, and the write DB, but I don't see any WS server or anything similar
    Could you clarify please?

    • @jordanhasnolife5163
      @jordanhasnolife5163 2 months ago

      The reader first gets a snapshot with a version vector from the vectors and snapshot DBs, and from there subscribes to changes on the document servers, applying any subsequent updates

    • @antonvityazev4823
      @antonvityazev4823 1 month ago +1

      Much appreciate it

  • @rjarora
    @rjarora 15 days ago +1

    I guess Cassandra is a good choice for Snapshot DB since we can use the character position as the clustering key. WDYT?

    • @jordanhasnolife5163
      @jordanhasnolife5163 15 days ago +1

      I think it's an interesting idea, though my thinking was we really want a single leader here so that snapshots are consistent with the entry in the version vector DB

    • @rjarora
      @rjarora 15 days ago

      @@jordanhasnolife5163 Would you also use something like s3 to store big docs' snapshots in your system?

  • @evrard90
    @evrard90 18 days ago +1

    Easy peasy

  • @priteshacharya
    @priteshacharya 2 months ago +1

    Great video Jordan.
    Two questions on final design screen:
    1. Write DB sharding: What is the difference between sharding by DocId vs DocId+ServerId?
    2. Document Snapshot DB: We are sharding by docID and indexing by docId+character position, is this correct?

    • @jordanhasnolife5163
      @jordanhasnolife5163 2 months ago

      1) If we shard by doc id alone, we become bottlenecked by a single database node. If we shard by doc and server id, each server can write to a nearby database.
      2) Yep!

  • @levyshi
    @levyshi 2 months ago +1

    Great video! just curious what might be different if this was for a google sheets like product, rather than a document.

    • @jordanhasnolife5163
      @jordanhasnolife5163 2 months ago

      Frankly, I think you'd have fewer collisions, which probably means you can get away with using a single leader node and not be that bottlenecked. If for some reason you did need to do this, you'd basically need a way of combining writes to the same cells, which doesn't really make much sense intuitively. I'd say if you want to do multi-leader you should probably at least incorporate a distributed lock so that if two people decide to edit cells at the same time, we conclusively know which one came first.

    • @levyshi
      @levyshi 2 months ago +1

      @@jordanhasnolife5163 Was thinking the same thing, have them write to the same leader, and let the leader's own concurrent write detection decide.

  • @RobinHistoryMystery
    @RobinHistoryMystery 4 months ago +1

    Dayum boi

  • @firezdog
    @firezdog 3 months ago +1

    I’m 30 minutes in. Got the sense each client just gets all these messages from other clients and applies them using some merge function that guarantees the result of applying messages in the order received makes sense - with a little bit greater consistency (via version vectors) for writes from the same client. But I’m wondering - is there any sync point at which all users are guaranteed to see the same version of the document? Because if not clients could just diverge more and more over time…

    • @jordanhasnolife5163
      @jordanhasnolife5163 3 months ago

      Yep - no, there is not any sync point. If we wanted to, we could occasionally poll the DB on an interval to ensure we don't get too out of whack.

  • @joshg7097
    @joshg7097 5 months ago +3

    I wonder why you would use another DB plus two-phase commit for the version vector table, instead of using the same DB with transactions.

    • @jordanhasnolife5163
      @jordanhasnolife5163 5 months ago

      If I have to partition the data for a big document over multiple tables I need a distributed transaction

    • @jordanhasnolife5163
      @jordanhasnolife5163 5 months ago

      If we assume all documents can fit on a single database totally agree that's a much better approach

    • @joshg7097
      @joshg7097 5 months ago +1

      The version vector for a document can exist on the same partition as the document's partition. If we assume a document can only reach megabytes and not gigabytes, it's safe to assume a single document can exist on a single partition. Even if a single document has to be chunked, then we can still colocate the version vector for that chunk.

    • @jordanhasnolife5163
      @jordanhasnolife5163 5 months ago +1

      @@joshg7097 Hey Josh, you can co-locate it, but it still becomes a distributed transaction which needs to use 2PC. Also, ideally we don't have to be too rack-aware in where we store writes, because if we were to use something like AWS we don't necessarily have those controls.
      I agree with your point though: probably 99.9% of the time a document won't span multiple partitions, and in such an event you should store the version vector local to its partition and don't need 2PC.

    • @joshg7097
      @joshg7097 5 months ago

      @@jordanhasnolife5163 I accepted an L5 Meta offer a few months ago. I watched every single one of your videos, huge thanks to the gigachad 😁

  • @jasdn93bsad992
    @jasdn93bsad992 3 months ago +1

    19:15 the result of interleaving "cat" and "bird" should be "bciartd", right?
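
That checks out under the fractional-index scheme from the video: if both clients bisect toward 1.0 from an empty document and ties sort by client id (an assumed tie-break, with "bird"'s client winning ties), the merged order is exactly "bciartd". A minimal sketch, with hypothetical helper names:

```python
def type_word(word: str, client_id: int):
    """Assign each character a fractional index by repeated bisection toward 1.0."""
    chars, left = [], 0.0
    for ch in word:
        idx = (left + 1.0) / 2  # 0.5, 0.75, 0.875, 0.9375, ...
        chars.append((idx, client_id, ch))
        left = idx
    return chars

# Two clients type concurrently into an empty doc; merging sorts by (index, client_id).
merged = sorted(type_word("bird", 0) + type_word("cat", 1))
print("".join(ch for _, _, ch in merged))  # bciartd
```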

  • @ShreeharshaV
    @ShreeharshaV 27 days ago +1

    Thanks for the video, Jordan. At czcams.com/video/YCjVIDv0zQY/video.html - how does a new client that has fetched no content so far get the content from the Snapshot DB directly? What does it ask the Write DB or Document DB at this point?

    • @jordanhasnolife5163
      @jordanhasnolife5163 26 days ago +1

      You go to the snapshot DB, get a snapshot at some time T, and then poll the writes db for writes beyond that snapshot until you're caught up (e.g. incoming writes to the document are the next ones on top of what you already have).
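
The catch-up flow in that reply can be sketched as follows; the in-memory "databases" and the append-only write format are hypothetical stand-ins for the real Snapshot DB and Write DB:

```python
# doc_id -> (snapshot text, version the snapshot was taken at)
snapshot_db = {"doc1": ("hello", 3)}
# doc_id -> ordered list of (version, appended text)
writes_db = {"doc1": [(3, "lo"), (4, " wor"), (5, "ld")]}

def load_document(doc_id: str) -> str:
    """Load a snapshot, then replay only the writes newer than it."""
    text, snapshot_version = snapshot_db[doc_id]
    for version, delta in writes_db[doc_id]:
        if version > snapshot_version:  # skip writes the snapshot already contains
            text += delta
    return text

print(load_document("doc1"))  # hello world
```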