Design file-sharing system like Google Drive / Dropbox (System design interview with EM)

  • Published 11 Jun 2024
  • Today's system design mock interview: "Design Google Drive."
    Candidate: Alex, engineering manager (ex-Shopify) and now a coach on our platform.
    Book a session with Alex here: igotanoffer.com/?Y...
    Chapters:
    00:00 Intro
    00:36 Question "Design a file-sharing system like Dropbox, Google Drive, etc"
    00:55 1. Clarifications and requirements
    09:51 2. High-level design (components)
    13:08 2. High-level design (APIs)
    19:20 3. Drill-down (client responsibilities)
    22:09 3. Drill-down (schema)
    29:41 3. Drill-down (upload flow)
    33:53 3. Drill-down (download flow)
    36:24 4. Refinements (regionality)
    37:45 4. Refinements (S3)
    39:22 4. Refinements (CDN)
    40:13 4. Refinements (versioning)
    41:00 4. Refinements (encryption)
    41:45 4. Refinements (database)
    42:42 5. Follow-up (read vs write)
    44:07 5. Follow-up (folders)
    47:35 6. Outro
    About us:
    IGotAnOffer is the leading career coaching marketplace ambitious professionals turn to for help at high-stakes moments in their career. Get a job, negotiate your salary, get a promotion, plan your next career steps - we've got you covered whenever you need us.
    Come and find us: igotanoffer.com/?Y...

Comments • 55

  • @IGotAnOffer-Engineering
    @IGotAnOffer-Engineering  5 months ago

    Get 1-on-1 coaching to ace your system design interview: igotanoffer.com/en/interview-coaching/type/system-design-interview?CZcams&

  • @evgenirusev818
    @evgenirusev818 4 months ago +35

    I think that this is the best IGotAnOffer video so far. Please bring in Alex for another one - perhaps to design Google maps? Thanks.

  • @scottlim5597
    @scottlim5597 1 month ago +1

    This candidate has real-life experience and it shows in the interview. He starts out simple and builds on top of it. I love it.

  • @lakshminarayanannandakumar1286
    @lakshminarayanannandakumar1286 4 months ago +20

    @31:00 I believe the rationale of adopting SQL to maintain consistency, with a reference to the CAP theorem, is misleading. It's important to note that the concept of consistency in the CAP theorem differs significantly from the consistency defined in ACID. Consistency in CAP means "every read should see the latest write", as opposed to consistency in ACID, which ensures that your DB always transitions from one valid state to another valid state.

    • @dmitriyobidin6049
      @dmitriyobidin6049 1 month ago +1

      Yeah, you are right. A follower/read replica that is consistent in ACID terms can be in an inconsistent state in CAP terms, simply because it hasn't yet been updated from the main/write replica.

    • @user-jz1lx1rv1n
      @user-jz1lx1rv1n 1 month ago

      He meant consistency in the CAP theorem.
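The distinction this thread is drawing can be shown with a toy primary/follower pair. This is a minimal sketch (the `Replica` class is purely illustrative): each replica is always in a valid internal state in the ACID sense, yet a read from the lagging follower still violates CAP-style consistency.

```python
# Toy model of the ACID-vs-CAP point: each replica below is always in a
# valid internal state (ACID-style consistency), yet a read from the
# lagging follower violates CAP consistency ("every read sees the latest
# write") until asynchronous replication catches up.
class Replica:
    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)

primary, follower = Replica(), Replica()
primary.data["file.txt"] = "v2"     # the latest write lands on the primary

stale = follower.read("file.txt")   # None: replication has not run yet
follower.data.update(primary.data)  # async replication catches up
fresh = follower.read("file.txt")   # "v2": reads see the latest write again

print(stale, fresh)  # None v2
```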

  • @arghyaDance
    @arghyaDance 4 months ago +2

    Very useful - Simple, Clear, no hurry, flow is really good

  • @naseredinwesleti300
    @naseredinwesleti300 4 months ago +3

    I'm genuinely happy to discover this beautiful channel. This was very insightful. Thank you and keep sharing.

  • @franssjostrom719
    @franssjostrom719 4 months ago

    Indeed a great video everything from rough calculations to being communicative with the customer was great 🎉

  • @Rajjj7853
    @Rajjj7853 4 months ago +11

    Good video! I think there is a miscalculation: the total storage used for 100 million users is around 1,500 PB, not 1.5 PB.

    • @moneychutney
      @moneychutney 3 months ago

      not so important

    • @dimitryzusman6711
      @dimitryzusman6711 2 months ago +1

      Oh, no. It is uber important. 1.5 PB is one large storage account; 1.5 EB is a totally different scale of algorithms and data storage. @moneychutney

  • @DevendraLattu
    @DevendraLattu 2 months ago +1

    Great question asked about compression at 20:55 with a well-structured answer.

  • @almirdavletov535
    @almirdavletov535 5 months ago +2

    Great video, thanks. One thing I'm really missing is some sort of judgement: what went well, what was not ideal. I see that the DB design wasn't really well thought out, or maybe it's just me. Sorting such things out as a conclusion to the video would be of great value to those who watch these videos!

  • @AkritiBhat
    @AkritiBhat 1 month ago

    Great video. The way he approaches depth shows that he is very strong

  • @prasadhraju
    @prasadhraju 3 months ago +5

    For 100M users, each with 15 GB of storage space, shouldn't the total storage be 1.5 exabytes? Explanation: 100,000,000 users × 15 GB = 1,500,000,000 GB = 1,500 PB = 1.5 EB.

    • @31737
      @31737 1 month ago +1

      Yeah, I calculated the same thing. That is why back-of-the-napkin/envelope math is dangerous; I skip it altogether, because if you get it wrong it is a big fail. What's obvious is that for this system we must scale horizontally and distribute the load; who cares how many servers are required, it's not even in the scope of the interview.
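The correction in this thread checks out. A minimal sketch of the arithmetic, assuming decimal (SI) units where 1 PB = 10^6 GB:

```python
# Back-of-the-envelope storage estimate from the thread:
# 100 million users x 15 GB free quota each, using decimal (SI) units.
users = 100_000_000
quota_gb = 15

total_gb = users * quota_gb      # 1,500,000,000 GB
total_pb = total_gb / 1_000_000  # 1 PB = 1,000,000 GB
total_eb = total_pb / 1_000      # 1 EB = 1,000 PB

print(f"{total_pb:,.0f} PB = {total_eb} EB")  # 1,500 PB = 1.5 EB
```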

  • @aju126
    @aju126 5 months ago +4

    I think the main requirement of a file-sharing system is how edits are handled: every edit on a file should not sync the whole file across devices, just the data chunk that was edited. Without this requirement, it's the same as any other design with models and data floating around. Overall it was a great design interview. But one question I have across all these design interviews is about the math performed at the beginning w.r.t. number of users, traffic, QPS, etc.: how is it even used?

  • @vivekengi01
    @vivekengi01 4 months ago +1

    It's a great video, but I think it missed 2 important things:
    1) The file permissions were missing from the schema design, which is a "must have" for any file-sharing system.
    2) For very large files, how the upload and download can be optimized to save network bandwidth, instead of just redirecting to S3.
    Please take these inputs positively and keep sharing such videos.🙂

  • @yacovskiv4369
    @yacovskiv4369 1 month ago +3

    Is that bit about partitions in S3 accurate? S3 uses a key-based structure where each object is stored with a unique key. The key can include slashes ("/") to create a hierarchy, effectively mimicking a folder structure but there aren't any actual folders in S3; it's all based on the keys you assign to your objects.
    So, what does he mean by splitting into more folders when they become too large?

    • @tameribrahim6869
      @tameribrahim6869 1 month ago +1

      AWS S3 (and other cloud blob storage services) is flat storage; the console UI shows folder structures, but they are just extracted from the file path (/{folder}/..).
      The only thing that may suggest adding a DB field/table is to store and track the available folders per user to improve performance, as listing the folders directly from the SDK/API means you fetch all the blobs and then extract the folder structure from them!
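To make the point above concrete: S3's key space is flat, and `list_objects_v2(Prefix=..., Delimiter="/")` merely computes "common prefixes" server-side. The helper below is an illustrative stand-in (not the AWS SDK) that mimics that computation over plain key strings:

```python
# S3 has no real folders: every object lives under a flat key, and "/" in
# the key is only a naming convention. This illustrative helper mimics what
# list_objects_v2(Prefix=..., Delimiter="/") computes server-side: the set
# of one-level-deep "subfolders" (common prefixes) under a given prefix.
def common_prefixes(keys, prefix="", delimiter="/"):
    folders = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:  # at least one more "folder" level below prefix
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
    return sorted(folders)

keys = [
    "alice/docs/report.pdf",
    "alice/docs/drafts/v1.txt",
    "alice/photos/cat.jpg",
    "bob/notes.txt",
]
print(common_prefixes(keys, prefix="alice/"))  # ['alice/docs/', 'alice/photos/']
```

"Splitting into more folders" therefore only changes key naming; it has no effect on S3's physical layout.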

  • @lalitkargutkar595
    @lalitkargutkar595 1 month ago

    Great video!!!

  • @RenegadePawn
    @RenegadePawn 3 months ago +7

    Seemed legit to me for the most part, but why would you use a CDN? Unless there are lots of users with certain big files that are the exact same, what benefit would a CDN provide?

    • @MarcVouve
      @MarcVouve 1 month ago

      Since the video mainly used AWS, I'll use CloudFront as an example so I can be specific. In addition to some security benefits, CloudFront operates over AWS's private network globally, which is typically faster and more reliable than the public internet backbone.

    • @user-jz1lx1rv1n
      @user-jz1lx1rv1n 1 month ago

      I think a CDN can also be location-optimized. So a person living in Brazil can fetch data from a CDN node located near Brazil instead of one located in the US, making it more efficient.

  • @brabebhin368
    @brabebhin368 4 months ago +6

    It is not OK for the client to tell the API server that the upload is done. The client may be unable to tell the API server about the upload being finished due to a network outage, so your metadata database will now be out of sync. You will want to keep that logic on the server.

    • @KNukzzz
      @KNukzzz 3 months ago

      Agreed x1000

    • @adityasanthosh702
      @adityasanthosh702 1 month ago

      And how exactly would that be implemented? I am curious
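One common way to implement the server-side version (a sketch, not necessarily what the video proposes): configure the bucket to emit `s3:ObjectCreated:*` event notifications to SQS or Lambda, and let a handler update the metadata DB regardless of whether the client calls back. The event layout below follows S3's notification JSON; `mark_uploaded` is a hypothetical DB hook.

```python
# Sketch of the server-side alternative: S3 delivers an ObjectCreated
# notification after a successful PUT, and a handler like this flips the
# metadata row to "uploaded". `mark_uploaded` is a hypothetical DB hook,
# e.g. "UPDATE files SET status = 'UPLOADED' WHERE s3_key = %s".
def handle_s3_event(event, mark_uploaded):
    """Process an S3 event-notification payload; return the completed keys."""
    completed = []
    for record in event.get("Records", []):
        if record.get("eventName", "").startswith("ObjectCreated"):
            obj = record["s3"]["object"]
            mark_uploaded(obj["key"], obj["size"])
            completed.append(obj["key"])
    return completed
```

This way the metadata DB converges even if the client crashes right after its upload to S3 succeeds.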

  • @mrchimrchi1867
    @mrchimrchi1867 4 months ago +2

    Alex is fantastic in this video. The interviewer looks like he wants no part of being in this video though.

  • @drawingbook-mq6hx
    @drawingbook-mq6hx 12 days ago

    If the data is compressed on the client side, I'm wondering who will divide the data into blocks and send it block by block. Sending blocks has the advantage of deduplication. Thoughts?
    Maybe it makes sense to do the chunking on the client side itself. Once the compressed chunks are uploaded to storage, the metadata DB can be updated.
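The client-side chunking this comment suggests could look roughly like the sketch below. It is illustrative only: the 4 MiB default mirrors the block size Dropbox has historically used, and `known_hashes` stands in for the server's index of already-stored blocks.

```python
# Illustrative client-side chunking with content-hash deduplication:
# split the (already compressed) file into fixed-size blocks, hash each,
# and only upload blocks the server has not seen before.
import hashlib

def chunk_and_dedup(data: bytes, known_hashes: set,
                    chunk_size: int = 4 * 1024 * 1024):
    """Split `data` into blocks; return (blocks to upload, full manifest)."""
    manifest, to_upload = [], []
    for i in range(0, len(data), chunk_size):
        block = data[i:i + chunk_size]
        digest = hashlib.sha256(block).hexdigest()
        manifest.append(digest)            # manifest lists every block hash
        if digest not in known_hashes:     # only ship blocks the server lacks
            to_upload.append((digest, block))
            known_hashes.add(digest)
    return to_upload, manifest
```

An edit then re-uploads only the blocks whose hashes changed, and the metadata DB stores the manifest as the file's current version.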

  • @nobreath181818
    @nobreath181818 3 months ago

    Maybe the best system design tutorial I've ever seen.

  • @R0hanThakur
    @R0hanThakur 1 month ago +1

    I think adding the login auth on the client side is not recommended for security reasons. That was one of the points my interviewer didn't seem to be happy about... and since it was for the security team, I think that cost me the offer.

  • @jockeycheng8183
    @jockeycheng8183 2 days ago

    I don't quite understand the workflow. Shouldn't the server itself interact with S3 to put the data in?

  • @yiannig7347
    @yiannig7347 4 months ago

    Some data inconsistency issues between DB and Queue.

  • @gouravgarg6756
    @gouravgarg6756 2 months ago +1

    It is not safe to store AWS credentials on the frontend (client/browser) for direct S3 uploads. We can use the multipart S3 API, but it has to be done through the API server. Correct?

    • @naveenn7935
      @naveenn7935 22 days ago

      You will not be storing AWS creds on the client machine. He mentioned that he will get a signed URL from S3 via the API server, which will have a TTL.
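A minimal sketch of the presigned-URL step described in this reply. With boto3, `s3_client` would be `boto3.client("s3")` and the real call is `generate_presigned_url("put_object", ...)`; keeping the client as a parameter underlines that only the API server holds credentials, never the browser.

```python
# The API server (which holds the AWS credentials) asks S3 for a
# short-lived signed PUT URL and hands it to the client; the client then
# uploads the file bytes directly to S3. `s3_client` is expected to be a
# boto3 S3 client.
def create_upload_url(s3_client, bucket: str, key: str,
                      ttl_seconds: int = 300) -> str:
    """Return a time-limited URL the client can PUT the file bytes to."""
    return s3_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=ttl_seconds,  # the URL stops working after the TTL
    )
```

The client PUTs the payload straight to the returned URL, so the API server never proxies the file bytes.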

  • @oleksandrbaranov9649
    @oleksandrbaranov9649 2 months ago

    I always have the impression that the interviewer is trying really hard not to fall asleep 😂

  • @PetarPetrovic-qw9zj
    @PetarPetrovic-qw9zj 2 months ago

    Where is the cache on the client side?

  • @pixusru
    @pixusru 1 month ago

    There is no database on the high-level diagram. Then, in the drill-down, Alex jumps to designing the DB schema.

  • @AzharAhmedAbubacker
    @AzharAhmedAbubacker 4 months ago

    Which app is used to draw the diagram?

    • @massaad
      @massaad 4 months ago

      It's the free version of Figma; it's called "FigJam", I think. Very fun to use!

  • @sinchanamalage7249
    @sinchanamalage7249 2 months ago +1

    What’s the drawing tool used in this video?

  • @user-jz1lx1rv1n
    @user-jz1lx1rv1n 1 month ago

    For 100 million users, should it not be 1.5 exabytes?

  • @31737
    @31737 1 month ago

    I believe the math is incorrect; you must take 100M users * 15 GB to get to the total, which is 1,500 PB.

  • @webdevinterview
    @webdevinterview 1 month ago

    A couple of notes:
    - back-of-the-envelope calculations were not utilised at all
    - the notifications part was not covered at all
    - the interviewee is pretty deep into AWS and goes too much into its specifics
    - the upload endpoint should return a file id plus some additional metadata about the file, and probably a 201 response
    - also, the API design didn't really correspond to what he said at the end. If we upload directly to S3, the flow should be the following:
    a) client calls the API server to get an upload link
    b) client uploads the file directly to S3 using the link
    c) once the upload is finished, the client gets some file id, which it sends to another API server endpoint to record that the file was actually uploaded. Or maybe there is a way for S3 to notify the API server itself, idk about that.
    - there should be an endpoint to get info about all the files in the cloud
    - compression on web or mobile is probably not a great idea; compressing a 10 GB file will eat a ton of battery
    Overall, I'd say this system design lacks quite some depth.

    • @adityasanthosh702
      @adityasanthosh702 1 month ago

      A 10 GB file won't be compressed in its entirety. It will be split into chunks, and each client process/thread compresses them in parallel.
      Regarding notifying the server when a client has finished downloading/uploading, I do not think that info needs to be saved by the server. The server's job is to store the latest files and provide links when clients request them. If a file download by a client is unsuccessful, the server can assume that it is the clients' responsibility to request new changes.
      Same with conflict resolution. Instead of the server resolving conflicts, it can ask the clients: "hey, some other client changed the same file. Which one is the latest?"

  • @RajdeepBiswasInd
    @RajdeepBiswasInd 2 months ago +1

    I think you missed out on synchronization.

    • @naveenn7935
      @naveenn7935 22 days ago

      I think the notification part handles the sync?

  • @kaiwu191
    @kaiwu191 24 days ago

    This guy is definitely an experienced engineer, but he didn't prepare for this kind of interview very well; maybe he was a little bit nervous during the interview. He tries to drop a lot of terms like S3 to make himself sound professional, but misses many details on how to design the solution from scratch, like handling big files. This is a question about designing Dropbox, not a use case of S3. This performance would be rejected for any senior position.

  • @chessmaster856
    @chessmaster856 1 month ago

    Get a billion millionaire customers.
    Don't guess the capacity. That's Amazon-babies talk. Did you guess the capacity??

  • @SamSarwat90
    @SamSarwat90 3 months ago

    The interviewer's blinking is fkn insane. Dude has issues.