System Design of Doordash: Geo-Hashing and WebSockets for Location Based Services

Sdílet
Vložit
  • čas přidán 5. 09. 2024

Komentáře • 70

  • @binwelbeck1482
    @binwelbeck1482 Před 2 lety +24

    Jordan Knows what he is taking about ; I like his confidence and the way he explain the scenario. thanks gaurav for inviting Jordan

  • @hardikmenger7360
    @hardikmenger7360 Před 2 lety +6

    This guy was randomly seen on my feed and since then i am following him. His channel is the most no bs system design walkthrough channel.

  • @jackkohnjj
    @jackkohnjj Před rokem +7

    This was a realistic interview in the sense that the interviewer realizes that the scale they are asking to design for is completely insane (500K drivers, 10M orders a day) but still proceeds as a thought experiment. I had an interview recently for a game platform and the interviewer said I should design for 50M concurrent users. LMAO.

  • @modernsimplelife8706
    @modernsimplelife8706 Před rokem +7

    Great video! It's very helpful to see a video where both of the interviewer and the interviewee know what they're talking about in a mock.
    These are some thoughts about WebRTC from my experience. I've been using and implemented WebRTC for 3 years. WebRTC uses UDP under the hood, however it uses SCTP protocol on top of UDP (probably will be replaced by QUIC eventually) to create a TCP-like experience for data channels. The developer is allowed to configure either guaranteed delivery (TCP) or lossy (UDP). WebRTC requires a signaling server to build a root of trust, and connection information for both the user and the driver. WebRTC and Websocket are stateful, meaning the server needs to keep the socket connection open for the entire session, otherwise the clients must reconnect. The stateful nature makes the solution harder to scale, and more complicated.
    As you guys mentioned, most users likely don't track on their food delivery as often as they would with Uber. I think WebRTC is overkill in this use case. The WebRTC connection likely gets broken (from closing the app) many times in the food delivery tracking session, and cause load churns to the server. I personally prefer doing HTTP polling with HTTP/3 transport (which is also based on UDP and support 0-RTT connection establishment). I think it is the simplest solution with reasonable low load to the server and easier to load balance horizontally.
    But, at the end of the day, we should look at metrics to decide the best solution for the problem. Strategically, starting with the simplest solution to quickly gather important metrics would be a good idea. So, I would start with HTTP polling, collect the data, then re-evaluate if WebRTC is needed. Finally, collect the data between HTTP polling and WebRTC. HTTP polling might be the most optimal solution already, simpler solutions can do wonder sometimes 😀. There are also ways to reduce states in WebRTC.

  • @geekydanish5990
    @geekydanish5990 Před 2 lety +22

    I have been watching Jordan’s content for quite a while now and dude you are just killing it thanks Gaurav for inviting him

  • @givemeabreakbro
    @givemeabreakbro Před rokem +8

    Jordan knows what he's talking about. Every answer sounds calculated and well reasoned. Gaurav + Jordan is a killer combo. Looking forward to more such collabs

  • @TheHp6515b
    @TheHp6515b Před 2 lety +12

    Great content. Jordan is very articulate and able to seamlessly translate theoretical knowledge to concrete solution.
    Quick note, consistently hashed cluster is great when keys evenly distribute across the hash space. In this case if we use `geo-hash` directly as the key, we get uneven distribution since small number of prefixes will have most of the data (e.g. geo-hashes representing Manhattan, Seattle etc) i.e. small number of keys having larger data.
    We could use a more dynamic range based sharding (involves a seperate manager service etc) to split key ranges based on load. Alternatively, we can hash the geo-hash itself to get even spread but loose colocation of relatively closer grids which could be important as we zoom-out for finding drivers.
    Either works but worth discussing the tradeoff especially considering latency is critical. (fan-out vs fan-in). Also wasn't clear what approach was proposed.
    Also, Jordan touched about point related to use of pure peer to peer communication for tracking (user and driver directly exchange communicate).
    Few reasons why this isn't practical generally,
    1> Security & Firewalls. Exposed open ports in user's phones are prime target for DOS attacks. Most routers or n/w infra will actually block any new external incoming connection by default and will require a user override to whitelist/forward port. If it's a public router etc there won't be anyway at all to receive connection requests.
    2> Relaying through a central service gives more reliable (in terms of throughput) network path (aws/gcp have interconnects with major isps and dedicated n/w infra), whereas for direct connections require to go through congested public interconnects. This is less of a problem when people are in close geo proximity so insignificant in this case, however relevant for whatsapp etc.
    Understandably not every aspect can be discussed in detail and interviewee isn't even expected to be aware of every aspect.
    Hence FAANG(at least F,G,U) gives interviewee leeway in picking one aspect of their choice for deep dive besides getting overall high level design.
    Side note. It is ok to say i don't know internals of x or y (say websockets or loadbalancers or storage engine etc) and move the interview to what you know. i.e. play by your strengths, not knowing deeply about few things is expected and doesn't result in -ve points but vaguely/incorrectly answering something will lead to -ve point and more followups around the same thing eventually leading to botched interview.

  • @payalgupta3487
    @payalgupta3487 Před 2 lety +4

    Digging deeper into single component is such a good idea. Rightly said that while discussing complex systems on the whole we tend to oversimplify. Enjoyed the video thoroughly. Thanks.

  • @ayonkar1534
    @ayonkar1534 Před 2 lety +2

    This is one of the best videos I have come across for SD Excellent interview candidate.

  • @sachin_yt
    @sachin_yt Před 2 lety +11

    It's amazing watching both of you! Learned a lot! Thanks Gaurav! Def need more of these :)

  • @hitmusicworldwide
    @hitmusicworldwide Před 2 lety +7

    In the USA geohash fixed addresses by zip code+4. The post office has done all the work. A more sophisticated analysis of verified address locations means that you aren't sharding for a lake, or other uninhabited area, saving tables, and flagging possible false orders and incorrect data.

  • @suhani7137
    @suhani7137 Před 2 lety +12

    hi gaurav! thank you for the amazingg content you are providing. i have binge watched your entire content in the past few months.on point explanation...your teaching skills are really goood. just a suggestion if you may...you should collaborate with people who amplify your reach ...like aman dhattarwal..or maybe launch some live courses on their platform...his reach is MASSIVE..it will also help you increase sales for your startup
    Idk if the suggestion i gave is even possible😅but just felt like this precious content and your courses should be reaching more and more people!

  • @SoumikPradhan
    @SoumikPradhan Před 2 lety +3

    This collab was worth the wait. Nice discussion.

  • @michahubert1179
    @michahubert1179 Před 4 měsíci

    Thats a great content ! One thing that happens notoriously without noticing are units. 5Mb is 5 mega bits "b" not bytes "B".

  • @colton8971
    @colton8971 Před 2 lety +1

    Jordan is a funny dude and not ugly!

  • @MrMAhmed22
    @MrMAhmed22 Před rokem

    Very nice effort, felt like gaurav missing some stuff on geo sharing and asking off questions was a bit confusing for the interviewee

  • @user-pp6fi2bt4w
    @user-pp6fi2bt4w Před 5 měsíci

    excellent explanation about geoshashing, thanks guys!

  • @secondIncomePartTime
    @secondIncomePartTime Před rokem +1

    jordan is genius

  • @okidokiyowyow356
    @okidokiyowyow356 Před 2 lety +1

    Hahaha. The introduction is really funny

  • @RahulDasAdhikary-usa
    @RahulDasAdhikary-usa Před rokem

    Nice work! First example is nice - Sri Lanka to India Food delivery 😀 ha ha.. Anyways, I like your channel Gaurav.😊

  • @tanveer.shaikh
    @tanveer.shaikh Před 2 lety +1

    hi Gaurav, for every video can you also please mention the pre-requisite concepts , i felt few of the things going over head

  • @devam9999
    @devam9999 Před 2 lety +1

    Would have been great to see details about the DB/tables design, and then going for the data storage estimations based on that. Is that something generally expected, or is the high level data storage estimation without getting into details about schema design, as done by Jordan here, also acceptable in an interview?

  • @sanketsardesai1961
    @sanketsardesai1961 Před rokem

    I would have used PostGIS definitely, store the data in geometry and use all the spatial functions, indexing and what not

  • @ayushraj6525
    @ayushraj6525 Před 2 lety

    Learned a lot from this collab.. Thanks a lot Gaurav! 🔥

  • @nagendrs
    @nagendrs Před 19 dny

    somewhere after 30 mins, the discussion went stray... first 30 mins too good... but after i felt the discussion was deviated

  • @ankitnmnaik229
    @ankitnmnaik229 Před 2 lety +2

    Bhaiya, i just passed clas 12 th..and i want to get into programming (ml/Ai) but i have never been good at problem solving..so how can I improve my problem solving from now on...can u suggest ways..pls..

  • @imranshaikh115
    @imranshaikh115 Před rokem

    Very nice 👌
    I appreciate 🙏 💛 your efforts

  • @kaustabpaul4514
    @kaustabpaul4514 Před 5 měsíci

    thanks for sharing it was awesome

  • @KiritiSai93
    @KiritiSai93 Před 4 měsíci

    The comment about geohash being ordered and doing binary search is a bit flawed. Remember that geohash has this property:
    'abc' is ALWAYS closer to 'abd', but 'def' for example might actually be also closer to 'abc' as well. So closer in edit distance is a sufficient condition but not a necessary one.
    Read System Design book by Alex Xu for more info on how this could happen.

    • @gkcs
      @gkcs  Před 4 měsíci +1

      czcams.com/video/OcUKFIjhKu0/video.html

    • @KiritiSai93
      @KiritiSai93 Před 4 měsíci

      Thank you! Looks like you've already covered in the topic :)

  • @deepak655655
    @deepak655655 Před 3 měsíci

    Gaurav talking about UDP and his voice went berserk, talking about coincidence !!!

  • @apoorvchaturvedi2493
    @apoorvchaturvedi2493 Před 2 lety

    Thanks Gaurav for this great video. Just one question as redis is being used here as geo-sharded database how concurrency will be handled with redis.

  • @nishantgarg2815
    @nishantgarg2815 Před 2 lety

    Nice one. Learnt a lot. Keep up the good work

  • @szyulian
    @szyulian Před 9 dny

    Watched. 😀-

  • @joaoluismoraes7215
    @joaoluismoraes7215 Před 11 měsíci

    Hi Gaurav, this interview feature by Interview Ready looks nice. How did you guys build it? WebRTC? Have you written any tech deep dive on implementing it?

    • @gkcs
      @gkcs  Před 11 měsíci +2

      We wrote the code in Golang, and accept http requests only.
      My next video is a deep dive on the architecture, stay tuned!

  • @user-pj9ck8ef1j
    @user-pj9ck8ef1j Před 2 lety

    why this video can not be found on gaurav main page home->system design tag?

  • @jackkohnjj
    @jackkohnjj Před rokem

    FYI, lat/lng to geohash is simple arithmetic. It would be ridiculous to call a service to do that.

  • @fabioschubert
    @fabioschubert Před rokem

    Lots of good stuff here but should've at the very least *mentioned* storing the Restaurant's data, menus, building the order, etc.

  • @abhineetsingh6720
    @abhineetsingh6720 Před rokem

    great video

  • @kamalsmusic
    @kamalsmusic Před 2 lety

    Why do we need to do range queries within the redis instance in order to find dashers? If each shard of redis handles a range of geohashes, then can each redis shard just have a hashmap which maps a geohash -> [list of dasher ids]? You can use consistent hashing to locate the shard, then just do a hashmap lookup within that shard.
    As long as given a geohash, I can calculate the 8 surrounding box geohashes, this should be sufficient right? I think most libraries for geohash can calculate this quickly

    • @jordanhasnolife5163
      @jordanhasnolife5163 Před 2 lety +1

      I think it really just depends on the size of the geoshard that you use. If they are big, then you'll have to do a range query to quickly find a nearby driver to a given geohash. If they are the perfect size, then you could just select any of the dashers located in that node.

  • @protyaybanerjee5051
    @protyaybanerjee5051 Před 2 lety +1

    Interview Ready UX could do with some improvement

    • @gkcs
      @gkcs  Před 2 lety

      Yes. Thanks for the feedback.
      Any suggestions for improving the UX?

  • @joaoluismoraes7215
    @joaoluismoraes7215 Před 11 měsíci

    I've worked at a company that solved a similar problem but at a smaller scale (it had 200k users and more than half weren't active). Instead of Geohashing, they did it with MongoDB Geospatial queries. Why is Redis geohashing better in this case? When does geospatial queries start to become a problem?

    • @mhmdshaaban
      @mhmdshaaban Před 9 měsíci

      MongoDb supports both 2d Index where it uses b-tree and 2dsphere Index where it uses similar approach "geo hasing" I'm curious why do I have to implement that from the get-go and It's already implemented

  • @constantinekhvalin6038

    is it secure to have peer-to-peer connections between dashers and clients?

  • @RedBeardRetroTech
    @RedBeardRetroTech Před 2 lety

    Thanks for the vid

  • @chinmaysharma1867
    @chinmaysharma1867 Před 2 lety

    Do I need to know Web development before study system design ???

  • @VireshGehlawat
    @VireshGehlawat Před rokem

    The content was great, the tool needs to be improved.
    I can see both of them struggling with the interface.
    @Gaurav - As the interviewer in a real world scenario, the interviewer won't really be drawing as much as you were in this case.

  • @poshettiakhil2664
    @poshettiakhil2664 Před 2 lety

    Hi Gaurav,
    This question is not related to doordash system design,
    Please can you explain the difference between message bus ,message queue and message broker

    • @protyaybanerjee5051
      @protyaybanerjee5051 Před 2 lety +1

      Message broker can be a distributed system which provides a reliable and fault tolerant way of delivering messages b/w multiple producer and consumer.
      A message broker can handle many to many mapping between a producer and consumer

  • @nitishkumarsingh6020
    @nitishkumarsingh6020 Před rokem

    Which tool has been used for drawing?

  • @hazemabdelalim5432
    @hazemabdelalim5432 Před rokem

    Isn't MYSQL harder to shard ? i believe no-sql is easier to shard in that case so i would choose it .

    • @gkcs
      @gkcs  Před rokem +1

      It has to be sharded manually, yes.

  • @rkara2
    @rkara2 Před 2 lety

    Why can't you just have all the actors (drivers, customers, businesses) subscribe (websockets) to the system and ping their location (lat/long) in real-time?
    Then when a customer / business needs a driver, they are getting the real-time picture of whos active in the system and where they are.
    So at the end of the day you don't care where drivers / customers are in the world, you just care about their availability.
    So you can have one table called Availability that you match everybody through.

  • @Alex-v4h9z
    @Alex-v4h9z Před 18 dny

    When the interviewers makes your life miserable 😂

  • @tusharabbott
    @tusharabbott Před 2 lety

    now i know why swiggy was not able to find a delivery partner for my order.

  • @mansivishwakarma361
    @mansivishwakarma361 Před 2 lety +1

    💕💕💕💕

  • @sipwhitemocha
    @sipwhitemocha Před rokem

    Aye shoutout to J. Epstein

  • @timurakhalaya6289
    @timurakhalaya6289 Před 5 měsíci

    poor performance i would say, in term of technology picked and in terms of planning as well ...

  • @anirudhsilverking5761

    I think I gained 30 IQ points just listening to this

  • @aparnaprabhu
    @aparnaprabhu Před 2 lety

    I read this as system design of Doordarshan 🤦‍♀️🤦‍♀️