Video není dostupné.
Omlouváme se.

Grokking the Uber System Design Interview - Ride Sharing Service Design | OLA System Design

Sdílet
Vložit
  • čas přidán 17. 08. 2024
  • This is the sysem design video about Uber System Design. In this video we are discussing how to tackle the system design interview question about designing Uber or Lyft ride sharing services. In this video, we will be discussing the micro-services architecture of Uber and will discuss design of several micro-services that Uber is comprises of. E.g.
    1. Map Service
    2. User Service
    3. Routing Service
    4. Driver Location Service
    5. Trip Service etc.
    Instead of skipping the video, you can actually play the video at 1.25x speed and in this way avoid missing any important points. However, if you like to overview of what I am discussing in the video then please check:
    0:00:00 - Introduction
    0:01:00 - Functional Requirements of Uber System Design
    0:06:15 - Non-Functional Requirements of Uber System Design
    0:08:50 - API Specs
    0:16:15 - High-level Architecture of Uber System
    0:18:50 - Design of Map Service
    0:25:50 - Design of User Service
    0:27:35 - Design of Routing Service
    0:37:05 - Design of Driver Location Service
    0:49:50 - Design of Trip Service and very important discussion sharding of secondary indices
    1:01:05 - Final Remarks
    Distributed System Design Interviews Bible | Best online resource for System Design Interview Preparation is now online. Please visit: www.thinksoftw...?
    Please follow me on / think.software.community if you like to get notified about new course chapters getting added or when we will start another round of mock interviews and you want to participate in mock interviews or any other updates. I will also take your suggestions there about the course and the channel.
    Please check my other videos for more information about the following topics:
    1. How to generate a unique id: • System Design Intervie...
    2. Distributed Cache Design: • Distributed Cache Syst...
    3. The right way to tackle the system design interviews: • The Format of Distribu...
    Check out our following articles:
    - How to Ace Object-Oriented Design Interviews: / how-to-ace-object-orie...
    - Elevator System Design - A tricky technical interview question: / elevator-system-design...
    - System Design of URL Shortening Service like TinyURL: / tinyurl-design-from-th...
    - File Sharing Service Like Dropbox Or Google Drive - How To Tackle System Design Interview: / how-to-tackle-system-d...
    - Design Twitter - Microservices Architecture of Twitter Service: / design-twitter-microse...
    - How to Effectively Use Mock Interviews to Prepare for FAANG Software Engineering Interviews: / how-to-effectively-use...
    - Payment Gateway System Design - How does the Stripe work: / payment-gateway-system...
    - Selecting the best database for your service: / selecting-the-best-dat...
    #SystemDesign #DistributedSystems #FAANG #Facebook #Google #Amazon #Apple #Microsoft #Uber #Netflix #Oracle #Lyft #Interview #ComputerProgramming

Komentáře • 217

  • @ThinkSoftware
    @ThinkSoftware  Před 4 lety +19

    Thanks for watching the video.If you need to, you can play the video at 1.25x speed instead of skipping the video and in that case skip some important point. Please let me know in the comments below if you like this video. Thanks.

    • @winning_aadict
      @winning_aadict Před 4 lety

      This is awesome. Thank you sir

    • @motivationmentoring2298
      @motivationmentoring2298 Před 4 lety

      Thank you it is amazing work keep it up

    • @vishalsarda9764
      @vishalsarda9764 Před 3 lety

      Does the map service store some data? If yes, what data and in what format?

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety

      Map Service stores map data. There are different formats of map data that you can find the details in the course :)

    • @namduong1580
      @namduong1580 Před 3 lety

      Great tip, thank you!!

  • @neosapien247
    @neosapien247 Před 4 lety +7

    This is the best video for this topic. A lot of videos describe how Uber is designed, not how you, the "candidate" would design Uber. They are not looking for your understanding of Uber's real architecture. They want to know how creative and logical YOU are if given the task. This is the perfect tutorial for that.

  • @snehalchavan5796
    @snehalchavan5796 Před 3 lety +3

    The only reason I watched this video was because I wasn't satisfied with the solution in 'grokking the system design interview'.
    I must say that you have done a very good job at explaining the ride share system design and it'll definitely help me in my interviews.

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety

      Thanks for the comment 🙂. You should also check my course on System Design Interviews Bible then.

  • @TarunPrasadKottary
    @TarunPrasadKottary Před 4 lety +7

    Finally, some fantastic videos on System design!

  • @zvezdalion
    @zvezdalion Před 2 lety +1

    Learning so much about how to think about system design. Your style of teaching really emphasizes that, rather than trying to cram all the specifics of how this or that software was made. Thank you!

  • @Brandonstiles
    @Brandonstiles Před 4 lety +4

    Great video! Good call on pointing out the need to pre-plan analytics and monitoring

  • @binitasinha2695
    @binitasinha2695 Před 4 lety +1

    I have gone through many system design videos but only subscribed to your channel as I never have seen such an informative video ever. Thank you for sharing your knowledge. kudos

  • @bostonlights2749
    @bostonlights2749 Před 4 lety +3

    For those who made it till the end,
    This is quality stuff

  • @tarun29061990
    @tarun29061990 Před 3 lety +1

    Wow!! Such a clear explanation about each and every service. Great work mate :) I have watched lot of videos on Uber system design but your video made it super clear. Great part is you compare the pros and cons of each approach. Please keep doing this in your future videos. Thanks :)

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety

      Thanks for the comment :) More videos will be coming soon. Till now I was busy in releasing my course.

    • @tarun29061990
      @tarun29061990 Před 3 lety

      Think Software share the course link as well

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety

      @@tarun29061990 I have my latest video discussing the course with some discount coupons.

  • @shabnamkoosha
    @shabnamkoosha Před 2 lety +4

    By far the best design video I have watched. Very detailed and organized. Thank you.

  • @RY-it3de
    @RY-it3de Před 3 lety +1

    LMAO the shade thrown at the Grokking course about the Driver Location service discussion. I too was confused about the QuadTree / HT suggestion

  • @mohajeramir
    @mohajeramir Před rokem +1

    Really appreciate you doing this. There is a wealth of information here.

  • @deepakchhabra5932
    @deepakchhabra5932 Před 3 lety +1

    To answer your question in the end which Partition approach is preferable for secondary indexes: Local vs Global - then the obvious choice is global because if we want to query by tripId we already got a PK based partition, and when we need trips for driver or passengers then we should have Global index so that we don't do scatter gather, idea is to get to the details quickly. Now it also means my writes to trip table will be slower because of additional global indexes but Trip table is not so time critical like driver location or pricing etc, as long as trip creation is successful we can tolerate little bit of delay here, also, we picked relational DB as a choice for Trips because we want consistency so if we made that decision right then little bit of slowness is acceptable. I'm not sure if at DB level some kind of optimization is possible where secondary indexes could be built async- could be explored. Nevertheless, great content as always, tips you discussed are nice. Amazing work. Thank you.

  • @sivakumaresan9602
    @sivakumaresan9602 Před 3 lety +2

    Excellent video! Thank you for the detailed explanation.
    I do have one comment. At 59:30 you say that the 'Global index' method of maintaining secondary indexes would allow all records belonging to 1 driver to exist within 1 partition. That may not be the case since the data is partitioned based on the primary index, i.e., Trip ID. Hence, there is no way for us to enforce all data belonging to a driver to exist in the same partition.
    What makes 'global index' method different is that it would store the secondary indexes in a separate location (outside the data partitions) that would allow the queries to be routed to the exact partition(s) where the data exists. In other words, the mapping between each driver id and the location within the (Trip ID-based) partitions is stored separately as a Global index. Similarly there will be another global index for passenger ID. These global secondary indexes can be further partitioned based on it's values, i.e., based on Driver ID or Passenger IDs. I guess you were referring to my last point in your video, but it came out different. :) Also, note that 1 driver ID could exist across multiple (Trip ID-based) partitions. The global index for driver ID will let us know what partitions to look for, instead of doing a scatter-gather approach with Local indexes.

  • @kaushalrjain6326
    @kaushalrjain6326 Před 4 lety +2

    Fantastic Explanation. Was waiting for this. Really helpful. Thnx.

  • @manju528
    @manju528 Před 3 lety

    I liked your system design videos! They all are explained in very simple language. I yet to finish watching all of them though. Thanks for making these videos.

  • @brgowd
    @brgowd Před rokem +2

    Very informative

  • @arpitamathur4859
    @arpitamathur4859 Před 3 lety +2

    In the Grokking the system design course, i think it will get the initial positions of the drivers from the quad tree and would then subscribe the customer to those drivers. Hence later it can use the DriverLocation HT to get the updated locations.

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety +1

      There are many loose ends. First of all the main purpose of keeping DHT was that it is slow to update/query the Quadtree. Then it means Quadtree may have stale data. Then secondly how the subscription will work? And then thirdly how to find nearby drivers to send trip requests?

    • @Funsiestype
      @Funsiestype Před 2 lety +1

      Initial set of drivers don’t matter. You can show some random fake cars. What’s the purpose of it other than exposing useless privacy info. You have to ask clarifying questions for requirements. Just because you can doesn’t mean you should.

    • @ThinkSoftware
      @ThinkSoftware  Před 2 lety

      Exactly ^

  • @pujabhandari5471
    @pujabhandari5471 Před rokem +1

    Very informative and interesting video I really like it

  • @npriyanka89shivu
    @npriyanka89shivu Před 3 lety +2

    Amazing explanation. Thank you!

  • @MrManjax
    @MrManjax Před 2 lety +1

    Excellent design video with a number of useful details covered 👌 Thanks for your efforts.
    In regards to your question about partitioning Trip info, I would suggest your approach wherein you use a combination of {Trip id, driver id} to identify the shard/partition. Since it's more likely to query all trips related to a driver, it would enhance the locality of data lookup and have the potential of improving System throughput.
    As regards your other question about handling hot zones in a busy metropolitan area, it would be great to incorporate something like this -
    Define a certain threshold of the number of drivers within a certain area at the passenger's request. If the lookup discovers a greater number of drivers, it should automatically sub-divide that particular cell into a finer granularity (microcell?) in order to find the most optimal driver match to provide a response to the Dispatch service.
    BTW, would it be possible to publish a picture with all the services (along with their sub-services) called out in one place along with the cache/datastore they are tied to?

    • @ThinkSoftware
      @ThinkSoftware  Před 2 lety

      Thanks for the comment 🙂. More details are in the course.

  • @chanm2147
    @chanm2147 Před 7 měsíci +1

    Hello, what is the algorithm you mentioned at 53:28 to find the global optimized solution? Is it K-nearest neighbor?

  • @rng19
    @rng19 Před 2 lety +1

    Thanks a lot. Your videos are very helpful.

  • @AkashJadhavIT
    @AkashJadhavIT Před 4 lety +3

    Thank you for all these amazing videos on systems design. We have learned great concepts from you sir.
    One request from my side. Would it possible for you to make a series on "object oriented design interview". eg. 'Designing the parking lot'.
    Any resources you would recommend for the same?

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety +3

      Thanks for the comment. I didn't find any good single resource to cover all topics and that is why started this channel. Right now you can find the information spread all over internet and even not complete. I have also noted your request and will make video on object oriented design soon.

    • @AkashJadhavIT
      @AkashJadhavIT Před 4 lety +1

      @@ThinkSoftware thanks. looking forward to hear from you

  • @TheRohkan
    @TheRohkan Před 3 lety

    the right amount of depth .. thanks for this

  • @priyavenkatesan1804
    @priyavenkatesan1804 Před 4 lety

    Excellent explanation! One small improvement I would do wrt to driver location service is to use slippy tiles for id calculation. No biggie. Learnt a lot. Thank you very much

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety

      Thanks for the comment. How would you figure out the tile or grid Id if using slippy tiles? And how would you know the id's of neighboring tiles if using sleepy tiles?

    • @priyavenkatesan1804
      @priyavenkatesan1804 Před 4 lety

      wiki.openstreetmap.org/wiki/Slippy_map_tilenames#Tile_servers.
      Key in the cache will be the tile number and value will be a list of drivers in that tile. Same as you have, except that the key is a number. Does that work?

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety

      Is there a way to calculate tile Id from location without using some lookup table? Also i think all the tiles should be assumed to be at the same zoom level which is then nothing but equal to what I have suggested?

    • @priyavenkatesan1804
      @priyavenkatesan1804 Před 4 lety +1

      Yes, we can write a small function that returns the tile id based on lat, lon. True, the zoom level will be the same. Like you say .7 * .7 sq m might map to zoom 15 or so. The approach you have is great. The main advantage of this calculation is that you can use a number as the key. May be this helps with partition. I can't tell. The other advantage is offers is in a situation where you have dense operation, you can dynamically recalculate the tiles based on zoom level. Say you decrease the zoom level, then you might have more drivers mapped to that area. The negative to this approach could be the time taken for dynamic recalculation. I haven't determined the cost in this case. However, in your approach you are quering for more neighboring tiles. This is not bad either. Just a different approach.

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety +1

      OK thanks for clarification 🙂

  • @dhiwakarna8509
    @dhiwakarna8509 Před rokem +2

    Highly appreciate your content, learning tons! I have a question around the routing/proxy service. Shouldn't its responsibility be very high level? (like authenticating all requests, rate limiting and routing requests to appropriate services). Here it seems to do very specific things like create a user connection object and maintaining it which is needed for a specific use case (trip matching). can you shed more light on this?

    • @ThinkSoftware
      @ThinkSoftware  Před rokem

      I believe this is discussed in the course. The routing service is nothing but an API gateway. It performs all the functionality of an API gateway. However, not all API gateway support websockets or long pull. In order to support them, the architecture discussed in routing service is required.

    • @dhiwakarna8509
      @dhiwakarna8509 Před rokem

      @@ThinkSoftware That makes sense. In a way we are doing this. In order to maintain a websocket, the request needs to stick to a specific instance of the api gateway and that is where all this book keeping comes in. The api gateways are not stateless here.

  • @dhwanip1000
    @dhwanip1000 Před 2 lety +1

    Thanks for the wonderful video. My question here is what areas to focus on in actual interviews?. Suppose, I am asked this question in the interview, how would you suggest to start and dive deep into each part? Also, not everyone would know everything about these services, in that case, how would you advice a candidate to tackle that situation?

    • @ThinkSoftware
      @ThinkSoftware  Před 2 lety +1

      I have discussed this in some of my other videos and course. You should also look at mock interview videos that I have uploaded in this channel and the course.

  • @navoriion
    @navoriion Před 4 lety +1

    Takes lot of effort & time to make videos. Keep up the good work. Do you read company blogs to understand the architecture of the system ?

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety +1

      Yes making a video usually takes atleast 3-4 weeks.. 2/3 weeks for all the research. 1/2 weeks to come up with my own design and solution for the problem and then 1 week for video production/editing/upload etc.

  • @YegorChumakov
    @YegorChumakov Před 3 lety +1

    Great videos! Thank you. I've subscribed.

  • @KangJangkrik
    @KangJangkrik Před 3 lety

    I would like to do correction for map service. This is simplified steps of finding closest route:
    1. Calculate angle of attack (AoA) from A to B with atan(y/x) * (180/PI) formula
    2. Calculate each reachable points' AoA
    3. Compare it by substracting each of those point's AoA with A-B AoA, results angle delta
    4. Go forward to point with smallest angle delta
    5. If there is no B point on current closest points, then repeat step 1
    In reality, we have to deal geometric equations unless we use Google's Route API which is costs a lot of money. Cheers!

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety +1

      Thanks for the comment 🙂

    • @KangJangkrik
      @KangJangkrik Před 3 lety

      @@ThinkSoftware you're welcome! I found this channel useful, so +1 subscriber for you

  • @privacywanted434
    @privacywanted434 Před 4 lety

    This has been a great explanation video!

  • @user-oy4kf5wr8l
    @user-oy4kf5wr8l Před 3 lety +1

    Buddy, keep doing these amazing videos plz on low level design too.
    A question plz, for sd1, i think those concepts covered by your video are enough, right? for sde 2, i will buy your course ^ ^

  • @wayneli7814
    @wayneli7814 Před 4 lety +1

    Hi @Think Software, thanks for your great video! I want to share some of my thoughts about the drive location service, you make an assumption before you calculate the grid id from Long/lat, which is the grid size is fixed. But in practical, the grid size is dynamic(because the density of different area is different), so it should be very hard to get the gridId by long/lat in constant time. I would say the best way of the lookup service is still to transverse the quadtree, but what we can do is to find a way to optimize the update issue. What is your thoughts?

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety

      Thanks for the comment. Usually the grid size will remain the same but the search size would be increased in that case and please note the grids which are empty will not have any record in the in-memory datastore/cache.

    • @wayneli7814
      @wayneli7814 Před 4 lety

      Think Software I see, thanks for your reply. That makes sense, we can have the grid to cover a very small area, and don’t store the grids that locate in ocean. BTW, I just checked more info about google s2, looks like they are using the “Hilbert curve” algorithm to convert the coordinations into a 64 bit number, then they can find the nearby grids by just simply increment/decrement the number.

  • @OmkarGhanekar
    @OmkarGhanekar Před 2 lety +1

    Once the websocket connection is established, for every new piece of data that the client is sending to the service, how does the LB ensure that the data gets sent to the specific server which is holding the websocket connection for the user? I'm assuming a dumb RoundRobin LB can't be used in this situation.

    • @ThinkSoftware
      @ThinkSoftware  Před 2 lety +2

      Thanks for the question. Short answer: yes a LB need to support websocket. If a LB does not support websocket then it cannot be used in this scenario. Long answer: for this I may have to make a video :)

  • @amit1114
    @amit1114 Před 4 lety

    Awesome explanation of the system. Thank you for sharing your knowledge!
    Regarding Routing Service: How would you make routing service highly available if you are using long polling or websocket and your passenger/driver is connected to 1 App Server. If that app server were to go down, all passengers/drivers connected to that App Server will loose connection and the data in Global Cache will be incorrect.

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety

      Thanks for the comment. Regarding your question: there are two things that will be happening. Once an app server goes down then the client app will see that the web socket or long poll connection has been terminated so in that case will retry to reconnect and this time the connection request will go to some other app server by the LB and that app server will update the cache. If the app does not come to connect then there will be a stale entry in the cache. When dispatch servers will try to use that entry to send notification to the client app, they will know that since the app server is down so the entry is stale and at that time entry will be removed from the cache.

    • @ankuj2004
      @ankuj2004 Před 4 lety

      @@ThinkSoftware what happens in a scenario, where the proxy server connected to client dies. The entry in cache is stale. The dispatch server comes back before a new connection is established .

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety

      when the dispatch server will contact the proxy server which has just come up then the proxy server will return connection not found for that client and then dispatch server will invalidate the cache entry if it is same as what it read before contacting the proxy server.

    • @ankuj2004
      @ankuj2004 Před 4 lety

      @@ThinkSoftware but dispatch server needs to respond back to the client. Even if it invalidates the cache entry, how will it communicate back the request response to client. It cant keep holding the response till the time entry in the cache is updated

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety +1

      In that case it will just send a push notification to the client app that there is new event.

  • @anuragkothare6181
    @anuragkothare6181 Před 4 lety +1

    Good Explanation.

  • @GokhanArik
    @GokhanArik Před 3 lety

    I don't think they proposed distributed hash table to be used in the lookup. Their argument is the driver reports location every 3 seconds, but we update QuadTree every 15 seconds. 15 seconds is still a short time and drivers' location will not change much. To get nearby drivers we will still use QuadTree and hash table will only be used to keep QuadTree up to date. Correct me if I'm wrong

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety +2

      I don't know if they have updated the text in between. The video is several months old now. Right now if they are storing in DHT to be update Quadtree every 15 sec later then it would be better to let the driver send the update every 15 sec directly to Quadtree. The reason for keeping to a DHT would be if they want to batch update the Quadtree but they didn't talk about batch update.

    • @MrAnshulji
      @MrAnshulji Před 3 lety

      Also the sharding for quadtree was based on driver_id (stored in leaf nodes), so it won't be like one node of the quadtree is on one server and the child node is on another. It was more like, the quadtree would exist in its totality on all servers, but some servers would have a subset of the driver ids at the leaf node. And eventually scatter-gather could be used to retrieve the drivers.
      In any case, that approach has been rendered inefficient by this grid-based solution that you described. I wonder why no one thought of this or suggested this C-Squares based approach in any of the other design videos.

  • @karanbhatia2834
    @karanbhatia2834 Před rokem

    I have a question regarding the routing service.
    We have 2 types of servers: proxy servers and the dispatch servers. What is the reason for having these 2 types? What is the difference in their role that they are categorized separately> Can't we just have one layer of servers that basically talk to other services in the system?

    • @ThinkSoftware
      @ThinkSoftware  Před rokem +2

      Proxy servers receive request from the client. They are sitting behind load balancer and forwards the request to the internal microservices based on the endpoint used in the API call. Dispatch servers, on the other hand, receive requests from the internal services (when internal services want to send message to some client app). A request from an internal service goes to a dispatch server behind load balancer (load balancer on the internal side facing internal services). The dispatch server need to dispatch the request to the right proxy server where a websocket connect or a long pull request is held for a particular client.

  • @amolnagotkar3037
    @amolnagotkar3037 Před 2 lety +1

    thnx for this video

  • @sthirumalai
    @sthirumalai Před 3 lety

    Hi, The presentation and thought process was excellent.
    On scaling the driver location module wrt. the grid service. In case of hot nodes which are partitioned based on the grid id. I think we can go for the auto scaling on the compute layer where as on the data layer which serves the grid vs drivers key/val can't we shard the grid by adding more precision to the lay/long pairs? By that way the hot data will be distributed even to more shards and the app may scale well. Please correct me if I am wrong.

  • @xinlongzhang1187
    @xinlongzhang1187 Před 4 lety +1

    Very nice video!

  • @sunny-14689
    @sunny-14689 Před 2 lety

    Nice video, can a driver be assigned to multiple nearby gridids, may help avoid larger area query if none found in first attempt since a driver can belong to multiple lists thereby increasing chances of finding nearby driver.

  • @preetiduhan9937
    @preetiduhan9937 Před 3 lety +2

    A big dislike to those who disliked the video.If you can't appreciate someone's effort atleast don't discourage them.

  • @metallaser4409
    @metallaser4409 Před 3 lety +1

    Nice video! Can you explain the difference between an API gateway and routing service? Would you still have an API gateway in your architecture to route requests to either map/user/routing service?

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety +1

      You can consider them as same.

    • @metallaser4409
      @metallaser4409 Před 3 lety

      @@ThinkSoftware Thanks for responding. If I were to use Amazon's API gateway, would that be able to handle sessions/websocket connections for me? I thought API gateways were stateless. Is that incorrect?

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety

      It depends on the implementation. I think Amazon API gateway supports websockets. Also what do you mean by stateless here in case of API gateway? To me it seems like you have wrong understanding of what stateless mean.

  • @shikhil1
    @shikhil1 Před 3 lety

    This is great video and explained the approach in depth, but you could improve/explain in depth about searching of location in 2D Grid using two decimal points and finding the near by grid id.

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety

      Thanks for the comment

    • @shikhil1
      @shikhil1 Před 3 lety

      @@ThinkSoftware can you please answer my query also? how your are going for finding location in 2d grid using decimal point and near by location?

  • @franciscog3378
    @franciscog3378 Před 3 lety

    very helpful. Thank you

  • @RaunakShrivastava
    @RaunakShrivastava Před 3 lety +1

    Why don't the passenger or Driver communicate to User and Map services via the Routing service?

    • @ThinkSoftware
      @ThinkSoftware  Před 2 lety

      Both solutions are possible. I am using different approaches in different videos for curious minds like you to ponder upon :)

  • @kamalsmusic
    @kamalsmusic Před 4 lety

    So regarding using the global in memory cache to store transient information... we would have to make the cache quite large to ensure that the data is not evicted a lot, right? If there are frequent evictions, then we may have to re-determine the drivers location from the client and then insert again into the cache?

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety

      Yes. The other option would be store this in database but that would put a lot of pressure on the database. So basically, the distributed in memory case here is used to store transient information as there is no point storing them in the database and so the purpose of using cache here is not to remove pressure from DB as there is no DB but cache here itself is being used as transient data store.

  • @winning_aadict
    @winning_aadict Před 4 lety

    Sir, can you also add a transcript of the video for people who prefer reading. Also, this would enable us to look up words that are new to us and may not be clear in the audio for various reasons.

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety

      Thanks for the comment. I am going to soon launch a book which will have this and more.

  • @user-nt9dk5xl5z
    @user-nt9dk5xl5z Před 10 měsíci

    Thank you very much for your videos. Good explanations. Just One thing you are not discussing about selection type of db as part of design. Why?

    • @ThinkSoftware
      @ThinkSoftware  Před 9 měsíci

      There are other videos discussing the DB selection. Thanks

  • @taishancorp7720
    @taishancorp7720 Před 3 lety

    To solve hot partition problem of grid key ranges, can we first shard by key ranges and then and have a mapping of partitions where this key range could land ? Majority of cases there will be single partition, but for hot partitions we may need to scan more than one.

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety

      what advantage we would get with this?

    • @taishancorp7720
      @taishancorp7720 Před 3 lety

      @@ThinkSoftware I mean for hot partitions , we can have finer precision shards on those. Key ranges don't need to be uniformly sized.

  • @taishancorp7720
    @taishancorp7720 Před 3 lety

    In 35:20, you mentioned that dispatch servers need to find app server in routing service before it can send data back . Couldn't those routing service's app servers just used a long-poll request on the dispatch server ? Why we need the additional complexity here ?

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety +1

      Thanks for the comment. The driver apps even if using long poll they are waiting for trip requests and that request will be parked on a machine. How would you know in which machine the request is parked?

    • @taishancorp7720
      @taishancorp7720 Před 3 lety

      ​@@ThinkSoftware Do you mean a request parked on a Dispatch server machine ? I feel this may be an implementation detail. From the HTTP frameworks I've worked with, when a HTTP request is received (here it's Dispatch server), it has the connection info in an object already. If it wants to support long-polling, it can "pause" the response for the connection and then when a driver has acknowledged (data is ready) "resume" the connection with the data in HTTP response. From client's point of view (Routing Service), I don't think it knows that this was a long-polled query versus a normal slow query.

  • @user-gc5kv3cd7j
    @user-gc5kv3cd7j Před 2 lety

    Nice! For the driver location service, when grid id is used as key, list of drivers used as value, it relies on driver client to remove driver from list if location changes grid, but if the driver client dies, we might get stale data i.e. return driver who might be off work. How to avoid this issue?

    • @ThinkSoftware
      @ThinkSoftware  Před 2 lety +1

      We can have last heartbeat time and then if it is more than some threshold then we ignore and remove those driver entries.

  • @harrylin6282
    @harrylin6282 Před 4 lety

    Hi Think Software, when the DB is partitioned by trip id and a driver wants to list the trip histories, it is slower since it needs to go to all the partitions to grab data. But I am a little bit confused why it is slower, we can create multiple threads and query each partition in parallel ( each thread can go to each partition ), so the query time should be the same as querying in one partition. is it correct?

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety

      Thanks for the comment. There will not be a single customer query that you can parallelize. Think about thousands of customers querying their trips. Now if for all the customers you will scatter/gather on all partitions then it will be way slower than different customers queries going to different partitions.

  • @RahulSingh-ln7bg
    @RahulSingh-ln7bg Před 4 lety

    Bro, R Tree is hierarchical. How will you apply that here in distributed system? Matching based on this would be inefficient? You need better data structure to do it in real-time.

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety

      I didn't discuss using R-trees for this reason. There are other online resources that discuss using R-trees, quad trees etc. but I have already pointed out issues in that approach in this video.

  • @Claudius025
    @Claudius025 Před 4 lety

    why would you use a double for the trip price and not a float? I have the same question for the latitude and longitude.

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety

      Float can be used as well. Thanks for the comment.

  • @letstalkcareerwithradhika9028

    Could you help me understand the function of dispatch server farm in the routing service? I know you said that it will send the info from trip and other service to proxy farm. Why don’t have only proxy servers?

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety

      This is discussed in more detail in the course

  • @cbest3678
    @cbest3678 Před 3 lety

    @Think software.. thanks for this video. I have few doubts about this:
    - So after this flow passenger -> trip dispatcher -> Driver location service ->trip dispatcher will apply matching -> will send those matched driver notification about the matcher passenger. How driver will get this notification about the matched passenger for the trip request ?

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety

      This is discussed in the course. The short answer is that the Trip Dispatcher service will call dispatch servers in the Routing service that will look up the address of the host in the cache where the WebSocket connection is held for the particular driver and then forward the request to that proxy server. The proxy server will then forward the request via WebSocket.

    • @cbest3678
      @cbest3678 Před 3 lety

      @@ThinkSoftware Thanks for the response.. wondering how all service will call dispatch service .. I mean dispatch is in middle layer..so you mentioned when any service done processing the request it will send that response to dispatch service and then the dispatch service will see the cache and will contact appropriate app server in routing server. why do we need extra dispatch service is it adding more complexity ? I mean how other service will call dispatch service to send the response back ?

  • @mehranbehbahani3050
    @mehranbehbahani3050 Před 2 lety +1

    There is absolutely no way that this level of complexity is expected in a 45-minute interview. This is a solution that a team of engineers would work on for a week at least. Correct me if I am wrong.

    • @ThinkSoftware
      @ThinkSoftware  Před 2 lety +2

      For L6+ definitely. The interviewer though may not cover everything but could go in depth in one or two things.

    • @mehranbehbahani3050
      @mehranbehbahani3050 Před 2 lety +3

      @@ThinkSoftware I see. You know what would be a killer video? Why don't you make a video that explains the complexity and details expected by the interviewer based on the level of the position? Like if you are interviewing for an l5, talk about this and that. If it's l6, talk about this and that in more detail. If l7, you need to have a deep understanding of topics A B and C for example.

  • @ChintanShah22
    @ChintanShah22 Před 4 lety

    Excellent explanation, thank you ! Is it expected to come up with this level of detail in a 50 minute interview ? It seems if one is not already very familiar with a particular system, thinking and brainstorming itself will take up a majority of the time.

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety

      In an interview, you might not be able to cover everything and it is expected but the interviewer can direct you to go deeper in the internal design of one or two components. This also becomes very important at higher engineering levels like L6+ or architects. These interviews at these levels can very quickly validate whether a candidate is suitable for L6+/architects or not.

    • @gdthegreat
      @gdthegreat Před 4 lety

      Will be this advantage if interviewee is with 1 year ex? For sde1 or L3?

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety +2

      More knowledge is always useful.

  • @iamavinashdaipule
    @iamavinashdaipule Před 4 lety

    Thanks for this amazing video. It answered so many of the questions I had about uber design or design in general. What happens in the driver location cache when the list of drivers is huge for id_latitue_longitude-->[list of the driver's id is 50,000 or more] does the cache have the capacity to store that many drivers ids in a field?

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety +1

      Thanks for the comment. The grid size is way smaller to have 50k drivers in it.

    • @iamavinashdaipule
      @iamavinashdaipule Před 4 lety

      @@ThinkSoftware Thanks for the reply, yes 0.01 *0.01 is a small area.

  • @rishabhsharma1053
    @rishabhsharma1053 Před 4 lety

    Awesome tutorial and explanation of the system!!! I have a doubt regarding the partitioning based on both driver & rider ids to be able to trip history for drivers & riders from a single partition. Will this implementation mean that entire trip history data would be duplicated across shards?

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety

      Thanks for the comment. What do you think?

    • @andrearmstrong3715
      @andrearmstrong3715 Před 3 lety

      It seems to me that there will indeed be replication, but that's totally fine. Its a classical case of time vs. space complexity. It should be straightforward for a user, be it a rider or passenger, to find all of his trip history within the same shard.
      That being said, another option is to shard by timestamp, which could prevent data duplication. However, shards holdinh newer trips would get most of the load, so it is not a great design.
      I would combine the first sharding approach (duplicating data) with a local LRU cache of trip history, which benefits users who check it out constantly.

  • @Bommutiwariplayz0170
    @Bommutiwariplayz0170 Před 3 lety

    Can you make video on map service and especially how type ahead location suggestion works

  • @rahulsharma5030
    @rahulsharma5030 Před 3 lety

    1. in design of routing service, once you upgrade http to websocket, after that let us say a client request, then the load balancer will forward to the backend server with which connection is established?
    2. If answer to 1 is yes, then will load balancer store this mapping, like which client is mapped to which backend node for web socket connection?
    3. Also i dont understand like if we upgraded to websocket, then what is the current type of connection between client and load balancer?If it is TCP or HTTP?As you mentioned in video i think its TCP, then how can we have millions of TCP with load balancer ,it is not possible.

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety

      Load balancer is only used for initial connection setup when the client made the initial http request. Once the connection is established, the load balancer is out of the picture and client directly talk with webserver with which it has TCP connection

    • @rahulsharma5030
      @rahulsharma5030 Před 3 lety

      @@ThinkSoftware thank you!!!:)

    • @rahulsharma5030
      @rahulsharma5030 Před 3 lety

      @Think Software. If i understand correctly:
      Then client sends request to load balancer. Then load balancer will decide which server can handle request and then it sends server name back to client.Now client directly sends HTTP upgrade request to the server(that is returned by load balancer)?

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety

      No. This is not correct.

    • @rahulsharma5030
      @rahulsharma5030 Před 3 lety

      @@ThinkSoftware then what will be flow you tole me previously? The other way can be the client sends upgrade request to load balancer and then load balancer sends same request to server ?and then server response send back to client via loadbalancer?
      I cant think of any other way.Can you please explain?

  • @rahulsharma5030
    @rahulsharma5030 Před 3 lety

    let us say that first time customer logsin it gets some list of drivers near it:
    1. How will we maintain info that when the driver sends new lat long then we have to send to some customers which can track driver on their app?
    2. How do we update that info when a new driver comes in same area or current driver leaves?

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety

      The initial 4/5 drivers that Uber app shows in the map has nothing to do with the functionality of the Uber service. Uber can show random drivers there and for booking a trip it does not matter.

    • @rahulsharma5030
      @rahulsharma5030 Před 3 lety

      @@ThinkSoftware it has to right?It tells me where the drivers on my screen moving,how much is their ETA to me.

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety

      Are you talking about initial 4/5 drivers or the driver that is assigned to your trip? For the first don't care but for the later you will poll the status periodically.

    • @rahulsharma5030
      @rahulsharma5030 Před 3 lety

      @@ThinkSoftware initial.When i open app,it shows me movement of nearby drivers on app(prior to booking also)

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety

      There is no ETA associated with them. Servers can you an initial list of 4/5 drivers and then you can poll them periodically

  • @theghostwhowalk
    @theghostwhowalk Před 4 lety

    24.14 sec Q how can we use Dijkstra to find shortest path.
    We can compute S to D total straight line distance. Say it comes to 20Km, then we will have a threshold say 1.5, all the nodes which are 30Km from S or D will be considered and fed into Dijkstra to find the optimal path.
    At 46.35: about hot partitions:
    We can pre identify hot spot areas (say downtown or sport arena) and split those areas into even smaller grid for use. That way we can make sure limit is not exceeded. Similarly if there is some mountain or forest then we can combine multiple grids into one.
    At 59: it depends on if writes of trips are more or reads for driver ids are more. I would think #1 is more frequent. So writes may need to be fast. So prefer option 1 as slower reads for driver may be OK.
    Most Welcome for any comment to improve/correct me 😀

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety

      Thanks for the comment 🙂

    • @theghostwhowalk
      @theghostwhowalk Před 4 lety

      Think Software thanks for all the great content! It’s precious. I can see years of experience and weeks of efforts to come with 1 hour content.
      Also If you think there is flaw in above, please do let me know. 😀

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety

      Thanks for the nice words🙂

  • @ameyasawant5663
    @ameyasawant5663 Před 3 lety

    For trip service, can we use bloom filter for searching for trips?

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety

      Searching for which trips?

    • @ameyasawant5663
      @ameyasawant5663 Před 3 lety

      @@ThinkSoftware I meant with approach 1 we can use bloom filter to figure out which partition might have the trip-id. Just a suggestion. Your videos are awsome!

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety

      Please elaborate how?

  • @rmadhavmca1
    @rmadhavmca1 Před 4 lety +1

    could you let me know the tree name again?

  • @uditagrawal6603
    @uditagrawal6603 Před 3 lety

    Can we use geohash in place of quadtree?

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety

      Yes. This is discussed in the course as well.

  • @rishabhdugar1180
    @rishabhdugar1180 Před 3 lety +3

    rateUser me trip id nahi li ? kyun ?

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety

      Why do you think rating a user require trip ID?

    • @rishabhdugar1180
      @rishabhdugar1180 Před 3 lety

      because rating system works per trip not driver to rider or vice versa just once.

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety

      It depends on the requirements. If the requirement is to derive the driver rating from trip rating and also store the trip rating then yes. Otherwise both driver and passenger can be provide option to rate each other at the end of the trip and may not require trip ID in that case.

    • @abhishekjain96
      @abhishekjain96 Před 3 lety

      @@ThinkSoftware That would be correct from the UI perspective where user has to act as per what is available on the screen. But from API perspective, not taking a tripId in the request would mean that a user can rate any driver(and vice versa) even if they haven't done a trip together(which shouldn't be allowed). And also a user can rate the same driver again and again

  • @Funsiestype
    @Funsiestype Před 2 lety +3

    Get map service? Get ready to be put on spot on the need for this service.
    You need to speak about the choice you made for a single user service service instead of driver vs rider users. Need to speak about trade offs and reasoning.
    In the high level diagram, be ready to be picked on by users talking to user service directly. Depict Load balancer. Again, most interviewers are a-holes.
    Rambling begins at 37:10 without setting the context for how the quad tree is sharded based on which this homie is crapping on Grokking course. Even before that , I challenge the assertion that quadtree is not highly available. Be ready to defend why we can’t have replicas for quad tree. (If you say yes, be ready to explain how would you ensure replication works)
    For love of god, I can’t decipher what you are saying from 40:40 to 41:24. 🤷🏽‍♂️ (next time I suggest you write better notes for you to refer when making videos. I notice you did that here. Just make sure you formulate your thoughts.
    Your design has grid ID’s pre broken partitioned by grid ID either by hash or range. You don’t really provide an answer after all that crapping on the other author. Slick. If hash partitioned by grid ID , won’t you need to scatter gather for neighbors?
    I’m some areas like Nebraska where it’s less dense it’s Inefficient .
    If partitions are hot, you need a way to re partition . Consistent hashing (add new nodes) and replication could be options.
    49:18 yup.. nice work. See you sound coherent when you check notes. Good job champ.
    49:28 after all the crapping on other author you come back to the original problem and mention that we need to add more replicas in hot partitions. Ohhh Bhai… Maro mujhe. Mujhe maaro. Mazaak ho Gaya yaar.
    You need to talk about what happens if cache servers crap out. How would you build the quadtree again during crash stop faults? Be ready to talk about this.
    I went and read the grokking’s design. I don’t think you understood what that authors proposal was clearly.
    Good job on talking about partition by document And partition by term indexing.
    I am not saying anything negative about your videos, your videos are great. Just trying to question as you requested. Bye boo.

    • @Funsiestype
      @Funsiestype Před 2 lety +1

      On a serious note, anyone who is watching this, my two cents, prepare well and make sure your fundamentals are strong. You need to discuss trade offs in an interview. Nothing is perfect and there are no silver bullets for problems at scale.

    • @ThinkSoftware
      @ThinkSoftware  Před 2 lety +2

      Thanks for the comments.

    • @curiousspirit3947
      @curiousspirit3947 Před 2 lety +3

      ​@@Funsiestype i think it's clear what he says from 40:40 to 41:24. If you shard your distributed cache by driver id it makes write easy, but the reads need to ask every node (scatter gather), which is inefficient.
      "Your design has grid ID’s pre broken partitioned by grid ID either by hash or range. You don’t really provide an answer after all that crapping on the other author. Slick. If hash partitioned by grid ID , won’t you need to scatter gather for neighbors? "
      => the difference is that now he only does a scatter gather for neighbors IF not enough drivers are found in the grid of the user. and the scatter gather is only on the neighbors, not all the shards.

  • @umer7936
    @umer7936 Před 3 lety

    Can your company or someone who you may know develop a software like Uber for my rideshare company? please reach out to me.

  • @bostonlights2749
    @bostonlights2749 Před 4 lety +1

    6th Sept 2020
    6,836 views
    1.87 K Subs

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety +1

      are you trying to track the progress of this video? :D

    • @bostonlights2749
      @bostonlights2749 Před 4 lety

      :)
      Brilliant Content btw, I've seen other channels which just post directly the content from the company's engineering blog . This was unique and from scratch..
      Wish you all the best!

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety

      Thanks for the nice words 😊

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety

      Thanks

  • @syedabdul1168
    @syedabdul1168 Před 4 lety

    hi sir..u have mentioned that you like to mentor..please let me know is there anyway to contact you..

    • @ThinkSoftware
      @ThinkSoftware  Před 4 lety

      Hi. You can contact me through the CZcams website contact button. It will show my email address.

  • @kiaradas8355
    @kiaradas8355 Před 2 lety

    Does anyone know his name? :)

  • @user-rw6lx7hg7i
    @user-rw6lx7hg7i Před 3 lety

    Please enable the CC button.

    • @ThinkSoftware
      @ThinkSoftware  Před 3 lety +1

      Thanks for the comment. I didn't do anything related to CC. May be CZcams has made some updates

    • @user-rw6lx7hg7i
      @user-rw6lx7hg7i Před 3 lety

      @@ThinkSoftware Yup. There would be some setting that enable CC button I think.

  • @josersleal
    @josersleal Před rokem +1

    WHY DO YOU TALK in front of the board?

  • @gouravgarg6756
    @gouravgarg6756 Před 3 lety

    Great explanation!. Thank you.