System Design : Design a service like TinyUrl

Sdílet
Vložit
  • čas přidán 29. 08. 2024

Komentáře • 489

  • @himanshuverma31
    @himanshuverma31 Před 7 lety +94

    We can optimise the generation of the URL part in your service layer by delegating the key generation part to multiple instances of offline services(KeyGenerationService). Each service will generate the keys in the particular range and give the pre-generated tiny urls to servers in need. Servers can also keep some unused url in their local caches to speed up the process more. Their will be some race conditions which could be taken care by designing the service properly to take care of interaction between different threads in the services. All in all, it is a great video, kudos to the efforts you put to come up with such content !!!

    • @tusharroy2525
      @tusharroy2525  Před 7 lety +27

      its a good idea of pre-gnerating of tiny url. Thanks for bring this up. Hopefully other viewers will read this.

    • @rockylovesall
      @rockylovesall Před 7 lety +1

      Awesome approach!, I think race condition we can handle via some synchronization approach.

    • @yasharshahi
      @yasharshahi Před 7 lety +2

      +Tushar Roy - Coding Made Simple thanks for the great video.
      you can pin a comment if you want your viewers to see it.

    • @RussellWu
      @RussellWu Před 5 lety +2

      The pro of using a range is because of its compact size to be stored and operated on. Pre-generated keys will take up a lot more space and increase a lot more roundtrip time between internal services while calculating a tinyurl from a integer is relatively cheap. So I think it's only worth it if the calculation is expensive, like blockchain addresses.

    • @renjithr7676
      @renjithr7676 Před 4 lety

      in this approach you should first check actual url already present, insert only if not.

  • @arnabsaha876
    @arnabsaha876 Před 4 lety +7

    I appreciate your efforts in making system design tutorial videos. My algorithms for tackling this question:
    -6 characters long TinyURL:
    a. Use MD5/SHA256 on Base64 encoding with 6 letters long key (This was mentioned by you also, I uplifted it a bit)
    b. Use Key Generation Service for 6 digit keys already pre-computed - Key DB of ~500 GB size + Replication of this to avoid Single Point Of Failure + Lock system to avoid concurrency issue + Use Memcache to speed up -> Fetch key from DB and Use HTTP 302 redirect to browser, in case of success else return 404 to the user.

  • @ilanchen6583
    @ilanchen6583 Před 5 lety +4

    I think your system design videos are top notch and helped me tremendously. I usually spend my working hours writing code and not worrying about the design videos, but thanks to you I was able to bridge the gap of missing knowledge

  • @SheldonZam
    @SheldonZam Před 7 lety +9

    Tushar! I'm so grateful for what you're doing here on CZcams! I've literally spent hours watching your videos on dynamic programming this last week to study for an algorithms module of mine. Even almost got a bit of your accent!

  • @vitaminb4869
    @vitaminb4869 Před 7 lety +341

    People (multiple people) spend days/weeks/months designing a scaleable system. Then comes some hot shot interviewer with all the right answers he found earlier from his google search, and then asks you to design a tinyurl system and expects you to spit it out in 24 minutes? Also consider the fact that his company has a product that is being used by only a few users, that a Windows 98 PC could happily support. Interviewers really need to get off their fucking clouds and stop searching for unicorns.

    • @tusharroy2525
      @tusharroy2525  Před 7 lety +54

      lol

    • @ripplecutter233
      @ripplecutter233 Před 6 lety +13

      Vitamin B goddamn tech interviews. (begrudgingly goes back to studying some obscure algorithm that some brilliant mind took ages to figure out)

    • @sohineebasak4644
      @sohineebasak4644 Před 5 lety +1

      couldnt agree more

    • @gopala5334
      @gopala5334 Před 5 lety +1

      lol, exactly said. I was asked to design the same in 30 mins, that too over telephonic discussion. Great video, thanks to Tushar Roy.

    • @borhadesumit58
      @borhadesumit58 Před 4 lety +2

      I’ve exactly same view. People/scientists invest days, months or even years to come to solution and some stupid people expect to answer in 30 or 60 mins.

  • @renurani3311
    @renurani3311 Před 5 lety +17

    For people who are confused why do we need 43 bits:
    2**42 = 4398046511104 = 4.3 * 10 ** 12 = 4.3 Trillion.
    So, we only need 42 bits to represent 4.3 trillion numbers
    But these numbers also include negative numbers and we only need positive numbers (0 - 3.8 Trillion). So we will 43 bits and keep our MSB as 0 to make it a positive number.

    • @vipuljain9550
      @vipuljain9550 Před 5 lety

      However, Tushar says that 5 bits can be randomized, rather he should mention it as 4 bits, please correct me if am wrong.

    • @renurani3311
      @renurani3311 Před 5 lety

      @@vipuljain9550 Yes, that should be 4 as he needs first 7 bits to represent number 0 - 63 (MSB is always 0 because number is positive)

    • @lugesot
      @lugesot Před 4 lety

      I just want to say, you are so gogerous.

    • @vishalmahavratayajula9658
      @vishalmahavratayajula9658 Před 4 lety

      Thank you. This saved my time

    • @TheProCion
      @TheProCion Před 4 lety +2

      I don't think that this is correct. Just assume that the number is unsigned integer.

  • @edithau8919
    @edithau8919 Před 7 lety +9

    Hi Tushar - Thanks for the awesome video! You asked how other people would design the url service and here’s my 2 cents: I would start the design with a stateless API service and write thru in-memory cache, backed by a relational database. Collisions are detected at the db level (eg. a unique primary key on the shorten url) and corrected by hashing the original url + a prefix. The regional issue you mentioned could potentially be mitigated with a combo of load balancer routing configuration and in-memory cache at each service instance. In the case of GET request burst, it is likely that the instances will have the URL cached.
    I would imagine this starter design is good for couple thousands URL per sec. An AWS RDS with SSD could handle up to 10K IOPS according to their doc. The in-memory cache size could also be adjusted for read performance. To scale up this design for higher volume, I probably would try the distributed cache with write-behind before considering Zookeeper. In general, I like to stay stateless if possible :)
    I created a simple tiny url implementation in case anyone is interested.
    github.com/edithau/simple-tiny-url

  • @navdeepdahiya4637
    @navdeepdahiya4637 Před 7 lety +49

    Dear Tushar, it is great to see that you are back in action. Thanks a lot for your efforts. I truly appreciate your hard work. Could you please share your knowledge on design problems. Thanks a lot.

    • @sagardafle
      @sagardafle Před 7 lety +4

      Navdeep Dahiya Kya baat hai

    • @tusharroy2525
      @tusharroy2525  Před 7 lety +47

      I m trying. Started with tiny url and more to come.

    • @rohitkumarkhatri2203
      @rohitkumarkhatri2203 Před 7 lety +1

      You are awesome..... Plz keep it up with ur design knowledge.....

    • @UMAKANTJENABCE
      @UMAKANTJENABCE Před 7 lety +1

      Please continue to do your good work of design problems I am waiting for your next coming videos on this eagerly.

  • @michanestorowicz6577
    @michanestorowicz6577 Před 3 lety

    Again, very good content. I really like that you're talking fast and not waste time on things that are easily found on the internet (definitions etc.). Very nice to watch.

  • @rishabhdaim
    @rishabhdaim Před 7 lety +3

    Hello, tushar, I would really like to thank you for your videos. recently, I was able to crack an interview by watching your videos. Especially your videos on Dynamic Programming are treat to eyes. Thanks again.

  • @nitinagrawal6637
    @nitinagrawal6637 Před 5 lety

    Appreciate such videos which give different/alternative ways towards solutions & one must go through various such views before finalizing on the solution. But seriously, only a king of fools can ask such questions & expect a good logical answers in those 10-15 minutes. I myself have got such questions with the expectations to say those fancy words of design patterns & algorithms. Man, such interviewers themselves do google the questions/answers for the interviews & they themselves don’t know to solve the trivial problems in their own projects. Many issues with interviewers’ mentality & processes but for such videos, these really help a person to think differently & find the solutions. Appreciate Tushar for providing such informative videos.

  • @soumyarooproy
    @soumyarooproy Před 6 lety +2

    @Tushar, thanks for the video. 12:15-13:55 is a very convoluted way of conveying that log2(62**7) bits are needed encode 62**7 values. Also, that evaluates to 42 and not 43. An oversight, I presume...

  • @daspradeep
    @daspradeep Před 6 lety

    the best i have seen on this is to pre-generate the short urls and keep them ready. Have relational DB with failover (two or more DB servers) to hand over these keys (relational allows to implement locking). Hash Partition the read shards with consistent hashing. You have the full solution without any single point of failure or hot-spots.

  • @erytorth9789
    @erytorth9789 Před rokem

    Thank you for the video! That's the most organized / structured content (about app layer) I've seen for that problem.

  • @revanthkumar1183
    @revanthkumar1183 Před 6 lety +10

    I think it is supposed to be 42 bits not 43. If you think about it, your language has 62 chars and to uniquely identify each char you need 6 bits. because 2^6 is 64. Then you can map 000000 to a, 000001 to b and so on till 9. So in a 7 character string each character will need 6 bits and 7*6 is 42. So you will have to save 42 bits in your DB and when you get it back from your DB, you have to use the same mapper and get back the characters. Hope that clears out the doubt around 42 and 43 bits.

    • @kingofwebguru
      @kingofwebguru Před 2 lety +1

      @Revanth Kumar Thanks for the clarification. I was thinking why it is an odd number (43).

    • @nrjdalal
      @nrjdalal Před 9 měsíci

      I have built one too rdt-li an open source free url shortener with built in analytics, check it out

  • @LuveenWadhwani
    @LuveenWadhwani Před 2 lety

    Ommggggg Tushar's first ever design video 😍😍😍

  • @melarryify
    @melarryify Před 6 lety +4

    Instead of base 62 we can use base 64.Just need to add 2 more characters. Underscore and dash(_-) can be added to 62 characters we have. This makes its easier to convert a MD5 output to a 7 character printable string as you can convert blocks of 6 bits directly to base64 character. Even this CZcams link uses underscore to represent url of this video !!!

  • @visintel
    @visintel Před 4 lety

    This is a great video for people who are trying to learn system design. Not so great for people prepping for an interview

  • @theanigos
    @theanigos Před 3 lety +4

    This guy and this video helped a millions.

  • @muktarali2842
    @muktarali2842 Před 7 lety +12

    Very nice explanation. Please share more design/ system design topics. Thanks a lot.

  • @rish2cool
    @rish2cool Před 6 lety

    Great articulation & simple enough for non CS folks to understand. Thanks Tushar!

  • @ahmer9800
    @ahmer9800 Před 5 lety

    i really appreciate how indepth your design analysis was! thank you

  • @vib7777
    @vib7777 Před 4 lety +1

    Counter based approach is not optimal. There are multiples challenges like maintaining the locking in a worker thread. Another thing these days deployment of worker node done multiple times in a day and every time worker will lose the counter and It will have to ask a new range from zookeeper.

  • @sberfield
    @sberfield Před 4 lety +4

    Thank you for this. I have been going through examples and lessons and this made it all click for me.

  • @SudiptaKarmakar87
    @SudiptaKarmakar87 Před 7 lety +1

    Tushar, I thoroughly enjoy your design analysis. One suggestion though, it would be beneficial for the audience if, in the description, you directly linked any reference to the dependencies - concepts, articles etc (example, Apache Zookeeper) that you talk about in the video. Thanks and keep being awesome.

  • @alekseyt4029
    @alekseyt4029 Před 7 lety +1

    To support ranges we created service that just give your process some range (x to x+100000) and increment counter in DB. It is guarantee the everyone is completely unique and that service is not heavily loaded

    • @tusharroy2525
      @tusharroy2525  Před 7 lety

      +Aleksey Telyshev fair enough. That should work

  • @mtareen
    @mtareen Před 6 lety +6

    Instead of covering how we will design starting from what we will ask the interviewer to creating high level design and then scaling that solution and so on and so forth, the presenter is going to detailed into how we will generate the tiny url etc. I would recommend looking at this for a good example of how to design tiny url
    www.hiredintech.com/classrooms/system-design/lesson/52

  • @RandomShowerThoughts
    @RandomShowerThoughts Před rokem +1

    I haven't seen anyone explain how to implement the b62 hash properly. If you pass in 1 to the typical base62 function, you will get a single character in certain cases

  • @StanislavKozlovsk
    @StanislavKozlovsk Před 7 lety

    Thank you so much for the video, I am really excited and glad to have you back!
    Keep on killing it!

  • @zackhellripper
    @zackhellripper Před 5 lety +47

    Curious why you choose 43 bits, as you only actually need 42 bits to represent 3.5 trillion (max for 42 bits is 4.398 trillion). Is there a use for the extra bit I'm not seeing?

    • @rohanabhutkar
      @rohanabhutkar Před 5 lety +2

      Yes also since we have 7 char tiny url, and each char is 6 bits in base64, 42 bits is enough even if we use base64 encoding (a-z, A-Z, 0-9, _ , . )?

    • @adityasahai9020
      @adityasahai9020 Před 5 lety +5

      @@rohanabhutkar he is using base62 because there are 62 unique chars.

    • @renurani3311
      @renurani3311 Před 5 lety +15

      2*42 = 4398046511104 = 4.3 * 10 * 12 = 4.3 Trillion.
      So, we only need 42 bits to represent 4.3 trillion numbers
      But these numbers also include negative numbers and we only need positive numbers (0 - 3.8 Trillion). So we will 43 bits and keep our MSB as 0 to make it a positive number.

    • @Shogoeu
      @Shogoeu Před 4 lety

      @@renurani3311 or we can use unsigned data type and save 1 bit :D

    • @zhongyuanzhang5611
      @zhongyuanzhang5611 Před 4 lety

      @@renurani3311 ddkkokoi
      Iojkoiw e seeders (know((in(.m
      Mi

  • @jamess5330
    @jamess5330 Před rokem

    Impressive! It is also super helpful to practice with FAANG engineers at Meetapro through mock interviews

  • @manimaaji
    @manimaaji Před 6 lety +10

    Hi Tushar, Awesome video and thanks for the same. The only minor observation I have is that 3.5 Trillion translates to a 2^42=4,398,046,511, 104 i.e. 4.3 Trillion. So we should be using 42 bits instead of 43 bits from the 128 Bit MD5 Hash ? 43 bits translates to 2^43 = 8,796,093,022,208 i.e. 8.7 Trillion.

    • @osrax
      @osrax Před 5 lety +3

      this is correct, I observed that as well. log(2) of 62^7 = 42

  • @BestURLShortenerBioPageQRCode
    @BestURLShortenerBioPageQRCode Před 11 měsíci +1

    Thanks Tushar, Glad to have you here.

  • @vivekshah6452
    @vivekshah6452 Před 6 lety

    Thanks Tushar your explanation was really simple and it did help me a lot to understand the topic well.

  • @humanintheloop
    @humanintheloop Před 3 lety +2

    Around 9:14 Tushar describes a solution that first puts (short, long) to the database and then reads to verify nobody wrote after it. Doesn't the put corrupt the previous value that someone else left there (and wanted it as the value in the database)?

    • @gsb22
      @gsb22 Před 3 lety

      no, it is insert if not present.

    • @praveensoni5823
      @praveensoni5823 Před 19 dny

      @@gsb22 I don't think so.. "insert if not present" is the second option suggested by @tusharroy2525 not in the third option

  • @biswajitsingh8790
    @biswajitsingh8790 Před 7 lety

    Hurray ! The legend is back!! more design questions please. 😍😍

  • @gkcs
    @gkcs Před 6 lety +53

    That's was enlightening, Tushar. What do you think about probabilistic data structures to generate the tinyUrl? Like the MD5 hash you mentioned, I think we could use Bloom Filters to tackle the problem too.

    • @gkcs
      @gkcs Před 6 lety +3

      Haha, yes I have become interested in bloom filters recently :)

    • @ryc8889
      @ryc8889 Před 6 lety +3

      this was actually one of the things i thought of as well. I think a md5 hash might be too slow for an efficient bloom filter depending on the performance requirements but maybe it can be combined with the CDN idea. i was reading about bloom filters on wiki and there was an example about how akamai uses bloom filters in their webservers
      en.wikipedia.org/wiki/Bloom_filter#Examples

    • @thakur8711
      @thakur8711 Před 6 lety +5

      Gaurav Sen, thanks for bringing up this point. I agree with you that we can use bloom filter to generate tinyURL.
      But there is a caveat. As far as I understand, bloom filter can help you to check if the hashOfBigURL(tinyURL) is already existing on the persistence layer or not. It will help us finding the existence of the tinyURL faster instead of going to the disk & finding it there. The remains part is how would you generate the hashOfBigURL(tinyURL) that remains not clear. Tushar have demonstrated those options to generate. Please correct me if there is any invalid statement I made.

    • @AmreshTripathi
      @AmreshTripathi Před 3 lety +1

      My my we have a design principle celebrity here

    • @kavyashree513
      @kavyashree513 Před 2 lety

      0

  • @akshaysuman8168
    @akshaysuman8168 Před 6 lety +2

    Hi Tushar so good to see you back . Please upload more videos on system design

  • @iceman4154
    @iceman4154 Před 6 lety

    I have actually built a system similar to this some years ago for company located in Germany. I created a conversion/encoder-decoder algorithm that took a unique database integer ID and encoded it to a minimum length string (minimum 3 characters). It used a 64 character alphabet (a-z, A-Z, 0-9, -, _). Using this algorithm it could handle up to roughly 112 quadrillion unique combinations with a 10 digit shortened code. Also, to avoid hard-coding any shortened values or storing them in the persistence layer, it all generated using a unique/primary key index integer value. To decentralize or distribute it, I would use a Casandra type setup where the values are partitioned by unique ID per instances. Everything else would be exactly the same as you have shown here. Great video. Thank you for your time and effort.

    • @tusharroy2525
      @tusharroy2525  Před 6 lety +1

      Great to know about your personal experience. How did you get unique database integerID? Was it from C* sequence number? How much qps your system was doing?

    • @iceman4154
      @iceman4154 Před 6 lety +1

      The system itself never grew into anything major. The investors moved onto other projects before it could be tested at any real scale. The part of the system I built was agnostic to the persistence layer. At that time, I was a somewhat new developer and some of the more senior guys were handling that end of things. I basically just came up with the encoder-decoder algorithm and the architecture behind how the experience went. The senior guys handled the distributed systems and persistence layer of the project. Now if I was to build the system, I would do things differently in a more micro-service oriented way. I would probably have a service for generating unique IDs, a service for converting IDs to shortened character and visa versa, and a service for the handling the redirection/landing page functionality. This would allow me to scale each service as needed, independently but I also am not forced to rely on the shortened values being stored in the persistence layer. I would also use Docker and Docker Swam mode to manage things which was not around back then.

    • @iceman4154
      @iceman4154 Před 6 lety

      This was before any of the nice tools and frameworks were around. We built the system using PHP, using a home-brewed/in-house built web app framework and jQuery for the front-end UI experience, to give you an idea how long ago it was. Probably coming up around close to a decade ago, now.

    • @tusharroy2525
      @tusharroy2525  Před 6 lety

      Thanks for the update.

  • @boomer_money
    @boomer_money Před 4 lety +8

    11:17 so let's say you make your GET request on a newly generated md5 tiny url and the first result's long url doesn't match the long url you just entered (i.e. that tiny url already exists). How do you generate a new tiny url such that if the same link is entered again you'll be able to find the same one? Bit shift and repeat?

  • @doreentyy1936
    @doreentyy1936 Před 3 lety

    Very concise abd clear video.
    Easy to follow.

  • @joanazurundu5377
    @joanazurundu5377 Před rokem

    I really learnt alot on this system design, good job

  • @liuhongqian
    @liuhongqian Před 4 lety +2

    I remember timestamp is 64 bites(8 bytes). "The internal representation of a timestamp is a string of 7 - 13 bytes. Each byte consists of 2 packed decimal digits. The first 4 bytes represent the date, the next 3 bytes the time, and the last 0 - 6 bytes the fractions of a second"

    • @saam6348
      @saam6348 Před 2 lety +1

      With year 2038 problem, a 32-bit can represent Unix time. But it will end after the completion of 2,147,483,647 (2^31 - 1) seconds from the beginning (00:00:00 1 January 1970), i.e., on 19 January, 2038 03:14:08 GMT.

    • @liuhongqian
      @liuhongqian Před 2 lety

      @@saam6348 I would be more than 70 years old by then LOL

  • @ksankitha
    @ksankitha Před 7 lety +1

    I was so so looking forward to your amazing videos ! keep it coming please .

  • @lavida6188
    @lavida6188 Před 6 lety

    Very well explained, Tushar. Liked your presentation and the length of the video.

  • @nabhavlogs371
    @nabhavlogs371 Před 5 lety +2

    Sir with 42 bits, we can generate a total combinations of 4*10^12 and with seven bits about 3*10^12. So why did we take 43 bits and not 42 bits?

  • @kingofwebguru
    @kingofwebguru Před 2 lety

    @Tushar, thanks for the video. Please keep up the hard work.

  • @GK-rl5du
    @GK-rl5du Před 7 lety +4

    My solution :
    Hash the longUrl and persist the hashed value as tinyUrl, the same hashed value can be used as partition key to split the data amongst various db instances.
    To identify duplicate tinyUrls, create an index on the tinyUrl and apply primary key constraint, this will automatically take care of all the data-races that would have occurred if the duplication check logic was written in the application layer.
    Of course, for data reliability I will replicate my hash-partitioned database clusters.
    Again as mentioned in the video I will use a caching layer like redis at the front.

    • @himanshuverma31
      @himanshuverma31 Před 7 lety +3

      if you are hashing long urls following problems will occur:
      1.) What will be your key size? In case of duplicates you have to append your hash with some random value to get rid of the duplicates.
      2.) How are you going to make a read call(redirecting the short url to the original long url)? You will end up making the call to all the partitions which will overload your system.
      Things to keep in mind is, systems like url shortening services are very read extensive so you should be able to handle your read efficiently. I hope it helps!!! :)

  • @AbnerLi
    @AbnerLi Před 7 lety

    Thank you so much Tushar, your videos are always helpful! We really appreciate your work!!

  • @zzzzzzzzzzz6
    @zzzzzzzzzzz6 Před 2 lety

    In a sense using an RDBMS is very similar to the counter from a limitations standpoint -- they scale vertically not horizontally, so you'd have the same single point of failure issue.

  • @yaminil3653
    @yaminil3653 Před 6 lety

    Wonderful, Tushar! Lot of details ! So much passion!
    I'm becoming your fan !!

  • @NiravPatel1989
    @NiravPatel1989 Před 7 lety +1

    toooo goood. I think even the interviewer will also have to refer this as well. Hope to see more. :)

  • @TZCoder
    @TZCoder Před 3 lety

    Just pre generate all the possible urls and insert them in the DB then when a user wants a URL pick an available one at random and give it to them, simple, never a colission. You can even pre-generate as required according to the number used up everyday if you don't want all 3.5 bill in the database taking up space. Stick any used URL in redis so no need to hit DB when accessing the URL.

  • @nixer65
    @nixer65 Před 6 měsíci

    You might want to consider a ring based election algorithm to partition the key space. This would allow you to add more hosts as you scale out.

  • @MrSachintelalwar
    @MrSachintelalwar Před 7 lety

    Hello Tushar, Thank you for the great explanation and making it very simple to understand the problem. Waiting for more designing problems. :)

  • @snigdhakasimahanthi3963
    @snigdhakasimahanthi3963 Před 5 lety +2

    Awesome video Tushar. I don't understand how Approach 3 would work when you put(tinyURL, longURL), get(tinyURL) and check if this is equal to the longURL we have. If there was a duplicate in the DB, won't that get overwritten when I put(tinyURL, longURL) and won't get fetch the new longURL Value?

    • @weihan4573
      @weihan4573 Před 2 lety

      Indeed I have same understanding. The RestAPI put will Update if the key already exist no? So not only this Approach 3 will never end up in "fail and try again" scenario, it will even cause issue as if client try to Get the key, he will get the updated longURL which is not what he wanted.

  • @user-jd2dj4gd2w
    @user-jd2dj4gd2w Před 4 lety

    we can use 6 bits to express 62 characers since 2^6 > 62. So for 7 characters long url, 7 * 6 = 42 bits are enough

  • @wasabinator
    @wasabinator Před 7 lety

    Great to see design interviews. Welcome back :)

  • @anandkulkarni2111
    @anandkulkarni2111 Před 7 lety

    distribute the key generation in different ranges to different workers and manage them using a round robin manager algorithm that will almost always avoid generating a currently used tinyUrl. Worst case is all workers go down and only 1 is left in which case there will be issues and overload on it.

  • @jonahren2649
    @jonahren2649 Před 5 lety +1

    I love your videos, it's so elegant, clean and beautiful

  • @sweetmoment8196
    @sweetmoment8196 Před 4 lety

    I really liked the idea behind the zookeeper

  • @rplusgdj
    @rplusgdj Před 6 lety +11

    16:36 , can we reduce collisions by considering timestamp along with milliseconds?

    • @Himanshu-ed3mf
      @Himanshu-ed3mf Před 4 lety +5

      but that would increase the bits needed for the timestamp. from 32 to 48.

  • @SuperKako17
    @SuperKako17 Před 4 lety +2

    What is the advantage of the 3rd approach versus the 2nd? I fail to imagine a scenario in which it would be better suited than the second one, which I find elegantly simple.

  • @HiRyanChen
    @HiRyanChen Před 6 lety +6

    What happens if in the actual interview you say, "This is 62^7, and this is about 3.5 trillion"

    • @tusharroy2525
      @tusharroy2525  Před 6 lety +13

      Lol. I dunno. Interviewer will think you are well prepared or great mathematician

    • @raghureddy3237
      @raghureddy3237 Před 6 lety +3

      you should say it is 2^(7log62).

  • @tongzhu8999
    @tongzhu8999 Před 7 lety

    great guy is back! more system design video please.

  • @bcjitu
    @bcjitu Před 3 lety +1

    why only 43 bits ? Based on which logic ?

  • @maazrehman8723
    @maazrehman8723 Před 4 lety +1

    Why not just use auto-incremented id of a url in database to save these records and decode them in runtime ?

  • @nands4410
    @nands4410 Před 7 lety +5

    WoW You're Back!

  • @basharattamboli3979
    @basharattamboli3979 Před 4 lety

    # Tiny URL
    *requirements
    - Given a longer URL get a shorter URL
    - Given shorter URL get its corresponding longer URL
    - It is obvious that when this question is asked, it doesn't mean that interviewer wants you to given him an answer like "use map and keep key value pair"
    - Interviewer wants to understand your knowledge about scaling and durability and how you will generate a unique URL.
    This is a read heavy system as tinyURL is accessed more times than it is created hence persistence used i.e DB should be good at handling faster reads.
    1. When designing a service think about three things straight away
    1. API
    - CreateTiny(LongURL)-> shortURL
    - GetLong(shortURL)-> longURL
    2. Application layer
    - How will you find out a shortURL for a longURL
    3. Persistence layer
    - how/where you will keep shortURL and longURL.
    2.BLOCK DIAGRAM
    1. REST APIfor communication
    2. load balancer - front end for service. does delegation of the request to one of the worker threads.
    3. work threads - take URL generate the shortURL and keep in persistance layer
    4. caching - memcache, redis (if this tinyURL is created then it will be post somewhere and people will be using this for this keeping it in some distributed cache for faster access.)
    5. persistence layer - DB
    3. Generate shortURL
    - we can have a-z, A-Z, 0-9 chars total of 62 char.
    - if shortURL is 7 chars then we can get 62^7 combinations
    - which is equal to 3 trillion which is enough if your service is serving arround 1000 request/s
    4. Table schema
    - key as a tinyURL and value as longURL
    5. Techniques to generate shortURL
    1. Generate random tinyURL and check the DB
    - worker thread generates a random tinyURL and checks if it is present in DB or not, if not then both tinyURL and longURL are kept in DB
    - for every put we will have to do one or more get
    2. pick first 43 char of MD5 hash
    - we calculate MD5 for longURL and take first 43 bits and call it as tinyURL.
    - MD5 is hashing function , generate 128 bit hash
    - if more bits are considered less probability of collision
    - less bits considered gives you shorter URL but more chances of collision
    - no need to check if tinyURL in DB it will be Unique
    - how 43 bits and converted into 7 chars
    - 43 bits are of our 62 chars
    - convert this 43 to a decimals and then convert it to 62 base number then you will get number from 0-61 then we will map these to our chars
    3. counter based(dont like this approach no randomness)

  • @achyutakrishna
    @achyutakrishna Před 6 lety

    Dear Tushar, great topics and great details. One video on sms and email enrollment design pattern. Design options and best practices.

  • @deepakpatra1411
    @deepakpatra1411 Před 6 lety +2

    Hi, Tushar I have a question for your 3 technique 1.if you are putting the newly generated value in DB and then in 2nd step getting the value against it verify against your long URL(it should override the value at same key and result is always same you will lose the old value ). will not it match always?

    • @joshuamorning8250
      @joshuamorning8250 Před 2 lety

      I have the same concern as you. I think the third approach should have some strategies to prevent this situation. Otherwise, the old value will be wrritten.

  • @anantsaksena1989
    @anantsaksena1989 Před 3 lety +1

    Hi, In the MD5 hash approach:
    How can we be sure that for a 7 char length short url , we need to take 43 bits out of 128 bits of the MD5 hash of long url, convert that to decimal and then to base 62. And once we convert it to base 62 it will give a 7 char length short url. What is the mathematics behind the 43 out of 128 bits approach? MY doubt is why just 43 ?
    Thanks.

    • @rajeshg3570
      @rajeshg3570 Před 2 lety

      Hi @Anant Saksena, Here is the mathematical formula for this - log(2) of 62^7 = 42 .... which means, based on the no of URLs to be generated per sec, we have assumed that if we take base 64 encoding, then you can generate 62^7 URLs ( URL length would be 7 chars). Since MD5 generates binary value, the equivalent number is 2^42. Hence the no of bits required for this calculation is approximately 42

  • @yeniaraserol
    @yeniaraserol Před 6 lety

    The best system design video ever!

  • @AI6NET_link_shortener
    @AI6NET_link_shortener Před 4 lety

    Of course it takes longer than 30 minutes and This is only the start of a very, very long journey after design you have the real-world application, in our instance we found that for every 10 users that used our link Shortener in the context intended and long after the design stage 8 of these were usually hackers or email spammers, we then went back to the drawing board and built a malicious tracking system on top of our url Shortener and it’s still not 100% with new users surprising us every day.

  • @Cosciug1234
    @Cosciug1234 Před 7 lety

    Like video a lot, hoping to see more service design videos in the future

  • @justjaks
    @justjaks Před 7 lety

    Tushar. I really missed you. Hope you can shed sometime in your busy schedule and create more videos like this

  • @ruhinapatel6530
    @ruhinapatel6530 Před 5 lety +1

    Amazing video and explanation.. u got one more subscriber

  • @AmreshTripathi
    @AmreshTripathi Před 3 lety

    Excellent video, really well done. However at 12:00 you discuss saving DB space, I think it isn't that important now since storage is really cheap. So its better to focus on saving the additional processing.

  • @sc1444
    @sc1444 Před 6 lety

    @Tushar Roy - Thanks for such helpful videos. Please keep doing more...

  • @mrincodi
    @mrincodi Před 6 lety

    So I think that in the technique no. 3 in 9:12, the second steps should be "put if absent", not simply "put". Otherwise, we may be overwriting a previous value.

  • @rajatpawar9465
    @rajatpawar9465 Před 7 lety +1

    Hey Tushar, thank you very much for sharing your knowledge. I'd like to request a discussion on how media (images & videos) is managed by big companies (youtube, imgur, etc) and how it is organized. Thanks!

    • @83rossb
      @83rossb Před 6 lety

      look up content delivery networks (CDNs) and consistent hashing

  • @VeganCheeseburger
    @VeganCheeseburger Před 6 lety

    Brilliant video! Well explained and perfectly paced.

  • @jineshdhruv6151
    @jineshdhruv6151 Před 6 lety

    Thank you for awesome explanation. Please post more videos for design questions.

  • @veryconfuseduser
    @veryconfuseduser Před rokem +1

    Would be better if you could go into more detail about assigning the ranges. I guess to experts already familiar with the strategy it would be easy to understand but for someone who's learning of this the first time it's hard to wrap one's head around it

    • @HarshitKumar-dj4ev
      @HarshitKumar-dj4ev Před měsícem

      Exactly. He just glossed over MOST of the details. I highly appreciate his work but for me who is learning about this for the first time, I'm just lost as to how do these actually work behind the high level design

  • @59sharmanalin
    @59sharmanalin Před 3 lety

    If database was RDBMS, keeping short url column in database as unique saves you from having to do a get. Nosql you can use primary key as short url in key val db like dymano

  • @michaeldang8189
    @michaeldang8189 Před 5 lety +2

    For the first 43bit MD5 solution, what do you do if you encounter a collision with 2 different long URLs, i.e., there are 2 totally different long URLs that has the same first 43bit MD5 sum.

    • @pllx2
      @pllx2 Před 5 lety

      Perhaps we could pluck some parts of the counter solution, e.g., take 30 bits of the MD5 + the 16 latter bits of the timestamp (since first 16 are much slower to change) + 7 bits randomly generated or by counter?

  • @rajmilan2933
    @rajmilan2933 Před 7 lety

    Amazing !! You are like my guru dude learnt so much from you.

  • @glaive001Edits
    @glaive001Edits Před 3 lety

    I don't get it for the put tiny url into db part, what's the difference between solution 1 and solution 2? they both check if a tiny url exists in db and if not then put the pair into it

  • @vinayak.raghuvamshi
    @vinayak.raghuvamshi Před 3 lety

    Nice video. The MD5 was not the best idea IMHO especially given that you are using only the first 43 bits, increasing the probability of collisions. Some interviewers will get all fussy about this choice. I would rather have the candidates stick to the counter method and maybe dive deeper into how we can design a distributed counter.

  • @chan625
    @chan625 Před 5 lety +1

    I have 2 doubts, perhaps stupid as am no techie -
    1. If PUT is used first, would it not already update, what's the point of using GET later to validate? Shouldn't that be READ/GET first and if no match then PUT?
    2. When comparing random method with MD5 hash, you said random method will cause duplicate rows for same URL. However we've already discussed the validations in that method earlier
    Thanks

    • @praveensoni5823
      @praveensoni5823 Před 19 dny

      @chan625 Agree with point#1 with a caveat.. what if when you performed GET, the value was not present, and it was added by another thread by the time you did PUT. GET and PUT always risky in absence of locking

  • @DipeshKumarYadavDKY
    @DipeshKumarYadavDKY Před 7 lety

    Great AWesome Bro to see you back.

  • @mayurmahajan1503
    @mayurmahajan1503 Před 3 lety

    Tushar sir, did you record this in Ashish sir's class location? I used to attend Ashish sir's class those days.

  • @sushantsharmapec
    @sushantsharmapec Před 7 lety +1

    Great Video. Can you talk about the retrieval part? Specially given a long url, see if we already have a url for it and give that back.

  • @biboswanroy6699
    @biboswanroy6699 Před 4 lety

    For the range-based approach url length would be greater than 7characters

  • @Abhimalviya
    @Abhimalviya Před 7 lety

    Thanks for coming back sir

  • @vanshthukral5477
    @vanshthukral5477 Před 9 měsíci

    Thanks for creating such amazing content!

  • @sfms952
    @sfms952 Před 6 lety

    Database table containing original URL, shortened URL, maybe an expiration date, and hit count. Shortener takes long URL, if the long URL exists in the database, returns the short URL, else it generates a random string of characters that don't exist in the table, inserts into the table. Web application router takes the shortened URL, checks if it exists, if it exists, redirect to long URL (maybe update hit count). If it doesn't exist (or is expired), redirect to generic error page. If creating expiry URLs, have a stored procedure run in a job that deletes (and maybe insert into historical table) expired URLs.

  • @Sandeepg255
    @Sandeepg255 Před 5 lety

    but the range based approach discussed at 22:19 will generate multiple short urls for same url on multiple create requests.

  • @256cool
    @256cool Před 2 měsíci

    Won't the counter/all-host/range_based approch generate duplicate tiny urls? With these approaches how do I search if a long url already exists? Won't a DB table [long -> tiny] also be required to avoid duplicate URL generation?

  • @arunimatechnologies3394

    In the case of random string generation and using third technique of multiple get and put the case could corrupt data.You see the value doesnot exist and try to put your data meanwhile another worker thread reads in parallel and see the empty value and puts its data first.So the database has second value.Now when you will put your own data and thus the data of the other thread gets corrupted.Optimistic locking is needed here.