Web Crawler - System Design Interview Question

  • Published 27 Jul 2024
  • This is a solution to the classic web crawler system design interview question. It addresses the main problems most interviewers would want to see handled, as well as additional areas that may come up in the interview.
    ⏰ Time Stamps ⏰
    0:00 Use cases
    0:42 Requirements
    1:15 Estimates
    3:06 Architecture overview
    9:06 URL frontier
    11:21 System flow
    12:26 Additional discussion points
    Preparing for a technical interview?
    👉 Check out techprep.app/yt to nail your next interview

Comments • 11

  • @Robloxgod4
    @Robloxgod4 5 months ago +5

    The YouTube algorithm has picked up your channel. Really good content

  • @LouisDuran
    @LouisDuran 2 months ago

    I like that these are short and sweet. It shouldn't take an hour to explain TinyURL or web crawler. Thanks!

  • @SirDrinksAlot69
    @SirDrinksAlot69 4 months ago +1

    Hashes. You can even halve them, for example, and as long as the interviewer doesn't have any rules around a specific length, you can add digits until it clears; there are ways to make that fast as well. Hashes also help with obfuscation, so it's harder to scan and obtain the short URLs, and they make looking up duplicates easier.
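
    A minimal Python sketch of the hash-and-extend idea in this comment: truncate a SHA-256 digest to a short code and, on collision, add digits until it clears. The in-memory `seen` set and the `base_len` parameter are illustrative stand-ins for a real datastore and length policy, not anything from the video.

      import hashlib

      def shorten(url: str, seen: set[str], base_len: int = 8) -> str:
          # Hash the full URL, then use only a prefix of the digest as the code.
          digest = hashlib.sha256(url.encode()).hexdigest()
          length = base_len
          code = digest[:length]
          # On collision, extend the prefix one digit at a time until it clears.
          while code in seen and length < len(digest):
              length += 1
              code = digest[:length]
          seen.add(code)
          return code

      seen: set[str] = set()
      print(shorten("https://example.com/some/long/path", seen))  # e.g. '50d2cc65'

    Because the code is derived from a hash rather than a sequential counter, the short URLs are hard to enumerate, and checking for duplicates is a single set (or index) lookup.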

  • @rajaryanvishwakarma8915
    @rajaryanvishwakarma8915 5 months ago +1

    Great video man

  • @ChimiChanga1337
    @ChimiChanga1337 5 months ago +1

    Excellent! Could you also talk about what kind of network protocols would be used for the services to talk to each other?

  • @LearningNewThings0407
    @LearningNewThings0407 3 months ago +1

    Is it "Font queue prioritizer" or "Front queue prioritizer"?

  • @WINDSORONFIRE
    @WINDSORONFIRE 1 month ago

    How does the design of a web crawler not include geo-located servers, etc.?

  • @jjlee4883
    @jjlee4883 5 months ago

    Awesome video. Would it make sense for the URL Seen Detector and URL filter to come after the HTML parser step?

    • @TechPrepYT
      @TechPrepYT  5 months ago

      Thanks for the comment! You would want duplicate detection to occur directly after the HTML parser, because we don't want to process the same data and extract the same URLs from the same page; that's why the URL Seen Detector and URL filter happen later on in the system. Hope this makes sense!
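
      A minimal sketch of the ordering this reply describes, assuming in-memory sets (a production crawler would use a Bloom filter or key-value store instead): content fingerprinting runs directly after parsing, and the URL filter plus URL Seen Detector run on the extracted links before they reach the frontier. All names here are illustrative, not from the video.

        import hashlib
        from html.parser import HTMLParser
        from urllib.parse import urljoin, urlparse

        class LinkExtractor(HTMLParser):
            # Collects absolute hrefs from anchor tags.
            def __init__(self, base: str):
                super().__init__()
                self.base = base
                self.links: list[str] = []

            def handle_starttag(self, tag, attrs):
                if tag == "a":
                    for name, value in attrs:
                        if name == "href" and value:
                            self.links.append(urljoin(self.base, value))

        content_seen: set[str] = set()  # fingerprints of already-processed pages
        urls_seen: set[str] = set()     # URLs already enqueued or crawled

        def process_page(url: str, html: str, frontier: list[str]) -> None:
            # Duplicate detection directly after parsing: skip bodies we have
            # already processed so we never re-extract links from the same page.
            fingerprint = hashlib.sha256(html.encode()).hexdigest()
            if fingerprint in content_seen:
                return
            content_seen.add(fingerprint)

            extractor = LinkExtractor(base=url)
            extractor.feed(html)
            for link in extractor.links:
                # URL filter: drop schemes we do not crawl.
                if urlparse(link).scheme not in ("http", "https"):
                    continue
                # URL Seen Detector: only enqueue URLs we have not visited.
                if link in urls_seen:
                    continue
                urls_seen.add(link)
                frontier.append(link)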

  • @dibll
    @dibll 4 months ago

    During the duplicate detection step, how is the Content Cache being used? Could someone please explain?