Caching Pitfalls Every Developer Should Know

  • Published 6 Mar 2024
  • Get a Free System Design PDF with 158 pages by subscribing to our weekly newsletter: bit.ly/bytebytegoytTopic
    Animation tools: Adobe Illustrator and After Effects.
    Check out our bestselling System Design Interview books:
    Volume 1: amzn.to/3Ou7gkd
    Volume 2: amzn.to/3HqGozy
    The digital version of System Design Interview books: bit.ly/3mlDSk9
    ABOUT US:
    Covering topics and trends in large-scale system design, from the authors of the best-selling System Design Interview series.
  • Science & Technology

Comments • 58

  • @IceQub3 • 3 months ago • +62

    Cache invalidation is the elephant in the room

  • @Kingside88 • 3 months ago • +14

    I really appreciate your videos. They are so high quality, with the best explanations.
    Can you please make a video on the best strategy for using a relational database like Microsoft SQL (or something else) together with Elasticsearch?
    How do you keep them synchronized? Thank you in advance.

  • @ANONAAAAAAAAA • 3 months ago • +44

    The most important thing to know about caching is: don't cache unless it's absolutely necessary.
    Also, a database in conjunction with read replicas can be much more resilient and performant than your homebrew crappy caching mechanism.
    Caching is a last resort, after you've tried everything you can do with the database, for example tuning queries, adding indexes, etc.

    • @build-your-own-x • 3 months ago • +4

      this should be the first lesson in every caching learning resource.

    • @RodrigoVillarreal • 3 months ago • +2

      This only applies if we are doing simple selects / batch data. Read replicas don't solve issues like time consuming queries (expensive joins / aggregations / etc). Of course, you clearly specified: "Unless it's absolutely necessary" :)

    • @Osono2diWorld • 3 months ago

      You’re not building a scalable system without caching.

    • @wilfredv1930 • 1 month ago

      There should be a pretty deep evaluation to understand whether caching is necessary or not. Failing at that would lead to disaster sooner or later.

  • @dave6012 • 1 month ago • +4

    All I know is there are 2 hard things about programming:
    - naming things
    - cache invalidation
    - off-by-one errors

  • @michaelkhalsa • 3 months ago • +3

    Caches work well in tandem with claims: some data should never be cached, but rather claimed by a user during the editing process, with the ability for another user to revoke the claim. Saves always check the claim status first.
    Multi-level cache invalidation is important. For example, products returned for a catalog can use a product cache with a polled cache-invalidation process. Yet for display on a product detail page, a quick check of a timestamp is done, and that timestamp is reset by a trigger whenever any of the various product tables is updated. Thus a quick scalar DB query ensures the cache is valid, which also helps protect the cache itself from becoming stale.
    This can be taken even further with a dedicated timestamp table for complex product information. This approach dramatically improves performance, always guaranteeing valid data on a detail page, while keeping catalogs valid within the polling interval of the cache manager that watches a cache-invalidation table for changes.
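
    A minimal sketch of the timestamp check described above, assuming a Redis cache, a relational DB with a product_last_modified table kept current by triggers, and Python; every name here (including load_product_detail_from_db) is illustrative, not from the video:

        import json
        import time

        import psycopg2   # assumed: any DB driver with a cursor API would do
        import redis      # assumed: redis-py client

        r = redis.Redis()
        db = psycopg2.connect("dbname=shop")

        def get_product_detail(product_id: int) -> dict:
            # The cached entry stores the payload plus the time it was built.
            cached = r.get(f"product:{product_id}")
            if cached:
                entry = json.loads(cached)
                # Quick scalar query: when did a trigger last touch this product?
                with db.cursor() as cur:
                    cur.execute(
                        "SELECT extract(epoch FROM last_modified) "
                        "FROM product_last_modified WHERE product_id = %s",
                        (product_id,),
                    )
                    row = cur.fetchone()
                last_modified = row[0] if row else 0
                if entry["cached_at"] >= last_modified:
                    return entry["data"]   # cache is provably fresh
            # Cache miss or stale entry: rebuild from the expensive joins and re-cache.
            data = load_product_detail_from_db(product_id)   # hypothetical loader
            payload = json.dumps({"cached_at": time.time(), "data": data})
            r.set(f"product:{product_id}", payload, ex=3600)
            return data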

  • @maxvaessen • 3 months ago • +1

    Very useful! Thank you ❤

  • @mailbrn78 • 3 months ago

    Thanks for the insights on cache management. Could you please suggest which keys in the cache need encryption? And what should the key flush time be?

  • @rogers.1228 • 3 months ago • +2

    Add jitter to the TTL to reduce cache avalanches and many related issues.
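
    A minimal sketch of the TTL-jitter idea above, in Python with the redis-py client; the base TTL and jitter range are arbitrary example values:

        import random

        import redis   # assumed: redis-py client

        r = redis.Redis()

        BASE_TTL = 600   # 10 minutes
        JITTER = 60      # spread expirations across +/- 1 minute

        def cache_set_with_jitter(key: str, value: str) -> None:
            # Randomizing each key's TTL keeps entries written at the same moment
            # from all expiring, and stampeding the database, at the same moment.
            ttl = BASE_TTL + random.randint(-JITTER, JITTER)
            r.set(key, value, ex=ttl)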

  • @toxicitysocks • 3 months ago • +11

    Would love to see you tackle cache consistency too: what happens when the database write succeeds but the cache write fails? Or if the database is written concurrently to 2 different values but the last write to the database was value a, while the last write to the cache was value b? Now the cache is forever inconsistent.

    • @wesamadel3612 • 3 months ago • +1

      remove cache entry on every update

    • @toxicitysocks • 3 months ago

      @@wesamadel3612 sure, but what if there’s a network failure getting to the cache after you do the update in the db?

    • @eNITH24a • 2 months ago • +1

      Search for the dual-write problem. Use some form of event-driven system, like CDC (change data capture) on your DB writing to an Apache Kafka stream. That way, the log is persisted in the queue. Say the cache goes down; the event will still be in the queue and can be read once the cache is back up. However, I'm not sure what you would use to source data from Kafka to Redis.

    • @toxicitysocks • 2 months ago

      @@eNITH24a yeah, that’s the best solution I’m aware of. I just wonder if there’s any other way besides CDC or two phase commit. As for getting data from Kafka to redis, you could write a simple consumer service that reads the events and writes to redis.

    • @eNITH24a • 2 months ago

      @@toxicitysocks I figured you could write a consumer service, but then doesn't that introduce another point of failure? Or I guess it's a smaller point of failure if all it does is push to Redis from Kafka.
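
    A rough sketch of the Kafka-to-Redis consumer service discussed in this thread, assuming kafka-python and redis-py; the topic name and the shape of the CDC events are made up for illustration:

        import json

        import redis                      # assumed: redis-py client
        from kafka import KafkaConsumer   # assumed: kafka-python client

        r = redis.Redis()
        consumer = KafkaConsumer(
            "db.changes",                         # hypothetical CDC topic
            bootstrap_servers="localhost:9092",
            group_id="cache-updater",
            enable_auto_commit=False,             # commit only after Redis accepts the write
            value_deserializer=lambda b: json.loads(b.decode("utf-8")),
        )

        for msg in consumer:
            event = msg.value
            key = f"user:{event['id']}"           # hypothetical event shape
            if event.get("op") == "delete":
                r.delete(key)
            else:
                r.set(key, json.dumps(event["row"]), ex=3600)
            # If Redis is down, the exception above stops the loop before this commit,
            # so the event stays in Kafka and is re-read once the consumer restarts.
            consumer.commit()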

  • @vladyslavlen9490 • 3 months ago • +2

    What if we add some kind of jitter to the key TTL, so we minimize the probability of having them all expire at the same time?

  • @simonbernard4216 • 3 months ago • +2

    At 0:27 on the left diagram, shouldn't the process order be:
    1. Data request
    2. Data response (no cached data)
    3. Read original
    4. Copy cache

  • @BigHalfSteps • 3 months ago • +2

    I'm surprised that there is no mention of an easy solution (albeit there might still be an issue when starting from a cold cache) for an avalanche/stampede: just use different caching times. That should somewhat alleviate the load when the database is hit with multiple requests. But in essence, only cache what is necessary.

  • @micahpezdirtz8196 • 3 months ago

    How can there be less than 1m subscribers to your channel? You have the best explanations

  • @devid3085 • 3 months ago • +1

    At 3:58 the "find key" arrow has a typo and it should be (3) instead of (4)

  • @raj_kundalia • 3 months ago

    thank you!

  • @NemiroIlia • 3 months ago • +1

    2:33 a hidden smiling gem at the bottom right

  • @eduardokuroda8586 • 3 months ago • +1

    Using a Bloom filter sounds easy, but it can't delete elements, right? Will it need to be rebuilt periodically to ignore deleted items?
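
    For reference, a standard Bloom filter indeed cannot delete individual elements, because each bit may be shared by many items; the usual workarounds are a counting Bloom filter or periodically rebuilding the filter from the source of truth. A tiny from-scratch Python sketch (sizes and hash counts are arbitrary):

        import hashlib

        class BloomFilter:
            def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 5):
                self.size = size_bits
                self.num_hashes = num_hashes
                self.bits = bytearray(size_bits // 8)

            def _positions(self, item: str):
                for i in range(self.num_hashes):
                    digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
                    yield int.from_bytes(digest[:8], "big") % self.size

            def add(self, item: str) -> None:
                for pos in self._positions(item):
                    self.bits[pos // 8] |= 1 << (pos % 8)

            def might_contain(self, item: str) -> bool:
                # No remove() is possible: clearing a bit could also "delete" other
                # items that happen to hash onto the same position.
                return all(self.bits[pos // 8] & (1 << (pos % 8))
                           for pos in self._positions(item))

        def rebuild_filter(existing_ids) -> BloomFilter:
            # Periodic rebuild from the authoritative ID list drops deleted items.
            bf = BloomFilter()
            for item_id in existing_ids:
                bf.add(str(item_id))
            return bf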

  • @kazama81 • 3 months ago

    Question:
    What's the point of a cache server? Why isn't the server itself / web server doing the caching?
    If caching is supposed to be for fast retrieval, and we store the data on a different server, won't the network call take more time than querying the DB in the first place?

  • @nixjavi7220 • 3 months ago

    Amazing videos

  • @parthi2929 • 2 months ago • +1

    If something creates traffic, have a traffic signal, i.e. a lock, to regulate it.
    If something creates traffic, have multiple systems (web server / cache server, etc.) to handle it.
    If something can fail, have redundant backups of THAT.
    This applies to anything.
    Also, to know whether a lookup could fail because the relevant answer isn't in the DB, note that down beforehand in some way.

  • @PranaySoniKumar • 3 months ago

    Can't we use request collapsing to prevent a stampede? Since it's mainly due to an expired cache entry and multiple requests trying to access the same resource?
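
    Request collapsing (sometimes called single-flight) is indeed a common stampede mitigation. A minimal in-process Python sketch, assuming a redis-py-style cache client; a real multi-node deployment would more likely coordinate with a distributed lock, and load_from_db here is a hypothetical loader:

        import threading

        _locks: dict[str, threading.Lock] = {}
        _locks_guard = threading.Lock()

        def _lock_for(key: str) -> threading.Lock:
            # One lock per cache key, created lazily.
            with _locks_guard:
                return _locks.setdefault(key, threading.Lock())

        def get_with_collapsing(key: str, cache, load_from_db):
            value = cache.get(key)
            if value is not None:
                return value
            # Only one request per key rebuilds the entry; the rest wait on the lock.
            with _lock_for(key):
                value = cache.get(key)             # re-check: a peer may have refilled it
                if value is None:
                    value = load_from_db(key)
                    cache.set(key, value, ex=300)  # redis-py signature; TTL is arbitrary
                return value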

  • @user-je9fw2rl5s • 3 months ago

    What do you use for presentation and demonstration, please?

  • @ANONAAAAAAAAA • 3 months ago • +14

    The scariest story I've heard about caching: pages containing users' private info were cached on a CDN, and then they forgot to include the session ID in the cache keys...

    • @StoCoBoLoMC • 3 months ago

      What is a CDN? Why can't info be cached on it?

    • @venberd • 3 months ago

      Content Delivery Network: vendor-operated nodes geographically located closer to the user. Very little isolation is guaranteed. You don't want to put your users' data permanently there.

  • @gus473 • 3 months ago

    Comment for algorithm, thank you! 😎✌️

  • @chrishabgood8900 • 3 months ago

    Could the db invalidate the cache?

  • @krazeemonkee • 3 months ago • +3

    “cache me outside, how bout dat”

  • @zixuanzhao6043 • 1 month ago

    what about consistency

  • @rizthetechie • 3 months ago

    Just wondering: if we proactively re-cache the pages on expiry anyway, why set an expiration at all?

    • @TyllerJorEl • 2 months ago

      Data may change; also, if delete-key events randomly go missing (which could happen for a myriad of reasons), stale data could pile up forever and fill up the memory.

  • @titogorla • 3 months ago

    love it

  • @franklinoladipo2343 • 3 months ago

    Was thinking he would talk about Cache Invalidation

  • @BenjaminMeasures • 3 months ago

    There are only two hard things in Computer Science: cache invalidation and naming things.
    -- Phil Karlton

  • @uchuuowl7605 • 1 month ago

    GOLD

  • @MrSofazocker • 3 months ago

    Hear me out.
    What if you never hit the DB directly, but the cache either suspends requests or returns a null value, and fetches the data from the DB exactly once.
    Then it answers all the suspended requests!?

  • @rishiraj2548 • 3 months ago • +1

    👍

  • @jerichaux9219 • 3 months ago • +1

    Obligatory zeroth.

  • @navinmittal4809 • 3 months ago

    In summary, I think he explained 3 scenarios when there's a huge load: cache miss on single key (stampede), non-existent key (penetration), bulk cache miss on multiple/all keys (avalanche).

  • @aruneshprabu7925 • 3 months ago

    ❤❤❤❤

  • @chologhuribangladesh7792 • 3 months ago • +2

    Most people do not care about caching. They cache whatever they want.

  • @atifadib • 3 months ago

    Cache Crash

  • @ChuckNorris-lf6vo • 3 months ago • +4

    Hi bro, can you explain complex solutions with a little bit more LIFE in the voice? Like, give it MEANING, give it energy. Solutions are important, bro. Be more alive, so it is crystal clear why a certain solution is chosen, even if it is very complex. Thank you. And make the video longer if you have to, but make it the best god damn explanation of the problem. Thanks.

    • @Geza_Molnar_ • 2 months ago • +1

      I'd like to watch short (beginner: 5-15 min), medium (advanced: 10-25 min), and long (professional: roughly 30-60 min) videos about the topics that are on the channel. I'm fine with the content; that's already enough 'energy' for me. I have my own motivation to listen, to think about it, and to hit the back button when I need repetition or more time for thinking and understanding (no need for more 'life in the voice' to push me).

  • @wishmeheaven • 1 month ago

    Is this video sponsored by Redis?

  • @vadymk759 • 2 months ago

    There are only two hard things in Computer Science: cache invalidation and naming things.
    -- Phil Karlton