Denormalizing DB for Justin Bieber

Comments • 484

  • @rayforever
    @rayforever 5 months ago +1355

    It's such a simple thing, but franchise-level industry still won't accept it as secure or even consider it as an option

    • @carlosmspk
      @carlosmspk 5 months ago +138

      Because you can easily get into scenarios where "total_likes" doesn't actually reflect the number of likes

    • @crysist13
      @crysist13 5 months ago +30

      And you can no longer check who liked the post, just the number of likes, meaning the list of who liked what would need to be kept somewhere else.

    • @snakefinn
      @snakefinn 5 months ago +16

      ​@@carlosmspk If posts couldn't be un-liked it would be a lot simpler

    • @Rhidayah
      @Rhidayah 5 months ago +49

      Race Conditions be like 🗿

    • @carlosmspk
      @carlosmspk 5 months ago +29

      @@crysist13 The idea is not to stop adding tuples to the "Likes" table; it's just that you don't count them when displaying total likes. That part wouldn't be an issue, you could still get who liked what
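The scheme debated in this thread, keep the likes table as the source of truth but maintain a denormalized counter, can be sketched roughly like this (table and column names are illustrative, not Instagram's actual schema):

```python
import sqlite3

# Sketch: the likes table stays the source of truth; a denormalized
# total_likes counter on posts is bumped in the same transaction, so the
# counter can't silently drift from the rows it summarizes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, total_likes INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE likes (user_id INTEGER, post_id INTEGER, PRIMARY KEY (user_id, post_id));
    INSERT INTO posts (id) VALUES (1);
""")

def like(user_id, post_id):
    # One transaction: row insert and counter bump commit (or fail) together.
    with conn:
        conn.execute("INSERT INTO likes (user_id, post_id) VALUES (?, ?)", (user_id, post_id))
        conn.execute("UPDATE posts SET total_likes = total_likes + 1 WHERE id = ?", (post_id,))

def unlike(user_id, post_id):
    with conn:
        cur = conn.execute("DELETE FROM likes WHERE user_id = ? AND post_id = ?", (user_id, post_id))
        if cur.rowcount:  # only decrement if a row was actually removed
            conn.execute("UPDATE posts SET total_likes = total_likes - 1 WHERE id = ?", (post_id,))

for u in range(3):
    like(u, 1)
unlike(1, 1)
print(conn.execute("SELECT total_likes FROM posts WHERE id = 1").fetchone()[0])  # 2
```

Reading the count is now a single-row lookup, and "who liked what" is still answerable from the likes table, which is exactly the point being made above.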

  • @ianbdb7686
    @ianbdb7686 5 months ago +654

    For people saying this displays incorrect numbers of likes: YouTube does the same with view counts. The transactions are queued, and race conditions are not a big deal from a user's perspective

    • @usernamesrbacknowthx
      @usernamesrbacknowthx 5 months ago

      Yeah I can't believe people are arguing about accuracy lmao. Who cares. Goes to show you most programmers are GEEKS who can't actually problem-solve

    • @gabrielpedroza1346
      @gabrielpedroza1346 5 months ago +7

      What do you mean race conditions are not a big deal from a user's perspective? I'm pretty sure the reason is that they value availability more for certain parts of YouTube, like the core system and like count, and consistency for certain parts, like the comments system. When a user visits a video, it hits a node (preferably the closest node), and that node might not have the most up-to-date information, which means it's eventual consistency

    • @Kasukke
      @Kasukke 5 months ago

      @@gabrielpedroza1346 users are fine with eventually consistent like counts

    • @frenchButt44
      @frenchButt44 5 months ago +65

      @@gabrielpedroza1346 It doesn't matter if a video gets 1001 views or 1000 views; the discrepancy is because of "eventual consistency", so it will eventually match the count even if it is updated in other shards.
      The benefits outweigh the drawbacks so much that big companies are using it. I don't see you being an architect there.

    • @titan5064
      @titan5064 4 months ago +1

      @@frenchButt44 Idk exactly how YT works, but does it go like: "you press like, the client-side interface updates what should be the eventual number, and then queues a transaction to update the db"?

  • @christophcooneyoff
    @christophcooneyoff 5 months ago +338

    Eventual consistency is a fair trade-off for performance.

    • @oggassaggaoggaffa
      @oggassaggaoggaffa 4 months ago +26

      But ONLY for data/rows/fields that aren't mission-critical... such as "likes" counts! 😁 If you're writing an airline reservation system or a stock trading program, "eventual" doesn't really cut it.

    • @christophcooneyoff
      @christophcooneyoff 4 months ago +23

      @@oggassaggaoggaffa No, this is why atomic transactions exist 😬

    • @lukeP01
      @lukeP01 3 months ago +2

      Probably still not consistent when counting the rows, as you need to lock the table

    • @Fennecbutt
      @Fennecbutt 2 months ago

      Eventual consistency is only okay for this model (one producer, many consumers) where the content isn't liable to ever change much. Using it for something with state that changes a lot over time is always a terrible idea imo

    • @christophcooneyoff
      @christophcooneyoff 2 months ago

      @@Fennecbutt Yes.

  • @lefteriseleftheriades7381
    @lefteriseleftheriades7381 5 months ago +1096

    The fact that they even counted shows they never thought the thing would scale to more than 100k users

    • @marcialabrahantes3369
      @marcialabrahantes3369 5 months ago +260

      you don't optimize prematurely when building a startup

    • @MungeParty
      @MungeParty 5 months ago

      @@marcialabrahantes3369 this guy gets it

    • @seighartmercury
      @seighartmercury 5 months ago

      ​@@marcialabrahantes3369 When optimization is literally just 1 line of code, you absolutely should

    • @choilive
      @choilive 5 months ago +73

      Yep. There’s a good chance they might not ever have gotten 100K users. Why worry about good problems to have?

    • @flodderr
      @flodderr 4 months ago +14

      @@choilive I mean, adding the total_likes column seems like the obvious thing to do from the start. Insta had to have a business plan when they started out, so yes, they should have anticipated 100k

  • @wywarren
    @wywarren 5 months ago +80

    It’s slightly a bit more complicated for instagram, where in order to drive engagement, it also shows which one of the people you’re following also likes the post. So other than the count which is global, there’s an aggregate query contextual to the logged in user’s first connections to the post.

    • @WeirdDuck781
      @WeirdDuck781 3 months ago +1

      Welp, they can select the top 3 from the likes table, which should be rather fast

    • @MGraczyk
      @MGraczyk 2 months ago +1

      The most expensive part on the read side is showing the preview comments on the post, but there are fallbacks for very high volume posts and during times of high overall usage. I worked on ranking for these surfaces

    • @ianhpete
      @ianhpete 2 months ago +1

      Just query the friends' like list for the last couple of days,
      and take the 5 friends with the most engagement, or who communicate with the user the most

    • @wywarren
      @wywarren 2 months ago

      @@ianhpete That seems like a good heuristic to start with, but it assumes those n friends (in this case 5) have actually engaged with said post. If they didn't, you'd have to widen your circle to a bigger subset, and then that becomes a unique set for each post.

    • @w花b
      @w花b 2 months ago

      @@wywarren which it is

  • @Hobson474
    @Hobson474 2 months ago +83

    This is so untrue. Relational databases are very good at counting rows and will typically not even read the table, instead using a btree index to count. What he should say is that the hash join of two tables is slow.
    This is actually bad advice: UPDATE is a very expensive operation on a key table and will generate logging data to deal with concurrency. Ideally there should be a separate post_statistics table which can be unlogged for high-performance writes while trading off durability

    • @UpToTheWhalesNow
      @UpToTheWhalesNow 1 month ago +15

      yes... it's a real problem, but his "solution" is in fact worse than the original

    • @csbshaw1
      @csbshaw1 1 month ago +9

      You raise some valid points about the efficiency of counting rows using indexed columns and the expense of update operations. I completely agree. I was happy to see btree mentioned in an Instagram comment, lol.
      While btree indexes can indeed speed up count operations, this efficiency diminishes as the dataset grows and queries become more complex. Instagram deals with massive volumes of data and very high traffic, which means that even indexed counts can become a bottleneck.
      I personally would use in memory caching (redis) and a message queue like Kafka. Also probably batch processing and composite indexing.

    • @xutiTrinh
      @xutiTrinh 25 days ago

      His foreword is so misleading.

  • @aquiace
    @aquiace 2 months ago +10

    It makes me feel like less of a bad programmer knowing this was the solution

  • @jackdavenport5011
    @jackdavenport5011 5 months ago +26

    Some databases allow you to do something like that automatically, so when a like row gets inserted/removed it automatically updates the count. Or you could just run the count operation every 1-2 minutes and cache the result in the post row :)
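The trigger variant mentioned above might look like this in SQLite (a sketch; schema and trigger names are made up):

```python
import sqlite3

# Sketch of the trigger approach: the database itself keeps total_likes in
# sync on every insert/delete, so application code never touches the counter.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, total_likes INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE likes (user_id INTEGER, post_id INTEGER);
    INSERT INTO posts (id) VALUES (1);

    CREATE TRIGGER like_added AFTER INSERT ON likes BEGIN
        UPDATE posts SET total_likes = total_likes + 1 WHERE id = NEW.post_id;
    END;
    CREATE TRIGGER like_removed AFTER DELETE ON likes BEGIN
        UPDATE posts SET total_likes = total_likes - 1 WHERE id = OLD.post_id;
    END;
""")

for u in range(5):
    conn.execute("INSERT INTO likes VALUES (?, 1)", (u,))
conn.execute("DELETE FROM likes WHERE user_id = 0")
print(conn.execute("SELECT total_likes FROM posts WHERE id = 1").fetchone()[0])  # 4
```

The trade-off versus application-side increments: the counter can never be forgotten by a code path, but every like now pays the trigger's UPDATE cost inside the same write.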

  • @Bcs-Mohtisham
    @Bcs-Mohtisham 5 months ago +109

    Reminds me of a NoSQL rule:
    "Store related data together in a single structure to optimize retrieval, a principle particularly emphasized in NoSQL databases for efficient querying."

    • @user-by6fp4ov3k
      @user-by6fp4ov3k 5 months ago +3

      That's called "embedding"

    • @yuriiholenko6764
      @yuriiholenko6764 5 months ago +5

      the video starts from "counting rows on a RELATIONAL database", why are you mentioning NoSQL rules?

    • @w花b
      @w花b 2 months ago

      The term NoSQL is so misleading

  • @algeriennesaffaires7017
    @algeriennesaffaires7017 1 month ago +4

    When I created an ecommerce website and app, at first I just created a column named total_likes to show how many likes each product got. But later, for marketing reasons, I needed to know who liked each product so we could send marketing emails, so I had to create a new table, liked_by_user_id for example. If we want to do a marketing email about some product, we just go to that table, find who liked it, and send the email. So I suggest you do both: create a separate table that saves the IDs of the users who liked the post or product, and for speed also add a total_likes column to the product post. This way you have everything, though it may make the database bigger because you now have another big table

  • @ololh4xx
    @ololh4xx 5 months ago +9

    counting rows has never been slow. It's basically a core feature of almost all relational DB systems to offer ultra-fast counting. *BUT*: counting several thousand times per second (!) introduces race conditions and counting-algorithm overhead which can become absolutely deadly for performance because of suboptimal locking or incorrect usage of mutexes, semaphores and the like. It's not the counting itself killing the performance, it's the low scalability of it. There are databases out there which offer ultra-fast counting that is also scalable. In some cases this is realized via auto-updates of internal count fields, some use dark math wizardry, and some are simply that fast by design.

    • @phill6859
      @phill6859 4 months ago

      Counting rows has always been slow. Though this problem would be counting indexes.

    • @ololh4xx
      @ololh4xx 4 months ago +2

      @@phill6859 I recommend looking into current (and past) database technologies. Performance problems with row counting have been solved for *decades* now. You need to get with the times, expand your knowledge and try out new stuff. It isn't slow anymore, and it hasn't been for decades, except in cases where the database has been misconfigured or is being misused.

    • @FilipCordas
      @FilipCordas 2 months ago

      This sounds like a materialized view problem.

    • @Xcess007
      @Xcess007 1 month ago

      Precisely. Elasticsearch used HyperLogLog to trade off a bit of accuracy for lightning-fast counting and fixed-size memory usage.
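A toy version of the HyperLogLog idea referenced above: estimate distinct counts in fixed memory by tracking, per bucket, the longest run of leading zero bits seen in hashed values. This is a bare sketch with illustrative parameters; production implementations (like Elasticsearch's) add bias and small-range corrections.

```python
import hashlib

P = 10                 # 2^10 = 1024 registers, a few hundred bytes total
M = 1 << P
registers = [0] * M

def add(item):
    # 64-bit deterministic hash; first P bits pick a register,
    # remaining bits contribute their leading-zero "rank".
    h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
    bucket = h >> (64 - P)
    rest = h & ((1 << (64 - P)) - 1)
    rank = (64 - P) - rest.bit_length() + 1
    registers[bucket] = max(registers[bucket], rank)

def estimate():
    alpha = 0.7213 / (1 + 1.079 / M)   # standard bias constant for large M
    return alpha * M * M / sum(2.0 ** -r for r in registers)

for i in range(100_000):
    add(f"user-{i}")

print(round(estimate()))  # close to 100000, typically within a few percent
```

The standard error is about 1.04/sqrt(M), here roughly 3%, which is the accuracy-for-memory trade the comment describes.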

    • @zaza-ik5ws
      @zaza-ik5ws 1 month ago

      Oh my god, your comment blew my mind. Master, teach me your ways.

  • @fullstack_journey
    @fullstack_journey 5 months ago +1

    Thanks for your music, and SQL query optimisation, Freddy!

  • @georgeyoung2684
    @georgeyoung2684 5 months ago +5

    Read-optimised vs write-optimised

  • @ckennedy0323
    @ckennedy0323 5 months ago +60

    Now tell me what happens when 1000000 people like the photo nearly simultaneously

    • @rakinar2
      @rakinar2 5 months ago +4

      Race condition, if not handled correctly

    • @NabeelFarooqui
      @NabeelFarooqui 5 months ago +12

      Simultaneous likes unlikes will always be significantly less than simultaneous views

    • @SamMeechWard
      @SamMeechWard 5 months ago +68

      Queue the updates and if it takes a while no big deal, because no one cares if the like count is off by a bit

    • @SamMeechWard
      @SamMeechWard 5 months ago +17

      Low stakes though. You could avoid the race condition or just account for it later by updating the value directly from the count

    • @lozyodella4178
      @lozyodella4178 5 months ago +4

      Just serialize it, no one cares if your like isn't instantly reflected on the app

  • @JTCF
    @JTCF 2 months ago +1

    It may seem like an unreliable system which might produce inaccurate results, but with proper structuring you won't be able to mess it up. And it saves a lot of resources, too.

  • @chrisdiehl8452
    @chrisdiehl8452 1 month ago +1

    It depends on the system, and environment you are working with.
    If your count is slow on the database, you can change the operation from a post to a present operation.

  • @haxney
    @haxney 2 months ago +2

    It's more complicated than even that. As others have noted, you don't need (or want) to display a perfectly up-to-date like count, so it can be eventually consistent. The other important factor is despamming. A user clicking "like" and the like counter incrementing is not a single, atomic operation. If the user is identified as a spam bot, then their like should not count for the purpose of promoting the post.
    The real workflow is more like:
    1. User clicks "like." Store that.
    2. Start a batch despam pipeline to figure out whether the like is legitimate.
    3. If so, update the like count on the post.
    The despam pipeline is going to take some time. In this case, nobody but spammers actually cares about accuracy down to a single like, so lazily updating is just fine.
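The three-step workflow above, sketched with a stand-in spam check (names and the bot list are hypothetical; real systems use ML and heuristics, not a lookup set):

```python
# Step 1: every click is recorded immediately, no questions asked.
# Steps 2-3: a periodic batch pass decides which likes are legitimate
# and only then advances the publicly visible counter.
raw_likes = []
visible_count = 0
known_bots = {"bot-1", "bot-2"}   # stand-in for a real despam model

def record_like(user_id, post_id):
    raw_likes.append((user_id, post_id))

def despam_batch():
    global visible_count
    while raw_likes:
        user_id, _post = raw_likes.pop()
        if user_id not in known_bots:   # the "is this like legitimate?" check
            visible_count += 1

for user in ["alice", "bob", "bot-1", "carol", "bot-2"]:
    record_like(user, post_id=42)

despam_batch()
print(visible_count)  # 3
```

Because the counter only moves after the batch pass, it is lazily updated exactly as the comment describes, and nobody but the spammers can tell.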

    • @CombobreakerDe
      @CombobreakerDe 1 month ago

      Normally you batch the "despam" process. You won't do it whenever someone clicks on the button. It's a separate process. You identify bots, check which posts they liked and update the counts accordingly.

  • @deynohmuturi1069
    @deynohmuturi1069 1 month ago

    I did this with my project, a Twitter-clone web app. You're making me feel like a genius.

  • @ViciuSRO
    @ViciuSRO 4 months ago +14

    what did you use to draw the diagrams? it looks so good!

  • @gpzim981
    @gpzim981 3 months ago

    This is not how it's done though.
    Likes on a post still have a reference to who liked and when; they still have that related table. What actually happens is that they have a projection (read-only) of the post with the aggregated data, and that is only updated from time to time, not every time someone likes the post. The UI creates the illusion, but on the backend the projection is updated after some time.

  • @christopherparke9878
    @christopherparke9878 3 months ago +1

    I would prefer a "total likes" table with a foreign key to the post ID. Less bloat on the posts table and achieve the same thing. (ID, postId, Total)

  • @Llorx
    @Llorx 5 months ago +84

    The answer is always "cache it".

    • @petrlaskevic1948
      @petrlaskevic1948 5 months ago +1

      yeah, edit the database once a minute, with the value being the sum of likes and dislikes over that time period

    • @chawza8402
      @chawza8402 5 months ago +6

      Yeah, but to "cache it" you're required to count the rows before storing it in the cache

    • @Llorx
      @Llorx 5 months ago

      @@chawza8402 no, each time you add a new row, increase the number by 1. If you delete, decrease the number (like the video explains). That's also a cache, because it is still a "copy of the real data", which is the like count.

    • @danielcook237
      @danielcook237 5 months ago

      @@chawza8402 not necessarily, I think: you could count the new likes and new dislikes, add the cached value to the value already stored in the database, and then set the cached value to 0

    • @Kasukke
      @Kasukke 5 months ago

      @@chawza8402 no, just increment it. You don't need the cache to be 100% accurate, just fast

  • @earl.grey_
    @earl.grey_ 4 months ago +2

    The db really should handle this kind of caching itself

    • @jc-aguilar
      @jc-aguilar 3 months ago

      I know that Snowflake caches it; it's really fast with count(*)

  • @hotwaff
    @hotwaff 1 month ago +9

    You don't even necessarily need to put the counter in the *posts* table, either. 1,000,000 likes would mean 1,000,000 UPDATE calls on a table that is being read very often. Having another stats table, keeping the value hot in a cache and then reading from the cache if it's present will be vroom vroom.

  • @HoneyCombAI
    @HoneyCombAI 4 months ago

    Great explanation, so solid.

  • @luca0898
    @luca0898 27 days ago

    In addition, you can place a trigger in the db so that the counter increments on every insert/delete

  • @69k_gold
    @69k_gold 1 month ago

    I like how perfect normalization led to this, and could've been easily fixed with a NoSQL database like Mongo that supports aggregates

  • @chief-long-john
    @chief-long-john 2 months ago

    What about maintaining a queue with the requests and updating a value in the database with an async job that runs every 10-20 items OR every 5 seconds, whichever comes first? That way you can do a batch update, keep one copy of the count, and keep normalization. And periodically you could correct this count value by doing another async check against the actual COUNT value of the database
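That batch-update idea might be sketched like this (thresholds and schema are illustrative; a real system would also run a timer for the every-few-seconds flush, where here the tail is flushed explicitly):

```python
import sqlite3
import threading

# Increments accumulate in memory and are flushed as ONE UPDATE every
# N likes, turning N row updates into a single write.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, total_likes INTEGER NOT NULL DEFAULT 0)")
conn.execute("INSERT INTO posts (id) VALUES (1)")

lock = threading.Lock()   # guards the in-memory pending counter
pending = 0
FLUSH_EVERY_N = 10

def flush():
    global pending
    with lock:
        n, pending = pending, 0   # grab and reset the batch atomically
    if n:
        with conn:
            conn.execute("UPDATE posts SET total_likes = total_likes + ? WHERE id = 1", (n,))

def like():
    global pending
    with lock:
        pending += 1
        do_flush = pending >= FLUSH_EVERY_N
    if do_flush:
        flush()

for _ in range(23):
    like()
flush()  # the timer would normally catch this tail
print(conn.execute("SELECT total_likes FROM posts").fetchone()[0])  # 23
```

The periodic reconcile-against-COUNT step the comment mentions would then correct any drift left by crashes between batches.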

  • @steventhawe1598
    @steventhawe1598 2 months ago

    In Power Apps we call the column a rollup column, and its role is to calculate items in the child normalized table

  • @PaulSebastianM
    @PaulSebastianM 4 months ago +1

    That is not really denormalizing the normal forms, as this is not removing a foreign key. It's simply adding a counter that is basically an automatically computed and cached value.

  • @whoisarjen6368
    @whoisarjen6368 5 months ago

    Videos like this are what we need!

  • @gokukakarot6323
    @gokukakarot6323 3 months ago

    They could also parse the output of EXPLAIN SELECT COUNT to get an approximation. Above a certain number, 1,000,000 is the same as 1,000,002

  • @edocms
    @edocms 2 months ago +5

    The real solution is to normalize it further by adding a new table to keep the count. Never de-normalize!!!

  • @nimamc
    @nimamc 4 months ago

    This is standard practice in Firestore for example.

  • @doxologist
    @doxologist 5 months ago +5

    Damn, what ER software is that?

    • @guiolly
      @guiolly 3 months ago

      I wanna know too

  • @aGj2fiebP3ekso7wQpnd1Lhd
    @aGj2fiebP3ekso7wQpnd1Lhd 1 month ago

    Tell me the person who designed the IG architecture had no idea what they were doing without telling me

  • @user-md6hb3jc4j
    @user-md6hb3jc4j 21 days ago

    We applied the same solution in a similar case before real load testing, but we had more than one property to calculate, and this place became the performance bottleneck of the app. It turned out that we didn't need it: just counting from the db (read nodes) didn't affect the performance of the app (in the context of our requirements). But we implemented a `snapshots` solution, where you take the last `snapshot` and just add all `likes` for the post after the datetime when that snapshot was created

  • @jtms1200
    @jtms1200 1 month ago

    counter caches are definitely a good thing

  • @MrTStat
    @MrTStat 4 months ago

    I did that with the balance field in my app.
    Yes, I keep all the transactions for integrity, but I update the balance for faster processing

  • @Andris_Briedis
    @Andris_Briedis 5 months ago +7

    100% normalization is just theoretical fun. In a normal production DB, some intermediate results should always be stored, so as not to overload the DB unnecessarily.

  • @mgs_4k198
    @mgs_4k198 5 months ago

    Damn son,
    your videos are hella practical

  • @frontendtony
    @frontendtony 4 months ago +13

    I've just gotta say, that's a really high quality camera you got there

  • @user-hy6cp6xp9f
    @user-hy6cp6xp9f 5 months ago

    This is amazing shorts content. Nice, sweet, technical, interesting story. Learn a small fact, accessible

  • @shogrran
    @shogrran 3 months ago

    Hence they started calling it "breaking the internet"

  • @thombrown
    @thombrown 3 months ago

    You would have thought they had caching in the application layer. Strange to push that into the database.

  • @geteducatedyoufool4563
    @geteducatedyoufool4563 5 months ago

    I really like your explanations and examples
    Subbed

  • @theghost9362
    @theghost9362 6 hours ago

    Effective, but not as simple as it appears

  • @thebeastsclips
    @thebeastsclips 4 months ago

    I remember when I deleted my Facebook comments, I could still see the comment count even though I had deleted all of them.

    • @SMILEPANT
      @SMILEPANT 4 months ago

      that is because of soft deletion of comments in the db.

  • @alexties6933
    @alexties6933 2 months ago

    Most sites don't even do that; YouTube updates views periodically

  • @nielsvanderveer
    @nielsvanderveer 2 months ago

    Only thing to keep in mind though is using transactions. But this is supported by virtually all database systems, so it should not be a big deal.

  • @erincarson8998
    @erincarson8998 1 month ago

    The correct answer was an incremented counter.

  • @CombobreakerDe
    @CombobreakerDe 1 month ago +1

    Man, holy moly. This is really how they did it? This looks just like a newbie mistake of someone who didn't think about it twice 😅

    • @ModiPooja
      @ModiPooja 1 month ago

      I was thinking the same lol

  • @kamranbashir4842
    @kamranbashir4842 2 months ago

    A denormalised database has a higher risk of invalid or conflicting data. A process to increment the count could be interrupted or terminated prematurely, or an edge case can create a pattern of conflicting data. At a high scale, one might not observe a difference between 29877765 and 29877785 because both would be shown as 29M, but at a lower scale it's the difference between 325 and 290. At a lower scale developers are afraid of this because the mistake will easily be noticed, so they choose the safer route of keeping the database normalised and denormalising later if needed.
    This "later" stuff piles up and takes the form of big technical debt, which bites back when the app needs to be scaled up.

  • @aditya234567
    @aditya234567 26 days ago

    The real question here is why Justin Bieber has so many fans.
    Everyone is talented in their own way! I'll never understand celebrity worship

  • @czarlito_
    @czarlito_ 26 days ago

    Right, but what about concurrent access?
    Without the column there is no problem, as each backend instance inserts the row within its own connection, so the inserts can happen out of order and concurrently.
    But if all of these connections want the value of the new column, they cannot get the same value, as they would all return n+1. So does the database queue these requests?
    How does it manage millions of updates per second where each update has to have the newest version of the value?
    I think YouTube had a similar problem with counting views; that's why they did it in some clever way, and right after upload the count would stop at around 300 views for some time.
    I don't know whether they still do it like this, or if the times when YouTube views got stuck at 300 are now long gone, but my heart is with you if you remember that.

  • @pranayvelisoju6144
    @pranayvelisoju6144 5 months ago +28

    This is wrong on many levels.
    1. Having just a likes counter will not work; we also need to save who liked a post so that we don't increment the view/like counter on repeated liking/unliking.
    2. Writing to the post table every time someone likes is not a good approach when millions can like a post in a few minutes.
    To address these we need to save both who liked a post and the reference to the post, and to make sure this data takes less space, we use probabilistic data structures like HyperLogLog. These can estimate cardinalities of > 10^9 with an accuracy of 2% using around 1.5 KB of memory.

    • @d0cx
      @d0cx 5 months ago +1

      Not to mention any sort of race condition... I'm not familiar with how modern databases handle race conditions.

    • @oxelf
      @oxelf 5 months ago +2

      @@d0cx Most of them use locks: they lock the table for writing while they do their transaction and then unlock it for other processes to write again.

    • @usernamesrbacknowthx
      @usernamesrbacknowthx 5 months ago +8

      You have no idea what you’re talking about.
      1. “We need to also save who liked a post” - yeah that’s exactly what denormalization is doing. Preventing needing a COUNT query over the likes table every time a user views a post is exactly the problem that denormalization solves. The user only needs a ROUGH ESTIMATE of how many users liked the post and the only accurate information they need from the likes table is if THEY liked the post or not.
      2. This is just technobabble bullshit. You’re just doing denormalization with extra steps. Did you even watch the video?

    • @usernamesrbacknowthx
      @usernamesrbacknowthx 5 months ago +2

      @@d0cx a race condition on what exactly? Where’s the race condition?

    • @usernamesrbacknowthx
      @usernamesrbacknowthx 5 months ago +1

      @@oxelf table locking is extremely rare; most are using row locking nowadays.

  • @mahadevovnl
    @mahadevovnl 29 days ago

    They optimized it even more by not updating that record so often: they would cache it in Redis in batches and only make 1 update to the central database for every 1000 likes or so. That would be distributed across servers, so you might get 5 updates per minute saying: "A thousand more, please."
    That "thousand" number was also based on how often the like button was pressed for that post. Lots of likes? It would update every 5k likes. Slowing down? It would stop caching.

  • @truehighs7845
    @truehighs7845 2 months ago

    Yep counting roses can be pricky....

  • @guptahimsocool
    @guptahimsocool 2 months ago

    It would not be a long-term solution. The solution would be: up to a certain number the query runs conventionally, and only once the threshold has been breached does the application write it to the column. That covers most cases and saves CPU cycles.

  • @mysticfakir2029
    @mysticfakir2029 1 month ago

    Honestly this is basic design that should have been there from the start. This is what happens when you don't plan your design

  • @anashe5417
    @anashe5417 5 months ago +5

    which is the web page that you're using for the diagrams?

    • @hiteshjoshi2736
      @hiteshjoshi2736 4 months ago

      +1

    • @anashe5417
      @anashe5417 4 months ago +1

      @@hiteshjoshi2736 In the end I had to discover it by myself haha. It's called Excalidraw and it's free

  • @ouazzaniyassine57
    @ouazzaniyassine57 2 months ago

    I like your explanation, thank you

  • @abuzain859
    @abuzain859 2 months ago

    I am using this technique for a non-relational DB.

  • @BrickPB
    @BrickPB 1 month ago +1

    Why didn’t they do this to start with?

    • @Raj-dy2cn
      @Raj-dy2cn 1 month ago +1

      Okay, I guess the reason would be that if you add an extra column for likes, you need to first find the row and then update it, while with a separate table you could just add a new row, no need to find anything, saving some overhead. But as shown in the video, the shortcomings outweigh the benefits.

  • @MGraczyk
    @MGraczyk 2 months ago

    Instagram no longer uses a relational database for these sorts of things; I believe this specific case would have changed around 2017, but for sure by 2019

  • @savagesarethebest7251
    @savagesarethebest7251 2 months ago

    Seriously, they were worse at database design than I was at 10 years old in the year 2000..!?

  • @zoltanzorgo
    @zoltanzorgo 5 months ago +1

    Normalization is great in theory, and as a diagram on paper. But you shouldn't take it to 3NF or BCNF in the first place if you intend to use a real-world RDBMS in a real-world application... like... ever.

  • @mdahsanraza
    @mdahsanraza 2 months ago

    Why would you ever need a separate likes table?! 😅

  • @PiotrKundu
    @PiotrKundu 2 months ago

    This is 20-year-old knowledge, from when data warehousing started to become a thing

  • @JuanRamos-lo9xs
    @JuanRamos-lo9xs 19 days ago

    I believe it is important to mention the database. That idea, for example, is not a good one with PostgreSQL and its locks.

  • @0xsabbir
    @0xsabbir 3 months ago

    sometimes a little extra processing is a good thing

  • @packtrouble6270
    @packtrouble6270 1 month ago

    I'm assuming you're still going to log that a user liked the post, because otherwise there is nothing stopping someone from hitting that endpoint over and over again if there is no validation.

  • @Dr_Larken
    @Dr_Larken 5 months ago +1

    Oh ok! I figured since we’re solving a problem, when the problem was essentially every time a celebrity uploaded causing the infrastructure to crash. My problem-solving skills kicked in. Immediately arriving at, we’ll If we remove celebrities from the equation problem solved!
    You could apply the solution to solve our problem here in the video, and in the real world! Bada bing bada BOOM, Bob’s your uncle!
    Isn’t coding fun!

  • @UpToTheWhalesNow
    @UpToTheWhalesNow 1 month ago

    1) Counting rows in a relational database is one of the fastest possible ways to count something.
    2) You're missing the half of this problem where millions of people are requesting the count (as well as increasing it themselves).
    3) A better term for fixing this problem is "materialization", not "denormalization".

  • @TheShawnMower
    @TheShawnMower 5 months ago +5

    Relational databases 101. No shit. This is a "no shit sherlock" insight

  • @Markus-iq4sm
    @Markus-iq4sm 1 month ago

    We were taught this in university years ago; it's just logical

  • @mrleblanc
    @mrleblanc 5 months ago

    What software is used to show the database table ?

  • @lenguyen20-14
    @lenguyen20-14 4 months ago

    This is such good content

  • @Nauxe
    @Nauxe 5 months ago

    Nowadays they use Bloom filters.

  • @matt55346
    @matt55346 2 months ago

    Yesterday i was talking with a colleague that I don't really like SQL, today my feed is flooded with SQL tutorials.

  • @AndrewARitz
    @AndrewARitz 2 months ago

    Their biggest mistake was using a relational database for this kind of application.

  • @zawadlttv
    @zawadlttv 4 months ago +1

    or you do it like YouTube and only update the like counter sometimes

  • @andk9999
    @andk9999 3 months ago

    Most likely the count (cacheable) wasn't the issue; the fan-out to the newsfeed was

  • @betoharres
    @betoharres 1 month ago

    Good content!

  • @lawalrasheed_
    @lawalrasheed_ 3 months ago

    I’ve found that generic statements about sql optimization/performance are not very useful in practice. There are numerous things that affect the performance of a relational db, not least of which is the specific DBMS e.g MySQL with InnoDB optimizes count(*) to use an index scan which is quite performant.

  • @Zeegoner
    @Zeegoner 2 months ago

    “Extra processing when someone likes a post”??? Total nonsense 😂😂😂

  • @mattcargile
    @mattcargile 3 months ago

    Unless you use column store indexes…

  • @rbettsx
    @rbettsx 5 months ago

    The exact number of likes is not business-critical, for large numbers. A new, time-stamped entity: ' estimated number of likes' does *not* denormalize the database. All it takes is a semantic confession that the cache is not necessarily up to date.

  • @Yayaisbadatchess
    @Yayaisbadatchess 15 days ago

    I thought he said “counting bros”

  • @SMAntony007
    @SMAntony007 4 months ago

    Interesting, keep up the content

  • @user-gm1ki4od3t
    @user-gm1ki4od3t 24 days ago

    this is what I do... because I also encountered the same issue 😂

  • @abhijeetvaidya1638
    @abhijeetvaidya1638 1 month ago

    I thought that in row-oriented DBs, updates are more expensive than inserts.

  • @luiscarbonel1991
    @luiscarbonel1991 4 months ago

    The better option for this is to use optimistic updates on the frontend side and send events using SNS topics, SQS... whatever; and if you want better results, process those events in batches for celebrity users like JB.

  • @MrLinuxFreak
    @MrLinuxFreak 4 months ago

    store it in a virtual bucket and peek when a read is needed; Couchbase for example

  • @michiel97531
    @michiel97531 2 months ago

    lmao Justin Bieber could have held Instagram hostage by just uploading photos

  • @gamer_theegyad339
    @gamer_theegyad339 21 days ago

    Thing is, they should have known how many users were on Instagram and then expanded the cache or storage... 😊 but they've done a great job 👏

  • @neruajaudah
    @neruajaudah 3 days ago

    what app is used in this video, pls?

  • @hlubradio2318
    @hlubradio2318 2 months ago

    Too many things to worry about with SQL

  • @oakvillian5
    @oakvillian5 1 month ago

    This is a simplification. Nowadays we're using big data processing like MapReduce with probabilistic data structures for the real-time counts

    • @jorgeconsulting
      @jorgeconsulting 1 month ago

      Could you elaborate on “probabilistic data structures” and what’s out there to achieve that?

  • @dkchen
    @dkchen 2 months ago

    So there's a chance likes can get screwed up. You still have a likes table, and maybe once a day you can reconcile the total likes against the likes table.
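A daily reconciliation job like that can be as simple as one UPDATE driven by the real count (a sketch; the schema and the drifted value are illustrative):

```python
import sqlite3

# The denormalized counter may drift (crashes, races, spam cleanup), so a
# periodic job re-derives it from the likes table, the source of truth.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, total_likes INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE likes (user_id INTEGER, post_id INTEGER);
    INSERT INTO posts (id, total_likes) VALUES (1, 97);  -- counter has drifted
""")
conn.executemany("INSERT INTO likes VALUES (?, 1)", [(u,) for u in range(100)])

def reconcile():
    # One pass fixes every post whose cached count disagrees with reality.
    with conn:
        conn.execute("""
            UPDATE posts SET total_likes =
                (SELECT COUNT(*) FROM likes WHERE likes.post_id = posts.id)
        """)

reconcile()
print(conn.execute("SELECT total_likes FROM posts WHERE id = 1").fetchone()[0])  # 100
```

Run nightly, this gives the fast denormalized reads the video describes while bounding how long any drift can survive.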