Database Sharding and Partitioning

  • Added 18 May 2024
  • System Design for SDE-2 and above: arpitbhayani.me/masterclass
    System Design for Beginners: arpitbhayani.me/sys-design
    Redis Internals: arpitbhayani.me/redis
    Build Your Own Redis / DNS / BitTorrent / SQLite - with CodeCrafters.
    Sign up and get 40% off - app.codecrafters.io/join?via=...
    In the video, I discussed the importance of sharding and partitioning in scaling systems. Sharding distributes data across multiple machines for improved throughput and availability. We explored how databases evolve through stages, the differences between sharding and partitioning, and when to introduce these concepts. I also highlighted the benefits of a collaborative system design course I offer. Scaling databases vertically involves increasing resources, while horizontal scaling adds more servers for higher throughput. Sharding splits data across shards, while partitioning divides data within a shard. Strategic partitioning is crucial for efficient data management.
    Recommended videos and playlists
    If you liked this video, you will find the following videos and playlists helpful
    System Design: • PostgreSQL connection ...
    Designing Microservices: • Advantages of adopting...
    Database Engineering: • How nested loop, hash,...
    Concurrency In-depth: • How to write efficient...
    Research paper dissections: • The Google File System...
    Outage Dissections: • Dissecting GitHub Outa...
    Hash Table Internals: • Internal Structure of ...
    Bittorrent Internals: • Introduction to BitTor...
    Things you will find amusing
    Knowledge Base: arpitbhayani.me/knowledge-base
    Bookshelf: arpitbhayani.me/bookshelf
    Papershelf: arpitbhayani.me/papershelf
    Other socials
    I keep writing and sharing my practical experience and learnings every day, so if you resonate then follow along. I keep it no fluff.
    LinkedIn: / arpitbhayani
    Twitter: / arpit_bhayani
    Weekly Newsletter: arpit.substack.com
    Thank you for watching and supporting! it means a ton.
    I am on a mission to bring out the best engineering stories from around the world and make you all fall in
    love with engineering. If you resonate with this then follow along, I always keep it no-fluff.
  • Science & Technology
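The description's distinction between partitions and shards can be sketched in a few lines. This is only an illustration under assumptions that are not in the description: four partitions chosen by `user_id % 4`, two shards with the made-up names `shard-a` and `shard-b`, and a plain lookup table that places each partition on a shard.

```python
PARTITION_COUNT = 4  # assumption: data is split into four partitions

# Assumption: a lookup table records which shard (database server) holds each partition.
partition_to_shard = {0: "shard-a", 1: "shard-a", 2: "shard-b", 3: "shard-b"}


def partition_of(user_id):
    """Partitioning: decide which slice of the data a row belongs to."""
    return user_id % PARTITION_COUNT


def locate(user_id):
    """Sharding: find the server that currently holds that slice."""
    partition = partition_of(user_id)
    return partition, partition_to_shard[partition]


def move_partition(partition, new_shard):
    """Rebalance by moving a whole partition; rows keep their partition."""
    partition_to_shard[partition] = new_shard


before = locate(5)             # user 5 lives in partition 1
move_partition(1, "shard-b")   # shift partition 1 to the other server
after = locate(5)
print(before, after)
```

Keeping the partition function separate from the placement table is a design choice of this sketch: it lets a partition move between servers without changing which partition a row belongs to.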

Comments • 128

  • @shishirchaurasiya7374
    @shishirchaurasiya7374 11 months ago +10

    I was genuinely confused until you turned the theory into understanding through tables and references to SQL queries. Thanks a lot for your efforts and for this lovely explanation, Arpit sir.

  • @ranjithpals
    @ranjithpals 2 years ago +4

    Thanks a lot! That was clear and concise. Looking forward to enrolling in your complete system design course.

  • @AlokMehta24
    @AlokMehta24 9 months ago +1

    Excellent video, Arpit. Coming from a background with no software or systems engineering, I found this the best video explaining data sharding and partitioning. I am a Tech PM for AWS Supply Chain, and data partitioning and sharding are a real deal for us. Thanks for making this extremely easy-to-understand video.

  • @nuclearniraj
    @nuclearniraj 9 months ago

    One video and all the clutter on Sharding and Partitioning is clear. Thank you so much Arpit.

  • @jaskiratwalia
    @jaskiratwalia 3 months ago +1

    Wonderfully explained! Cleared all my doubts. Please keep making such videos. These are also well timed, not too short nor too long.

  • @aditijalaj5036
    @aditijalaj5036 9 months ago

    This is an amazing video and your explanations are very clear.

  • @chaitanyawaikar382
    @chaitanyawaikar382 1 year ago +4

    One of the best videos explaining the nuances between partitioning and sharding. Thank you @ArpitBhayani

  • @___vandanagupta___
    @___vandanagupta___ 1 year ago +1

    The amount of knowledge in this video is tremendous!!! Extremely helpful 👍👍👍 Thank you, sir!!

  • @nimitkanani1691
    @nimitkanani1691 1 year ago

    Very beautifully and simply explained. The content of the video flowed so smoothly. Thank You @ArpitBhayani

  • @kritibindra4232
    @kritibindra4232 1 year ago +1

    Wow, this was really, really helpful! Thank you for posting this.✨

  • @timamet
    @timamet 1 year ago +1

    amazing explanations, thank you

  • @neerajdixit7102
    @neerajdixit7102 1 year ago

    Awesome Arpit, Thanks truly admire your way of teaching

  • @jithinb7047
    @jithinb7047 10 months ago

    Awesome content, Arpit! Thanks a lot, and please continue to post more on concepts like these, as well as analyses of real use cases.

  • @AqibJavaid-zl7vc
    @AqibJavaid-zl7vc 1 month ago

    Excellent video ❤. Finally, I got a good grasp of the whole concept.

  • @Jamsessions0
    @Jamsessions0 23 days ago

    One of the best explanations on the internet, well done sir

  • @mohitkumartoshniwal
    @mohitkumartoshniwal 1 year ago +1

    A very clear and detailed explanation. ♥️

  • @varshard0
    @varshard0 4 months ago

    thank you. I always assumed that they are the same thing. This cleared things up for me.

  • @sameer1571
    @sameer1571 5 months ago

    Bro, your diagram example made my day. Such a clear and concise explanation of this topic. Bro, love you from the heart ❤❤ for making this video.

  • @vamsidharvemuluri3817
    @vamsidharvemuluri3817 2 months ago

    Best explanation so far. thanks brother

  • @zeyuli53
    @zeyuli53 1 year ago +1

    well explained, thank you

  • @nikhilrajput8696
    @nikhilrajput8696 1 month ago

    Wow... really nice. Nowadays a lot of people sell and talk about system design and always jump straight to some optimistic solution without going into the internals, and in fact they have not even worked on many systems. I really like the way you explain things, and I am going to buy your system design plan to improve my own skills.

    • @AsliEngineering
      @AsliEngineering 1 month ago

      Thanks. Looking forward to having you enrolled 🙌

  • @aneksingh4496
    @aneksingh4496 8 months ago

    super video Arpit

  • @anandahs6078
    @anandahs6078 2 months ago

    Very good explanation with the right examples. Hats off to you. Thanks for the great content. I always thought shards and partitions were the same, but you clarified it very well.

  • @PoojaDurgi
    @PoojaDurgi 7 months ago

    Amazing !!

  • @hanzalasiddique6313
    @hanzalasiddique6313 1 year ago +1

    Mind Blowing ❤

  • @vijaymunavalli335
    @vijaymunavalli335 1 year ago +1

    It's a very practical explanation... cool one

  • @KishoreThatavarthi
    @KishoreThatavarthi 4 months ago

    thanks a lot arpit sir really enjoyed and got full clarity

  • @jasper5016
    @jasper5016 3 months ago

    Thanks so much Arpit!!

  • @kalinduabeysinghe8917
    @kalinduabeysinghe8917 10 months ago

    Such a clean explanation🙌

  • @letsexplorewithanika2642

    Very clear explanation

  • @iMakeYoutubeConfused
    @iMakeYoutubeConfused 3 months ago

    Very clear explanation, thanks!

  • @Sharmasurajlive
    @Sharmasurajlive 1 year ago

    Simple and efficient explanation 👍🏻

  • @shintojoseph9166
    @shintojoseph9166 1 year ago +1

    Clear explanation

  • @lazry1773
    @lazry1773 1 year ago

    Dude this was amazing

  • @DEEPAKKUMAR-wk5pk
    @DEEPAKKUMAR-wk5pk 1 year ago +1

    Wow great explanation

  • @heykalyan
    @heykalyan 1 year ago

    Kudos to you❤

  • @prashantkamble898
    @prashantkamble898 10 months ago

    Greatly explained

  • @akshayrahangdale8511
    @akshayrahangdale8511 6 months ago

    Very Nice Video, I just loved the explanation.

  • @KriszSch
    @KriszSch 2 months ago

    Great explanation!

  • @ryan-bo2xi
    @ryan-bo2xi 11 months ago

    Very good, brother... outstanding.

  • @kaal_bhairav_23
    @kaal_bhairav_23 2 months ago

    thanks a lot arpit for an awesome explanation as always

  • @codecspy3479
    @codecspy3479 6 months ago +1

    Two important points that I felt could be discussed more: 1) When you said the choice of partitioning depends on the load, use case, and access patterns, can you please give an example of each case? 2) When you were talking about the advantages and disadvantages of sharding, did you write those points considering sharding alone, or considering both sharding and partitioning?

  • @pramodpatil-ue8sm
    @pramodpatil-ue8sm 7 months ago

    Great explanation, as always. Please post a link if you have recorded a video on partitioning strategies.

  • @ranjithpals
    @ranjithpals 2 years ago +2

    Thanks!

  • @pixiedustdreams
    @pixiedustdreams 1 month ago

    I think I'm in love with this guy. 😢

  • @anshujaiswal5622
    @anshujaiswal5622 23 days ago

    Simple and to the point explanation .. Thanks Arpit, Liked & Subscribed :)

  • @TechSpot56
    @TechSpot56 2 months ago

    Nice explanation, Arpit.

  • @vikasbhutra9400
    @vikasbhutra9400 2 years ago +1

    Thanks a lot, Arpit, for explaining this in such a simple way. One request: can you please make a video on sharding strategies, and also on how composite indexes are stored on disk?

    • @AsliEngineering
      @AsliEngineering 2 years ago

      Soon.

    • @hc90919
      @hc90919 1 year ago

      @asli engineering - Brother, any update on the sharding strategies?
      Also, one more request: examples of scenarios that explain shard key selection.
      And how is the data replicated behind the scenes, please?

  • @dhaanaanjaay
    @dhaanaanjaay 1 year ago

    One question: at 21:00 the matrix shows what it looks like when we have both sharding and partitioning. How is that different from having two databases on two different EC2 instances for two applications?

  • @user-dq8sg4ik5k
    @user-dq8sg4ik5k 10 months ago

    Literally one of the best videos I have ever seen on this topic.

  • @shreyanshsinha37
    @shreyanshsinha37 1 year ago +1

    When we say Shard1 or Shard2, do we mean the SQL server and the EC2 instance hosting it, taken together, as one shard?

  • @shrad6611
    @shrad6611 6 months ago

    finally I understand what sharding is, thanks a ton

  • @jivanmainali1742
    @jivanmainali1742 2 years ago

    Arpit sir, I need your help clarifying a few doubts.
    In an e-commerce platform like Shopify, each merchant can be given its own collections for orders, carts, and accounts, differentiated by a merchant identifier (projectId-order), versus a single order table indexed by the merchant identifier, i.e. projectId. So we can't apply sharding in the first case.
    Also, is it a wise idea to deploy each merchant's application separately, given that we would have to maintain each merchant app separately? What do you suggest in those cases?

  • @pranjalchoudhury1670
    @pranjalchoudhury1670 3 months ago

    Nicely explained. :)

  • @sarthaknarayan2159
    @sarthaknarayan2159 1 year ago

    Awesome!!!!

  • @amananurag07
    @amananurag07 26 days ago

    @arpit Thanks for such dense information in such a short and simple video.
    However, I have a query about a corner case:
    - How can we have replicas when there are multiple shards with partitioning?
    - In this case, is replication local to the shard, or can data also be replicated to other shards for high availability across availability zones or for DR (like the Kafka architecture)?

  • @ankitmaheshwari2341
    @ankitmaheshwari2341 11 months ago

    Do we use sharding when we have better options available, like Oracle RAC, where the database can be scaled horizontally?

  • @rahulpanjwani1887
    @rahulpanjwani1887 1 year ago +1

    Beautiful

    • @rahulpanjwani1887
      @rahulpanjwani1887 1 year ago +1

      It makes you understand the value of a unified data platform team when scale increases.

  • @aditiagarwal7081
    @aditiagarwal7081 17 days ago

    When running two databases on the same machine, are we not still sharing the same underlying resources such as CPU, memory, and disk I/O?

  • @tawseefbhat977
    @tawseefbhat977 1 year ago

    How do we know which partition or shard our data is located in when we make a query? Is there a detailed explanation?

  • @geekmuralin
    @geekmuralin 8 months ago

    Wow

  • @sumeetsingh1729
    @sumeetsingh1729 3 months ago

    How is it decided which shard a request hits? Is there a router in front that routes the requests?

  • @hemsagarpatel8992
    @hemsagarpatel8992 1 year ago

    If we had horizontal partitioning and one partition were getting a lot of traffic in real time, how could we load-balance that traffic? Is it possible?

  • @ohmygosh6176
    @ohmygosh6176 1 year ago

    Cross-shard queries are very, very expensive. It's best to use tools to find out how the database is being used before making these decisions. I use the PG Analizer tool for PostgreSQL.
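Why cross-shard queries cost more can be shown with a toy model. The sketch below assumes three in-memory dicts standing in for shards and `user_id % 3` as the routing rule; both are invented for illustration. A lookup by the shard key touches one shard, while a filter on any other column has to fan out to every shard and merge the results.

```python
SHARD_COUNT = 3  # assumption: three shards, each modelled as a dict
shards = [dict() for _ in range(SHARD_COUNT)]


def put(user_id, city):
    shards[user_id % SHARD_COUNT][user_id] = city


def get_by_user_id(user_id):
    """Shard key is known, so exactly one shard is contacted."""
    return shards[user_id % SHARD_COUNT].get(user_id), 1


def find_by_city(city):
    """No shard key in the filter: scatter to every shard, then gather."""
    matches, contacted = [], 0
    for shard in shards:
        contacted += 1
        matches.extend(uid for uid, c in shard.items() if c == city)
    return sorted(matches), contacted


for uid in range(1, 7):
    put(uid, "Pune" if uid % 2 == 0 else "Delhi")

single = get_by_user_id(4)
fan_out = find_by_city("Pune")
print(single, fan_out)
```

In a real deployment each contacted shard is a network round trip, so the fan-out grows with the number of shards.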

  • @aditigupta6870
    @aditigupta6870 3 months ago

    Hello Arpit, at 5:49, why did you mention that the new resources are being allocated to the EC2 machine? I think they should be allocated to the DB server running on the EC2 machine, right?

    • @AsliEngineering
      @AsliEngineering 3 months ago

      I meant the server running the database. The database is eventually running on some VM.

    • @aditigupta6870
      @aditigupta6870 3 months ago

      @AsliEngineering thanks, Arpit

  • @sachinjindal4921
    @sachinjindal4921 2 years ago +1

    Awesome, can you give some practical examples?

    • @AsliEngineering
      @AsliEngineering 2 years ago +1

      These are as practical as they can get while keeping it generic and not touching upon the SRE side of things :) Every database comes with its own partitioning and sharding strategy, and we need to go through its documentation to apply it.
      I talked about using a database proxy to bifurcate the requests in one of the earlier videos, in case you are looking for that.
      I would recommend picking a database and seeing how you can actually create shards and manage them. Elasticsearch can be a great start.

  • @kritibindra4232
    @kritibindra4232 1 year ago

    Also which software did you use in this video to create pictures and write content?

  • @imperfecto7734
    @imperfecto7734 9 months ago +1

    @arpit What's the benefit of partitioning the data but not sharding it? Can you give me a use case, please?

    • @AsliEngineering
      @AsliEngineering 9 months ago +2

      Partitioning allows your database to read, access, and move the required subset of data easily and efficiently.
      1. Imagine you partition data by time and create one partition for every hour. If someone queries how many events happened in the last 10 hours, you only need to access the last 10 partitions to fulfil the query. The other partitions need not be read at all.
      2. In a distributed setup, instead of moving individual rows/elements, we can easily and efficiently move whole partitions across the cluster to balance the load.
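The hourly-partition example in the reply above can be made concrete with a small sketch. It assumes an in-memory store with one Python list per hour; the class name `HourlyPartitionedEvents` and its methods are invented here, and a real database would do this pruning inside its query planner.

```python
from collections import defaultdict
from datetime import datetime, timedelta


class HourlyPartitionedEvents:
    """Toy event store that keeps one partition (a list) per hour."""

    def __init__(self):
        self.partitions = defaultdict(list)  # hour bucket -> events

    @staticmethod
    def _bucket(ts):
        # Truncate a timestamp to the start of its hour.
        return ts.replace(minute=0, second=0, microsecond=0)

    def insert(self, ts, payload):
        self.partitions[self._bucket(ts)].append((ts, payload))

    def count_last_hours(self, now, hours):
        """Count events in the most recent `hours` hourly buckets.

        Only those buckets are read; every older partition is skipped.
        Returns (event_count, partitions_read).
        """
        current = self._bucket(now)
        total = touched = 0
        for i in range(hours):
            bucket = current - timedelta(hours=i)
            if bucket in self.partitions:
                touched += 1
                total += len(self.partitions[bucket])
        return total, touched


store = HourlyPartitionedEvents()
now = datetime(2024, 5, 18, 12, 30)
for h in range(48):  # one event per hour over two days
    store.insert(now - timedelta(hours=h), f"event-{h}")

total, touched = store.count_last_hours(now, 10)
print(total, touched, len(store.partitions))
```

Here 48 partitions exist but the query reads only 10 of them, which is the saving the reply describes.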

    • @imperfecto7734
      @imperfecto7734 9 months ago

      Understood! Thanks 🙏

  • @sachthecool
    @sachthecool 1 year ago

    Hi Arpit... You have nice videos. I like the interviews with people involved in growing high-scale systems.
    However, in this video, the concept explained is wrong. Partitions and shards are the same (the terms are used interchangeably). What you are referring to as a shard is a node (or host container). You may want to correct this. Hope this helps.

    • @AsliEngineering
      @AsliEngineering 1 year ago

      I agree the terms are used interchangeably, but overall what I explained is correct, and I clarified this in the video as well.

  • @aditigupta6870
    @aditigupta6870 3 months ago

    A shard must also have replicas, right? I mean, if a shard is handling the first 2 partitions, then all data from those first 2 partitions will go to this shard, but what if the shard is down?

    • @AsliEngineering
      @AsliEngineering 3 months ago

      A shard can have replicas to scale the reads. If the shard goes down, then you either auto-promote a replica to take over or take the downtime.
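The two options in the reply above (promote a replica, or accept downtime) can be sketched as follows. The `Shard` class, its method names, and the node names are hypothetical; real systems add failure detection, replication lag checks, and fencing that this sketch leaves out.

```python
class Shard:
    """One shard: a primary node plus optional read replicas."""

    def __init__(self, primary, replicas=()):
        self.primary = primary
        self.replicas = list(replicas)
        self._next_read = 0  # round-robin cursor for reads

    def node_for_write(self):
        if self.primary is None:
            raise RuntimeError("shard is down: no primary available")
        return self.primary

    def node_for_read(self):
        # Spread reads over replicas; fall back to the primary if there are none.
        candidates = self.replicas or [self.node_for_write()]
        node = candidates[self._next_read % len(candidates)]
        self._next_read += 1
        return node

    def on_primary_failure(self, auto_promote=True):
        failed = self.primary
        if auto_promote and self.replicas:
            self.primary = self.replicas.pop(0)  # promote the first replica
        else:
            self.primary = None  # take the downtime until the node is restored
        return failed


shard = Shard("db-a", replicas=["db-a-r1", "db-a-r2"])
reads = [shard.node_for_read() for _ in range(4)]
shard.on_primary_failure(auto_promote=True)
print(reads, shard.node_for_write(), shard.replicas)
```

Promoting the first replica in the list is an arbitrary choice made for brevity; picking the most up-to-date replica would be the safer rule.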

  • @Bluesky-rn1mc
    @Bluesky-rn1mc 2 years ago +1

    How are foreign key constraints managed when two tables are on different shards?

    • @AsliEngineering
      @AsliEngineering 2 years ago +6

      Foreign keys are dropped when you adopt sharding. You cannot maintain FK when data is partitioned across multiple shards.
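Once the database no longer enforces foreign keys, one common approach (an assumption of this sketch, not something stated in the reply) is to move the referential check into the application. Two dicts stand in for two shards here, and `insert_order` is a made-up helper.

```python
# Hypothetical stand-ins for two shards holding different tables.
users_shard = {1: "asha", 2: "ben"}
orders_shard = {}


def insert_order(order_id, user_id, amount):
    """Application-level replacement for a foreign key check.

    The lookup and the insert hit different shards and are not atomic:
    the user could be deleted between the two steps, which is exactly
    the guarantee a database-enforced foreign key would have given.
    """
    if user_id not in users_shard:
        raise ValueError(f"user {user_id} does not exist")
    orders_shard[order_id] = {"user_id": user_id, "amount": amount}


insert_order(100, 1, 25.0)
try:
    insert_order(101, 99, 10.0)
    rejected = False
except ValueError:
    rejected = True
print(orders_shard, rejected)
```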

    • @Bluesky-rn1mc
      @Bluesky-rn1mc 2 years ago

      @AsliEngineering thanks

  • @gigachad400
    @gigachad400 1 year ago +1

    One of the biggest disadvantages of sharding a SQL server is that you lose ACIDity, so you have to be careful when doing it with SQL databases.

  • @abhigujjar7439
    @abhigujjar7439 10 months ago

    Can you please share the notes?

  • @GaneshSrivatsavaGottipati
    @GaneshSrivatsavaGottipati 1 month ago

    What if we have read replicas and still have partitioning?

  • @dbads
    @dbads 2 years ago +1

    💯

  • @arbazadam3407
    @arbazadam3407 1 year ago

    When you say we can have these partitions on the same server, that confuses me. On my Linux server I installed MySQL, which runs on port 3306. I have one MySQL process in this situation, so how can I spread the partitions on this server?

  • @GaganJain2508
    @GaganJain2508 10 months ago

    Does it mean Sharding and replication are the same? 22:16

  • @iHariPatel
    @iHariPatel 7 months ago

    In my view, partitioning is more complex because you have to work with a partition key! A wrong query can accidentally scan all partitions.

  • @mudassarh4268
    @mudassarh4268 2 years ago

    Sharding strategies could have been covered, like range-based and hash-based sharding, with their use cases.

    • @AsliEngineering
      @AsliEngineering 2 years ago +1

      Sir. Video would have been too long. No one would have watched it. But definitely planning it for the next one.

    • @mudassarh4268
      @mudassarh4268 2 years ago

      Definitely sirji that could have added another 30 mins of content. Awesome content as always and looking forward to further stuff 👍

  • @aadimanchekar1032
    @aadimanchekar1032 1 year ago

    How do we know that in which partition does the data lie?

  • @jineshbagrecha6278
    @jineshbagrecha6278 1 year ago

    When to use master master, master candidate master replications?

    • @AsliEngineering
      @AsliEngineering 1 year ago

      master master - scaling writes beyond one machine
      master replica - scaling reads

  • @anupkut
    @anupkut 5 months ago

    I think we should not consider only read replicas as sharding concept.

  • @user-nu5nn7by6t
    @user-nu5nn7by6t 3 days ago

    How do we know which shard our data resides in?

    • @AsliEngineering
      @AsliEngineering 3 days ago

      That depends on your routing strategy - Range/Hash/Static. In any case, you pick a partitioning key and depending on the approach you deduce which shard to go to.
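The three routing strategies named in the reply above can each be sketched in a few lines. Shard names, range boundaries, and the tenant lookup table below are all assumptions made up for illustration, not any particular database's API.

```python
import bisect
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical shard names


def hash_route(key):
    """Hash-based: a stable hash of the partitioning key, modulo the shard count."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


# Range-based: shard i owns keys below RANGE_UPPER_BOUNDS[i];
# the last shard takes everything at or above the final bound.
RANGE_UPPER_BOUNDS = [1000, 2000]


def range_route(user_id):
    return SHARDS[bisect.bisect_right(RANGE_UPPER_BOUNDS, user_id)]


# Static: an explicit lookup table from key to shard.
STATIC_MAP = {"tenant-a": "shard-0", "tenant-b": "shard-2"}


def static_route(tenant):
    return STATIC_MAP[tenant]


print(range_route(999), range_route(1000), static_route("tenant-b"), hash_route(42))
```

`hashlib` is used instead of Python's built-in `hash()` because the built-in is randomized per process for strings, which would send the same key to different shards after a restart.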

  • @ManojYadav-ls6wo
    @ManojYadav-ls6wo 1 month ago

    12:10
    20:12 👍👍

  • @pranavnadimpalli4929
    @pranavnadimpalli4929 1 year ago

    22:34 cross-shard queries are expensive

  • @abhishekdhillon7110
    @abhishekdhillon7110 6 months ago +1

    Dude, the way you have explained higher availability as an advantage of sharding is not right. When you have a sharded DB and the shards live on different servers, if one of the shards goes down, availability is not an advantage, since you can't perform any operations on the shard that is unavailable. For example, if you have two shards named A and B and shard A is down, you can't read anything from it, so all queries expected to read from shard A will fail unless you have a read replica of that shard. I feel there is a better way to explain it. However, thanks for all your efforts; your content is helpful to a large extent.

    • @AsliEngineering
      @AsliEngineering 6 months ago

      Yes, we cannot perform operations on that shard, but we can still serve requests that can be served from the other shards. Hence the system remains partially available.
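Partial availability, as described in the reply above, can be illustrated with a toy store. The sketch assumes four shards, integer keys routed by `key % 4`, and no replicas; `ShardedStore` is a made-up class, not a real client library.

```python
class ShardedStore:
    """Toy key-value store split over several shards, with no replicas."""

    def __init__(self, shard_count):
        self.shards = [dict() for _ in range(shard_count)]
        self.down = set()  # indexes of shards that are currently unavailable

    def _shard_for(self, key):
        return key % len(self.shards)  # integer keys, modulo routing

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key):
        idx = self._shard_for(key)
        if idx in self.down:
            raise ConnectionError(f"shard {idx} is unavailable")
        return self.shards[idx][key]


store = ShardedStore(shard_count=4)
for key in range(100):
    store.put(key, f"value-{key}")

store.down.add(1)  # simulate one shard failing

served = failed = 0
for key in range(100):
    try:
        store.get(key)
        served += 1
    except ConnectionError:
        failed += 1

print(served, failed)
```

With uniformly spread keys, losing one of four shards fails about a quarter of the requests, while an unsharded database on a single failed server would fail all of them.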

  • @akshatreddy9870
    @akshatreddy9870 3 months ago

    Hi

  • @kumarshubham4640
    @kumarshubham4640 2 months ago

    Why did the course price increase by 20k in 1 year?

    • @AsliEngineering
      @AsliEngineering 2 months ago +1

      In 2 years, not one.
      The course has changed completely and I go much more in-depth and the sessions go for 4 hours each. Earlier it used to be 2.5

  • @arun10071990
    @arun10071990 5 months ago

    I think sharding has specific use cases; not every solution requires sharding. The way he arrives at the sharding solution is totally absurd.
    If one really wants to scale writes, one can also upscale the master DB servers. Why shard then?

    • @AsliEngineering
      @AsliEngineering 5 months ago

      When did I not consider vertical scaling?

    • @arun10071990
      @arun10071990 5 months ago +1

      @AsliEngineering It's not about vertical scaling; the point is that we can scale the database horizontally without using sharding,
      for example with multiple master servers for writes and multiple slave servers to handle reads.

  • @sharoonaustin551
    @sharoonaustin551 1 year ago

    Small suggestion: please don't put ads in the middle, bro. It breaks concentration.

    • @AsliEngineering
      @AsliEngineering 1 year ago +1

      YouTube inserts them. I just enable them. It is up to their algorithm to decide where to place them.

    • @AsliEngineering
      @AsliEngineering 1 year ago +2

      And I totally understand your frustration with ads but the world runs on them. Can't do much without it.

  • @luisdanielmesa
    @luisdanielmesa 8 months ago

    We both worked for Amazon and you know nobody there would have taken this course... So you're either lying or... nah, you're lying.

    • @AsliEngineering
      @AsliEngineering 8 months ago

      15 SDE-2s, 3 SDE-3s, 1 PE and 1 HoE took my course. If you do not want to believe it, that is up to you.

    • @AsliEngineering
      @AsliEngineering 8 months ago

      Fun fact, after I replied to your comment I went on a 1:1 call and it was with an SDE-2 at Amazon working in CCF org :D

  • @amolnimbekar4817
    @amolnimbekar4817 1 hour ago

    All fluff. Surprised that there is no discussion of logical and physical shards, shard keys, or shared-nothing architecture; without those I don't think the sharding concept is complete.

  • @jose000
    @jose000 2 years ago

    Iio

  • @akshatreddy9870
    @akshatreddy9870 3 months ago

    Very bad. A Hindu never shaves off the moustache while keeping a beard. Do you intend to become a Muslim? Please understand that you are Sanatani.

  • @akshatreddy9870
    @akshatreddy9870 3 months ago

    Either shave both beard and moustache or keep both moustache and beard. Don't just shave moustache only and keep beard.

    • @iMakeYoutubeConfused
      @iMakeYoutubeConfused 3 months ago

      He's put so much effort into the content of this video, and this is all you've got to say?

  • @eatajerkpal99
    @eatajerkpal99 2 months ago

    Hey Arpit, can you drop a link to the notes that you presented in this video? Thanks!

    • @eatajerkpal99
      @eatajerkpal99 2 months ago +1

      Found them on your GitHub, I won't spam anymore. Thanks!!

  • @amogu_07
    @amogu_07 2 months ago

    thank you so much , clearly understood!!