How Reddit designed their metadata store to serve 100k req/sec at p99 of 17ms
- Added 27. 07. 2024
- System Design for SDE-2 and above: arpitbhayani.me/masterclass
System Design for Beginners: arpitbhayani.me/sys-design
Redis Internals: arpitbhayani.me/redis
Build Your Own Redis / DNS / BitTorrent / SQLite - with CodeCrafters.
Sign up and get 40% off - app.codecrafters.io/join?via=...
Recommended videos and playlists
If you liked this video, you will find the following videos and playlists helpful
System Design: • PostgreSQL connection ...
Designing Microservices: • Advantages of adopting...
Database Engineering: • How nested loop, hash,...
Concurrency In-depth: • How to write efficient...
Research paper dissections: • The Google File System...
Outage Dissections: • Dissecting GitHub Outa...
Hash Table Internals: • Internal Structure of ...
Bittorrent Internals: • Introduction to BitTor...
Things you will find amusing
Knowledge Base: arpitbhayani.me/knowledge-base
Bookshelf: arpitbhayani.me/bookshelf
Papershelf: arpitbhayani.me/papershelf
Other socials
I keep writing and sharing my practical experience and learnings every day, so if you resonate then follow along. I keep it no fluff.
LinkedIn: / arpitbhayani
Twitter: / arpit_bhayani
Weekly Newsletter: arpit.substack.com
Thank you for watching and supporting! It means a ton.
I am on a mission to bring out the best engineering stories from around the world and make you all fall in love with engineering. If you resonate with this then follow along, I always keep it no-fluff. - Science & Technology
Successfully ruined my upcoming weekend. Have to view all of your videos now 😢
The Kafka CDC can solve the problem of synchronous write inconsistencies, but not the backfill overwriting. I suspect they might do some kind of business logic or SHA/checksum validation to ensure they are not overwriting the data during backfilling. Correct me if I'm missing something, bro.
Weekend party ❌ Asli Engineering ✅
Hey Arpit, thanks a lot for putting this up. Your writing skills are next level, crisp and crystal clear. Could you please tell what's the setup you use for taking these notes?
Thanks in advance.
Large Data Migration -> Event Driven Architecture
Also, it's interesting to learn about Postgres's extensions, which are not required if going with a serverless database solution like DynamoDB.
Nice, very informative 👍
Thank you for doing this!
How many shards were used to hold those partitions to achieve that much throughput?
It's great that YouTube has such useful videos. Thank you, Minister!
I guess we don't need both the CDC setup and dual writes; just the CDC setup would suffice to insert the data in the new DB, correct?
Can u share the notes ? Pls
Hi Arpit, I think you could have gone a bit more in depth, like they have mentioned in their blog: a bit about how they use an incrementing post_id, which allows them to serve most queries from a single partition. Not complaining at all. Thanks for being awesome as always.
TLDR; 7 minutes seem a bit short
I deliberately skipped it because it would have taken 4 more minutes of explanation. I experimented in this video by keeping it very surface level and around the 8-minute mark, and the retention for this one is through the roof 😅
In the last 15 videos I saw a massive drop in retention numbers when I started explaining implementation nuances or when the video length went beyond 8 minutes. So I wanted to test that hypothesis in this one video.
Hence you see I did not even inject the ad for my courses or the intro. Jumped right into the topic.
But yes, given their IDs are monotonic, their batch gets for media metadata would almost always hit a single partition if they partition the data by ranges.
Would appreciate if you can make another 8 min video for details.
I am here for the meat. Surface level stuff is ok. But meat.
No complaints, just stating the opinions of a random lurker.
@@AsliEngineering so in short your focus is more on viewers than on video quality, right?
@@amoghyermalkar5502 if you really feel like it then please check out other videos on my channel.
I go into depths that other people cannot even think about or comprehend.
Remember, it hurts to put two days of effort into a video only for it to be seen by just 2000 people in 7 days.
@@AsliEngineering Absolutely. I can understand that. Without naming names, there are so many "Tech" creators getting 10x the views we see here, but they just never seem to talk about substance.
I just want you to know we do appreciate it a lot. I am still trying to read more blogs on my own, so I don't think I am being spoon-fed by watching a video.
As usual great stuff🙌🏻
Since they are storing data in JSON and also scaling the Postgres DB, why did they not go with a non-relational DB like MongoDB, which stores data in JSON and also provides scaling out of the box?
I have already answered this in some other comment, do go through that.
TLDR; operational expertise and familiarity seems to have taken precedence.
Asli Engineering!
Good video. Would appreciate it a lot if you could attach any resources you used in the video, like the Reddit blog that is mentioned. Would be great if the link were also attached in the description.
Already added in the description. You are just a scroll away from finding that out.
Hey Arpit… thanks for the video
I liked doing partitioning as a policy that runs on a cron. But wouldn't moving data around in partitions also warrant a change in the backend (read path)?
Or you are saying the backend has been written in a way that it takes partitioning into account while reading the data?
They use range-based partitioning, so no repartitioning is required. A new range every 90 million IDs.
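The range-to-partition mapping above can be sketched in a few lines. This is a toy illustration, not Reddit's code; it only assumes the 90-million-ID boundary mentioned in this thread:

```python
# Toy sketch of range-based partition routing, assuming a new partition
# every 90 million post IDs (the boundary mentioned above).
PARTITION_SIZE = 90_000_000

def partition_for(post_id: int) -> tuple[int, int]:
    """Return the [lo, hi) ID range of the partition holding post_id."""
    lo = (post_id // PARTITION_SIZE) * PARTITION_SIZE
    return lo, lo + PARTITION_SIZE

# Because IDs are monotonically increasing, recent posts cluster in the
# newest partition, so a batch read of recent posts hits one partition.
print(partition_for(10))           # (0, 90000000)
print(partition_for(95_000_000))   # (90000000, 180000000)
```

Since new partitions only ever get appended at the top of the ID range, existing rows never move, which is why no repartitioning is needed.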
How will you handle search, given the relevant data might be in partitions several days old? Even if they're using a secondary data store, date/time range-based partitioning or even sharding will not suffice. What do you think?
Why would it be several days older? The migration was a one time activity.
Post the switch, writes of media metadata always go to the unified media metadata store.
Thank you for your response, Apologies, my question was not clear.
My question was more related to searching through such a data store where the data is partitioned daily, i.e., partitioned on created_at.
Let's say you search for an 'X term' post; the result will ideally contain lookups from several partitions. For example, if there is a relevant post from a year back, we are looking at many partitions. To build the search result, the DB has to load each daily partition.
Daily partition will work well if the lookup is limited to a couple of days back. That’s my understanding.
Why didn't Reddit go for a document DB for their storage, given the structure and access pattern? What do you think about it @arpit?
In my opinion, familiarity with the stack could be the biggest reason.
Apart from this, the query pattern here is that most requests hit a single partition (given IDs are monotonically increasing and partitions are created on a range basis). Most KV stores do hash-based partitioning, because of which lookups need to be fanned out across the shards, which is quite expensive.
One database that does support range-based lookups per shard is DDB, and that managed offering becomes very expensive at scale.
These are some of the pointers I could think of. But again, this is a pure guess.
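The fan-out point above can be shown with a toy routing function. This is purely illustrative (not any real store's code): hash partitioning scatters consecutive IDs across shards, while range partitioning keeps them together:

```python
# Toy illustration of why a batch get over consecutive IDs fans out
# under hash partitioning but stays on one shard under range partitioning.
NUM_SHARDS = 4
PARTITION_SIZE = 90_000_000

def hash_shard(post_id: int) -> int:
    return hash(post_id) % NUM_SHARDS     # scatters consecutive IDs

def range_shard(post_id: int) -> int:
    return post_id // PARTITION_SIZE      # keeps consecutive IDs together

batch = range(100, 125)                   # 25 consecutive post IDs
print(len({range_shard(i) for i in batch}))  # 1  -> single-shard read
print(len({hash_shard(i) for i in batch}))   # 4  -> fan-out to every shard
```

With monotonically increasing IDs, a "latest media metadata" batch is exactly this kind of consecutive-ID read, which is why range partitioning fits the access pattern.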
Arpit - using CDC and Kafka still does not solve the problem of data from the old source overriding data in the new Aurora Postgres during the migration, right?
What am I missing?
You will still need a bulk batch job that takes all the archival data from the multiple sources and ingests it into the new Aurora. Using CDC does not solve for that backfill, correct?
CDC can transfer the historical data using snapshots of the existing databases if/when the transaction log is not available for old data, and then the consumers report any write conflicts into a separate table which the devs can remediate later on. Hope that answers your question.
The consumers of Kafka have this responsibility. It is not that just adding Kafka solved the problem. The core conflict management is written in the consumer of it which checks and sets in the database.
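The check-and-set idea in the consumer can be sketched minimally. This is a guess at the mechanism, not Reddit's code; an in-memory dict stands in for the new metadata DB, and the field names are illustrative:

```python
# Minimal sketch of check-and-set conflict handling in a CDC consumer.
# An in-memory dict stands in for the new metadata store; in a real
# system this compare-and-set would be a conditional UPDATE/UPSERT.
store = {}  # post_id -> {"updated_at": int, "meta": dict}

def apply_event(post_id, updated_at, meta):
    """Write only if this event is newer than what the store holds.

    A backfill row loses to a fresher live write, so the bulk migration
    can never clobber data that the CDC stream already delivered.
    """
    current = store.get(post_id)
    if current is not None and current["updated_at"] >= updated_at:
        return False  # stale event: log/record the conflict and skip
    store[post_id] = {"updated_at": updated_at, "meta": meta}
    return True

apply_event(1, updated_at=100, meta={"url": "a.jpg"})   # live CDC write wins
apply_event(1, updated_at=50, meta={"url": "old.jpg"})  # stale backfill, ignored
print(store[1]["meta"])  # {'url': 'a.jpg'}
```

Adding Kafka only gives ordering and replayability per key; the last-writer-wins check above is what actually prevents the backfill overwrite.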
How does PgBouncer minimize the cost of creating a new process for each request?
Maybe I am wrong; can you tell me how the cost is reduced here?
Let's say each connection spawns a new process, and killing a connection kills the process. What would you do logically? Think before reading the next line.
Simply reuse the connections. That's what every database proxy in front usually does, in simple words: the connections are reused and managed accordingly.
@@niteshlohchab9219 Thanks, bro, for the easy and simple explanation; I appreciate it. What I was thinking was that the term "cost" is used for money, but I was wrong. Here, "cost" means scalability and performance, ensuring that each client gets a response as quickly as possible. So, in terms of money, we increase the cost, and in terms of scalability and performance, we decrease the cost.
If we look at it for the long term in enterprise applications, having a scalable product also increases revenue.
Let me know if I am correct or not 🙃
Because it does connection pooling, so connections are reused.
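A toy pool makes the cost saving concrete. This is a simplified sketch (a real pooler like PgBouncer also handles auth, transactions, and timeouts); the `FakeConn` class just stands in for the expensive backend-process spawn:

```python
# Toy connection pool: the expensive "spawn a backend process" step
# happens once per pooled connection, not once per client request.
import queue

class FakeConn:
    created = 0
    def __init__(self):
        FakeConn.created += 1  # stands in for fork() + auth handshake

class Pool:
    def __init__(self, size):
        self._free = queue.Queue()
        for _ in range(size):
            self._free.put(FakeConn())
    def acquire(self):
        return self._free.get()
    def release(self, conn):
        self._free.put(conn)   # back into the pool, never closed

pool = Pool(size=2)
for _ in range(100):           # 100 "client requests"
    c = pool.acquire()
    pool.release(c)
print(FakeConn.created)        # 2 -> only two backends ever created
```

So the "cost" here is CPU and latency per request (process creation, auth round-trips), which pooling amortizes across many requests.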
Nice. The Kafka part seemed like over-engineering. They could just verify the hash before writing to the new metadata DB in the syncing phase.
Amajeeng
Thanks Arpit!! Also, what are your thoughts on using Pandas as a metadata DB? Dropbox had a post about them using Pandas wherein they explained in depth why other DBs were not better for them. (Would like to know your views on it too.)
I am not aware of this. Let me take a look.
Why are they using Postgres if they are storing it as JSON?
Stack familiarity, plus range-based partitioning support.
What is used over here to write down the notes?
GoodNotes.
Thanks looks very clean
What is the CDC mentioned here? Please suggest some pointers.
Change Data Capture
How did they check whether the reads from the old vs the new database are the same?
A simple diff would work, given that the final JSON has to be the same as no changes were made to the client.
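That dual-read diff can be sketched as follows. This is an assumption about the mechanism, not Reddit's code: serve from the old store, shadow-read from the new one, and compare canonicalized JSON so key order never causes a false mismatch:

```python
# Sketch of a dual-read comparison: diff the JSON payloads from the old
# and new stores, canonicalized so key ordering cannot cause noise.
import json

def canonical(payload: dict) -> str:
    return json.dumps(payload, sort_keys=True)

def reads_match(old_resp: dict, new_resp: dict) -> bool:
    return canonical(old_resp) == canonical(new_resp)

print(reads_match({"id": 1, "url": "a.jpg"}, {"url": "a.jpg", "id": 1}))  # True
print(reads_match({"id": 1}, {"id": 2}))                                   # False
```

At scale you would sample a fraction of reads and log only the mismatching keys, so debugging means inspecting a stream of concrete diffs rather than the whole dataset.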
@@AsliEngineering if there is any issue at scale, wouldn't it be very hard to debug?
Postgres is still the king :) With extensions it is all you need...
How can I use AI to make this sound like my native language?
Are you saying that Reddit has a unified database per region?
Unified database implies that the data that was split across multiple services has been moved to one database.
Now this unified one can be replicated across regions to improve client-side response times.
What was the motivation for Reddit to go with a dedicated database per service initially? Could you please tell how many such services they have?
Regarding your second point, the Reddit team allowed only reads from the replicated databases and not writes. Correct?
what is CDC?
Change Data Capture, i.e., streaming the database's binlog (transaction log).
Dayumm