System Design Interview: TikTok architecture with
- Added on 10 June 2024
- We attempt to design a large-scale distributed video hosting platform like TikTok or Instagram Reels.
The engineering involved in building these systems is complex, and our attempt does not (even nearly) cover all the challenges that these engineering teams face. We instead have a mock system design interview setup. Yogita will have 45 minutes to design an architecture that can scale, is performant, fault-tolerant, and meets the functional requirements.
00:00 Intro
00:34 Problem Statement
01:24 Requirement listing
04:00 Capacity Estimation
06:34 Design skeleton APIs
08:34 Choosing datastores
12:10 Comparing datastores
19:16 Ingestion Engine
24:21 Video pipeline
30:59 Last mile delivery
33:46 What is a CDN?
35:52 Network Protocol
38:03 End to end request flow
39:54 Caching
41:19 Evaluation and verdict
45:03 Final Architecture
Yogita's Channel (sudoCODE): / @sudocode
InterviewReady: interviewready.io/?_aff=SUDOCODE
Social Media:
Github: github.com/coding-parrot/
Instagram: / applepie404
LinkedIn: / gaurav-sen-56b6a941
Twitter: / gkcs_
#SystemDesign #InterviewReady #SoftwareEngineering
If you are preparing for a system design interview, try get.interviewready.io.
All the best 😁
S3 is not a file storage
Hi. Could you please share the name of the online tool you are using for collaborating?
@vishal733 Most online meeting services, such as Webex and Zoom, have a built-in whiteboard.
I have 2 questions on the final architecture diagram. First, why is the raw video sent directly from ingestion to S3? Shouldn't S3 only take the final video after the workers have processed it? Second, why does the arrow go from the different devices to the CDN instead of from the CDN to the devices?
What software is used for drawing in this video?
Thank you both for putting this together and providing this content openly. This is very helpful for those trying to prepare for this exact type of interview scenario and who might not be familiar with the format. Excellent job!
Scrolling tiktok for 45 min. - No
Watch whole video for 45 min. - Yes, it's great.
These kinds of mock discussions on SD are really helpful. They give the viewer a thought process for dealing with such questions. Kindly do more of these kinds of videos ...
Why do u have two spaces around "viewer"
++
Very detailed, touches very important system design aspects. Gives many pointers for further research!
A zillion Thanks!
Another awesome delivery, thanks Gaurav.
One thought: we increased the storage to ~6x to account for different resolutions and formats, which we could handle by introducing 2 entities in the system. First, to avoid multiple formats, we can provide users with a dedicated video player that understands only our format. The second entity is a resolution manager placed before the streaming engine, which can upgrade or downgrade the resolution according to the user's bandwidth or request.
Take Netflix and YouTube as examples: they have their own media players that understand their recording format. Yes, one extra task will be converting uploaded videos to the application's format at upload time, but that will pay off by saving 6x of storage cost.
Resolution can also be handled at runtime in 2 ways:
- One is to always keep a high-resolution copy and downgrade it at runtime before serving it to the user. The downside is a storage increase because of the high-resolution copies.
- Another is to always keep a low-resolution copy for reference, along with some pixel pattern files to convert the low-resolution copy to a high-resolution copy at runtime. The upside is that we can reduce the cost of the storage system significantly.
For performance during conversion, a dedicated system with predefined resolution converter filters can work.
Brilliant points, thanks!
It would also be a good idea to take a look at ffmpeg and the creation of "ts" files.
Yes it is common sense to create your own video player which supports all devices instead of creating 20 formats lol.
@edwardspencer9397 It's not just about creating an app which can play video. You'll of course have an app. Different formats have different properties. Some have small file sizes but require hardware acceleration to perform well, which may not be available on all devices. So even if you create your own player, it will do software decoding, which will be slow: users will complain about phones getting warm, high battery consumption and sluggish performance. Instead, you create different formats that are optimized for a particular family of hardware. There can always be a basic format as a fallback, but you should cover the large percentage of devices with formats optimized for them.
@lhxperimental Large percentage of devices is no longer true. Businesses always prefer those who have medium / high end phones/devices capable of hardware acceleration because all the others owning low end phones are mostly poor people who have no intention to spend any money on subscriptions or visit advertisers. So even if a poor guy uninstalls something due to overheating issues it shouldn't be a problem.
A few ideas!
- Most requests are for videos that are trending, and trends die in about a month. So instead of storing all the transcoded files, we could have a live transcoder and store the result in a cache (or CDN) with a TTL of about a month (the exact time can be decided by data analysis). Twitter did this and was able to save millions on storage costs.
- We can have live websockets with the online users, so that whenever the video is complete we can notify them, and maybe also the users who were tagged, or are very engaged with an account.
- Instead of dividing videos into chunks after receiving the whole video, let the client do the chunking and upload only chunks. This would result in far fewer failures: if an upload fails after 95% of the video has been uploaded, you don't need to re-upload the entire file.
- Maybe have caches on top of databases
S3 also has multiple storage tiers. You can set a rule to move files to a lower tier after a set time, and to an even lower one after that.
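The tiering rule mentioned above might look roughly like the sketch below. It only builds a lifecycle configuration as a Python dict, in the shape S3's lifecycle API accepts, and prints it as JSON. The rule ID, the `videos/` prefix, and the 30/90-day thresholds are made-up values for illustration, not numbers from the video.

```python
import json

# Hypothetical lifecycle rule: the ID, prefix, and day counts are
# illustrative assumptions, not values from the video.
lifecycle_config = {
    "Rules": [
        {
            "ID": "age-out-old-videos",
            "Filter": {"Prefix": "videos/"},
            "Status": "Enabled",
            "Transitions": [
                # Move to an infrequent-access tier once the trend has likely passed.
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                # Move to an archival tier later on.
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

print(json.dumps(lifecycle_config, indent=2))
```

With boto3, a dict of this shape is what gets passed as `LifecycleConfiguration` to `put_bucket_lifecycle_configuration`; the right thresholds would have to come from access-pattern data.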
Agree with chunking the video on the client side!
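A minimal sketch of the client-side chunking idea, assuming a server that can report which chunk indexes it already holds. `FakeUploadServer` is an in-memory stand-in invented for this example, and the 4-byte chunk size is deliberately tiny so the flow is easy to follow.

```python
from typing import Dict, List, Optional

CHUNK_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Cut the payload into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class FakeUploadServer:
    """In-memory stand-in for the ingestion service (hypothetical)."""

    def __init__(self) -> None:
        self.received: Dict[int, bytes] = {}

    def put_chunk(self, index: int, chunk: bytes) -> None:
        self.received[index] = chunk

    def missing(self, total: int) -> List[int]:
        return [i for i in range(total) if i not in self.received]

    def assemble(self, total: int) -> bytes:
        return b"".join(self.received[i] for i in range(total))


def upload(server: FakeUploadServer, chunks: List[bytes],
           fail_at: Optional[int] = None) -> bool:
    """Send only the chunks the server lacks; optionally simulate a drop."""
    for index in server.missing(len(chunks)):
        if index == fail_at:
            return False  # connection dropped before this chunk was sent
        server.put_chunk(index, chunks[index])
    return True


video = b"0123456789abcdefghij"  # 20 bytes, so 5 chunks
chunks = split_into_chunks(video)
server = FakeUploadServer()

first_try = upload(server, chunks, fail_at=4)  # drops on the last chunk
resent = server.missing(len(chunks))           # only the last chunk is left
second_try = upload(server, chunks)            # resumes and sends just that one
```

After the simulated failure only one chunk is re-sent, which is the saving the comment describes.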
You both are just too good!! I love the authenticity and simplicity. The actual interview does take this similar course. Keep up the great work.
One of the most valuable pieces of content on YouTube for young IT engineers
There should be more sessions like this. It's super helpful. I loved it!
I love this video and got to know the system design approach, at least at a basic level.
Awesome, guys! It is really valuable to see such interview in action. Feels like you are the one who is being interviewed. Good job, thank you! 🤩
Two of my fav youtubers on system design
This was probably the best video so far. Please try to make more such videos
This video is so good. It's so helpful when talking to an engineering manager.
Liar, it's nowhere near the real world projects...!! Although they are really good, it only gives us an idea of an MVP and also how to crack interviews!! Real world scenarios are much worse and terrifying👻😱!!
Thanks so much Sen-sei
Kudos on this interview. So refreshing to see a mock sys design on youtube where the interviewer takes it seriously, challenges, questions and pushes the decisions of the interviewee.👏
Amazing video!!! Learnt a lot. The parallel workflow thing blew my mind. I thought it could be done later on, maybe post the original upload in a slower way. But that matrix thing was amazing!!
Excellent video ! Thanks Yogita for putting yourself out there for our benefit.
That was really amazing... like how smoothly she explains bits and pieces of the problem.
loved it.
Learned a lot.
Thanks a lot for this content guyz.
You're very welcome!
Hey Gaurav, this is very helpful for freshers and people with 1-2 years of experience in this field, because this is how we deal with upper management. I always get those diagrams and do my implementation based on them, but only now do I know how they come to the conclusion of what needs to be done. Thanks for this. 👍
I LOVE THIS VIDEO!!! You brought a pro and the back and forth brings that dual insight
Thanks a lot Gaurav for this extremely useful video. I must appreciate Yogita for this very detailed system design and component choices right from the queue, S3, CDN, Diff DB's, etc were awesome and especially the processing part of the video via workers. Thank you both!!
This is the way to learn system design with respect to requirements
By watching this video I have fallen in love with System Design 😅
Coincidentally Akamai CDN was down just a few days after this video was uploaded
Great discussion. Yogita, huge respect. The way you explained the different choices you made is an eye-opener for people like me who are going to take the bull by the horns soon. Subscribed to your channel as well. Thank you Gaurav.
Fantastic video, guys! Thanks so much for sharing! Very insightful!
It was too good! Informative. Hoping to see more such videos. Thanks Gaurav and Yogita.
The best mock I saw in my 2 months studying for my interview.
amazing, thank you both for this
There should be some questions asked upfront before diving in, such as "do we want video searching", "do we want to generate a news feed", "what about video sharing", "are users able to download videos", "are users able to follow other people", etc. After that we can focus on what the interviewer is really interested in.
ya i was wondering the same
That would be really a microservices part AFAIK. Scalable architecture is the first goal followed by additive services.
@@ashishprasad1963 correct
Great video...
The way she used all of her info and Gaurav summarized, it is just great in a short time.
Thank you
I'm just 10 minutes in the video and it's already great! Thank you for this! :D
When i started watching i thought ill quit in between but the session was so nice and non boring and interactive that I watched the hole video thanks a lot for this
this video was not on hole, are you sure watched this video only ?
This is so practical and relevant. Thank you.
Awesome stuff ! Thanks for this, Gaurav !
This was very informative, thank you !
Excellent session very helpful..u guys r actual heroes for dev like us..
really enjoyed the session and also learned new things, keep uploading more
One of the best videos to understand system design. Thanks guys
this is so good, thank you Gaurav and Yogita!
this video is just so precious . many thanks
This video is amazing guys, great work
I am watching this video after almost 2 years. Thanks for uploading these kinds of videos. They are very helpful.
Thank you!
Thank you so much, Gaurav and Yogita. I got to learn a lot from this particular video. Please keep posting such videos for the community. Thanks again.
i think the integrations of s3/cdn and cache/cdn are something i would like to learn more as a followup. Great video btw!
I searched so many videos for the difference between SQL and NoSQL but didn't understand the use cases. Now I have a clear picture of the use case for NoSQL. Thanks for this, and keep posting your videos, especially Yogita.
Great video as always Gaurav. Well done. Look forward to more such interviews. :)
super informative , sudoCode effort was really great. Keep making more such content, lets take airbnb as next system.
Thanks Yogita and Gaurav, looking forward to more such videos
Thanks @gaurav for making such an extremely handy and useful video. Kudos for that. 👍
Can we please have a part 2 of this video where you discuss:
1. Exception handling and reporting.
2. A ballpark estimate for each component of this system.
3. What strategy to use a month or a year later to decrease the load on the file system.
I read that some people have already talked about this. As another solution per requirement, I feel you need not wait for all the formats and resolutions to be available one at a time. You can push them to a queue and then a worker group can keep on pushing. This will allow more parallelism. In this way the video with lower resolution/size can be made available for preview while the UI to the uploader can show that the rest are being processed. Or, otherwise the original video can be uploaded directly and the format and resolution part can be taken later. Many times we edit the videos. Once all formats are available the video can be made viewable to public.
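The queue-and-worker-group idea above can be sketched as follows, assuming the jobs are ordered so that the smallest rendition is picked up first. The rendition list, the `transcode` placeholder, and the two-worker pool are assumptions for illustration only.

```python
import queue
import threading

RENDITIONS = [1080, 240, 720, 480]  # assumed target heights, in pixels


def transcode(video_id: str, height: int) -> str:
    """Placeholder for real transcoding work (hypothetical)."""
    return f"{video_id}_{height}p.mp4"


def process(video_id: str, workers: int = 2):
    jobs = queue.PriorityQueue()
    for height in RENDITIONS:
        jobs.put(height)  # the smallest height is dequeued first

    done = []  # renditions in the order they finished
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                height = jobs.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            output = transcode(video_id, height)
            with lock:
                done.append(output)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    previewable = len(done) >= 1           # any rendition allows a preview
    public = len(done) == len(RENDITIONS)  # all renditions are ready
    return done, previewable, public


done, previewable, public = process("vid42")
```

In a real pipeline the uploader's UI would poll or be notified as each rendition lands, instead of waiting for `process` to return.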
Many thanks for sharing. It is helpful to see the chain of thoughts, when architecting the solution.
Wow, it was really great. I had been waiting a long time for this kind of video to understand in detail how system design discussions are done, which you did. Thank you both so much, and I request you to come up with similar videos for different complex use cases like banking or insurance, etc.
Very helpful.
Have used all the knowledge gathered so far in the playlist.
Thanks for sharing this discussion!
You're welcome!
Very helpful discussion around databases. Thanks Yogita and Gaurav!
This is really informative. Good job folks. Looking for more sessions like these.
I learned a lot from this video. Thank you very much.
More of this please! ♥️
Great video! One piece of feedback: I didn't see the 1.2TB figure you calculated being used. A translation into how many servers (with resources like CPU, RAM, disk, IO, etc.) would be needed for the ingestion pipeline and for storage would have been helpful. Also, some interesting scenarios like thundering herd, or data compression to reduce cost, would have been of great help. And don't you think putting all the videos in the CDN would be cost-heavy? There should be some strategy based on popularity/recency/TTL to upload videos to and remove them from the CDN.
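A rough back-of-the-envelope translation of the 1.2 TB/day figure could look like this. The ~6x multiplier for resolutions and formats comes from another comment in this thread; the 10 TB of usable disk per storage node is purely an assumed number, and replication is left out.

```python
import math

TB = 1e12  # decimal terabyte, in bytes

ingest_tb_per_day = 1.2        # from the capacity estimate in the video
rendition_multiplier = 6       # ~6x for resolutions/formats (from the thread)
usable_tb_per_node = 10        # ASSUMPTION: usable disk per storage node

seconds_per_day = 24 * 60 * 60

# Average sustained ingest rate if uploads were spread evenly over the day.
avg_ingest_mb_s = ingest_tb_per_day * TB / seconds_per_day / 1e6
avg_ingest_mbit_s = avg_ingest_mb_s * 8

# Storage growth once every rendition is kept.
stored_tb_per_day = ingest_tb_per_day * rendition_multiplier
stored_tb_per_year = stored_tb_per_day * 365

# Storage nodes needed to absorb one year of growth, before replication.
nodes_per_year = math.ceil(stored_tb_per_year / usable_tb_per_node)

print(f"average ingest: {avg_ingest_mb_s:.1f} MB/s ({avg_ingest_mbit_s:.0f} Mbit/s)")
print(f"stored per day: {stored_tb_per_day:.1f} TB")
print(f"stored per year: {stored_tb_per_year:.0f} TB")
print(f"storage nodes per year: {nodes_per_year}")
```

Real traffic is bursty, so peak ingest would be some multiple of the average; that multiple would have to come from measurements.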
Amazing ....u guys rock...thanks for sharing , waiting for more 🙂🙂
Very good for some one who is interested in designing solutions...hits the basics really hard.
One of the best video on this channel.
Hey Gaurav,
Love to see this amazing and informative video.
Please make more mock interviews video.
All the best and Happy Deepawali 💥
Great take at the design problem. :)
However I'd have a different approach for replication. We're replicating the video in s3 for 2 reasons:
1. Fault tolerance
2. Latency due to geographical location
I'd suggest to replicate to far fewer s3 locations and that too only for (1).
To tackle (2) we can have this approach -->
1. Buffer around 1 second or so of the video on the device upfront.
2. When user starts watching the video, then lazily load the rest of the video in chunks.
The buffering strategy further depends on (to name a few):
1. Device network quality
2. Prediction of potential videos which user might want to watch based on some ranking algorithm
Also, regarding hot video meta data caching:
1. We can cache the api response at cloudfront end.
2. Redis can also be used alternatively.
Redis might be a better approach here because it is distributed and if the video is deleted/modified by the OP then we can update it accordingly.
1. We can cache the api response at cloudfront end. -> AWS has the Global Accelerator for this purpose. It's costly, but if you're ingesting ~1.2TB of videos everyday, you can afford it.
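The point about updating the cache when the OP deletes or modifies a video can be sketched as a cache-aside pattern with invalidation on write. A plain dict stands in for Redis here, and `MetadataStore` and `CachedMetadata` are hypothetical names made up for this example.

```python
class MetadataStore:
    """Stand-in for the metadata database (hypothetical)."""

    def __init__(self):
        self.rows = {}
        self.reads = 0  # counts database reads so cache hits are visible

    def get(self, video_id):
        self.reads += 1
        return self.rows.get(video_id)

    def put(self, video_id, meta):
        self.rows[video_id] = meta

    def delete(self, video_id):
        self.rows.pop(video_id, None)


class CachedMetadata:
    """Cache-aside reads, with the cache entry dropped on every write."""

    def __init__(self, store):
        self.store = store
        self.cache = {}  # stand-in for Redis

    def get(self, video_id):
        if video_id in self.cache:
            return self.cache[video_id]  # cache hit
        meta = self.store.get(video_id)  # cache miss: go to the database
        if meta is not None:
            self.cache[video_id] = meta
        return meta

    def update(self, video_id, meta):
        self.store.put(video_id, meta)
        self.cache.pop(video_id, None)  # invalidate the stale entry

    def delete(self, video_id):
        self.store.delete(video_id)
        self.cache.pop(video_id, None)  # invalidate the stale entry


store = MetadataStore()
service = CachedMetadata(store)

service.update("v1", {"title": "cats"})
first = service.get("v1")            # miss, reads the database
second = service.get("v1")           # hit, no database read
reads_after_two_gets = store.reads

service.update("v1", {"title": "dogs"})
after_edit = service.get("v1")       # sees the edit, not the stale copy
service.delete("v1")
after_delete = service.get("v1")     # None once the OP deletes the video
```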
Thanks Gaurav Sen & Yogita for informative contents. You guys are great. I was looking for such videos since long time. Finally found one. Thanks again.
Our pleasure!
Thank you Gaurav for the video. These kinds of interactive videos raise more and more questions that help in understanding SD.
Awesome thanks Gaurav and Yogita 👍
Amazing video....lot of questions were addressed. This duo should do a video series covering other case studies like :
stock broker platform , uber , whatsapp etc
czcams.com/video/vvhC64hQZMk/video.html
amazing video...You should do videos like these more often....
Long time subscriber of Yogita's channel here!
She came really prepared for this question! Didn’t she 😂 she was playing back what she prepped really nicely for this video. Great stuff folks 👍
That was great fun... Thanks a lot... So much knowledge in a 45 min video.
This video is very informative , thanks to both of u .
Wow, this is so awesome!
This is my first system Design video that I watch till end 😅
Thanks a lot for this awesome content 🙏
Very well designed ... Loved it 👍
This was really nice discussion, AWS has got a good endorsement…. On a lighter note
Thanks for this!
Good one, @yogita explained very well.
Ultimate knowledge 🔥
Very Informative! Thanks for sharing
Fabulous video.. Thank you @Gaurav and @Yogitha
Inspired me to think about IT in a significant way for the first time
Thanks, good video that explains how the world's most popular app works
Very useful video! Thank you
This concept of video is awesome
wow the end-to-end request flow was really smart, as we're just returning the list of metadata it'll be fast and metadata will have actual video link too
Hi, first of all, thank you both so much for sharing how things work. I wish you the best for your future.
Great video. Made me like and subscribe within 3 mins
Gaurav sir, you got clean bowled. The interviewer was impressed throughout. Thanks so much for the efforts.
Wow very very educative !! Big ups !!
Great job, thanks!
Great discussion...The most important parts starts at 19:20 and 38:04 to be specific
We want more of these mock interviews plz..
Super one, good work you both 👍
Instead of uploading files through the API,
we can upload files directly into S3 using a pre-signed S3 URL
The idea to split the video file into chunks and process them in parallel is really interesting, and I feel it is very fundamental to processing input in general.
How does that happen exactly, by the way? Do you literally split a 1 MB file into three 333 KB files, convert them using a file format converter like FFmpeg, and then merge them again?
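One way to picture it, as a sketch and not necessarily what the video's pipeline does: the split is usually made at segment boundaries (for example every few seconds, starting on a keyframe) and not at arbitrary byte offsets, since an arbitrary byte slice of a compressed video generally can't be decoded on its own. Below, a video is modelled as a list of frame labels, and `transcode_segment` is a placeholder for handing one segment to an encoder.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(frames, frames_per_segment):
    """Cut the frame list into consecutive, independently processable segments."""
    return [
        frames[i:i + frames_per_segment]
        for i in range(0, len(frames), frames_per_segment)
    ]


def transcode_segment(segment):
    """Placeholder for real work, e.g. running an encoder on one segment."""
    return [frame.lower() for frame in segment]


def transcode_parallel(frames, frames_per_segment=3, workers=3):
    segments = split_into_segments(frames, frames_per_segment)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the merge stays in sequence
        # even if the segments finish at different times.
        converted = list(pool.map(transcode_segment, segments))
    # Merge: concatenate the converted segments back into one stream.
    return [frame for segment in converted for frame in segment]


frames = [f"F{i}" for i in range(8)]
result = transcode_parallel(frames)
```

The merge step only works because each segment carries its position; with real video that ordering is typically tracked by segment index or timestamps.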