Distributed Locks | System design basics

  • Added 9 Sep 2024

Comments • 154

  • @20frieza
    @20frieza 5 years ago +82

    For some reason, I didn't quite like the design of this locking system. First of all, the moment you add a TTL, the "Integrity" part of the problem is compromised; if it somehow isn't, the video failed to explain why.
    Secondly, adding time as a unique identifier is a bad idea here because, as you said, you either need all the machines to serve the same time, which is not easy, or you need tolerance capability built in. Both of these solutions look complicated to me. If you are already building this complicated locking system, I would rather externalize the generation of unique keys to a KeyGen service or, in simple terms, use the server/machine ID as part of the unique ID. That way it will always be unique.
    For me, rather than having a TTL, I would rather split the file into chunks and apply locking at the chunk level. That way multiple machines can work on multiple chunks of the same file. This still does not solve the issue of parallel writes to 1.out, so to mitigate that they could write to something like 1.chunk1.out, 1.chunk2.out and so on, and then have an aggregator service that aggregates the chunks to form 1.out (rough sketch below). Surely it adds complexity, but I think this is a good way to mitigate the parallel writes.
    Thanks for the videos!
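    A minimal Python sketch of that chunk idea, assuming a hypothetical lock_manager client with acquire/release; the helper names (process_chunk, write_output) are illustrative, not from the video:

        # Chunk-level locking: workers claim chunks, not whole files.
        def process_file(lock_manager, worker_id, filename, num_chunks):
            done = []
            for i in range(num_chunks):
                chunk = f"{filename}.chunk{i}"
                # Try to claim this chunk; skip it if another worker holds it.
                if not lock_manager.acquire(chunk, owner=worker_id):
                    continue
                try:
                    result = process_chunk(chunk)         # hypothetical processing step
                    write_output(f"{chunk}.out", result)  # each chunk gets its own .out
                    done.append(i)
                finally:
                    lock_manager.release(chunk, owner=worker_id)
            return done

        # A separate aggregator later merges 1.chunk1.out, 1.chunk2.out, ... into
        # 1.out, so no two workers ever write 1.out concurrently.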

    • @sandeshkobal
      @sandeshkobal 3 years ago +6

      Instead of a hard-coded TTL, Uber has the app server send a heartbeat to the lock manager (sketch below).
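      A sketch of that heartbeat idea, assuming a hypothetical lock manager that expires a lock once beats stop arriving; heartbeat() is an assumed API, not Uber's:

          import threading, time

          def hold_with_heartbeat(lock_manager, lock_id, interval=1.0):
              """Renew the lock every `interval` seconds until told to stop."""
              stop = threading.Event()

              def beat():
                  while not stop.is_set():
                      lock_manager.heartbeat(lock_id)  # each beat extends the lease
                      time.sleep(interval)

              threading.Thread(target=beat, daemon=True).start()
              return stop  # caller sets this event when the work is finished

          # A crashed worker stops beating, so its lock expires on its own,
          # without picking a worst-case TTL up front.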

    • @hlibpylypets1333
      @hlibpylypets1333 2 years ago

      While I agree with your comment to some degree, you don't give a real-world example of a system that operates the way you described. On the other hand, Chubby uses a TTL: "While imperfect, the lock-delay protects unmodified servers and clients from everyday problems caused by message delays and restarts."

    • @chenzhang1729
      @chenzhang1729 2 years ago

      Then how do you lock 1.chunk1.out? You still have the same problem, no?

  • @narendrakumargupta7849
    @narendrakumargupta7849 5 years ago +57

    Thank you. But it looks like by adding a TTL we defeated the original purpose, which was that no two users (I1 and I2) should write to 1.out at the same time. With a TTL it looks possible that I1 is writing to 1.out while I2 starts writing as well, which may corrupt the file.

    • @seanhsu9381
      @seanhsu9381 5 years ago

      Same question. Can anyone illustrate? @Tech Dummies - Narendra L

    • @jyotindersingh3458
      @jyotindersingh3458 5 years ago

      @@seanhsu9381 It puzzled me too, but in the end there wouldn't be any data corruption: either way, one of them will write the whole file.

    • @seanhsu9381
      @seanhsu9381 5 years ago +1

      @@jyotindersingh3458 Your idea is more that the later request overwrites the previous one, which doesn't protect against the race condition (though some applications are okay with that, such as Wikipedia).
      I found a good answer for dealing with the race condition in one of the comments below from @Sunny Gupta:
      martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html

    • @narenmehra9268
      @narenmehra9268 5 years ago +1

      @Narendra Kumar Gupta, you are right, that will be a problem. To avoid it, the TTL has to be set after monitoring the performance of the applications that acquire the lock: e.g., after 1000 iterations you observe that the applications take at most 7 seconds to process and then release the lock, so you set the TTL to 8 seconds.

    • @yarivh1
      @yarivh1 4 years ago +1

      @@narenmehra9268 Let's consider the following: each machine should increment a value, and each machine has a different processing time (or even sleep time, just to illustrate). Server 1 pulls the value 0 and locks; the TTL passes; server 2 pulls 0, updates it to 1, and releases the lock. Server 1 is now working from stale data and writes 1 again, so one increment is lost. Not sure this solution solves it.

  • @DeepakMishra117
    @DeepakMishra117 5 years ago +78

    I thought you were born with the cap on :D

    • @ravhaak
      @ravhaak 5 years ago

      hahaha ... Lovely. By the way, Naren is really awesome.

    • @sureshchaudhari4465
      @sureshchaudhari4465 4 years ago

      That guy is teaching you and you're pulling his leg. Back-bencher spotted 😁

    • @deepakmishra63
      @deepakmishra63 4 years ago

      Guys, it was a joke. That much is fair game among engineers.

    • @SaifulIslam-fs2li
      @SaifulIslam-fs2li 4 years ago +7

      Here CAP means Consistency, Availability and Partition tolerance

    • @narsing9
      @narsing9 3 years ago

      For some reason, this comment cracked me up. Ha ha

  • @akashjkhamkar
    @akashjkhamkar a year ago

    Hands down the best content on youtube for distributed systems and system design

  • @stalera
    @stalera 4 years ago +14

    It's a nice video where you've tried to set forth all the issues related to distributed architecture, but somehow the issues went unanswered. I watched till the end hoping you'd address them :)

  • @sumitvishwakarma56
    @sumitvishwakarma56 a year ago

    Very well explained... I'd been searching for this concept for a long time and finally got it!! Thanks for the tutorial!! Appreciate it

  • @zustaz
    @zustaz 2 years ago

    Great explanation!

  • @rahulchudasama9363
    @rahulchudasama9363 3 years ago

    Nice explanation...
    Waiting for a system-design equivalent of LeetCode or HackerRank...

  • @subee128
    @subee128 a month ago

    Thanks

  • @ChandraSekhar-zu9nw
    @ChandraSekhar-zu9nw 4 years ago +5

    Instead of lock_timeInMillis, can't we just take a combination of the lock_name (sent by the client) and the IP address of the node as the unique key, so that we wouldn't end up with the problem mentioned at 17:05? I.e., even if I1 tries to release the lock at the 6th second, after I2 has acquired it, it wouldn't be able to, as it won't find a lock with its own combination of lock_name and IP address.

  • @seetaramrathod2611
    @seetaramrathod2611 5 years ago +4

    How is integrity ensured in the below scenario?
    1. client1 (instance) acquires the lock and performs some operation that takes longer than the lock timeout value.
    2. client2 (instance) acquires the same lock after the timeout and starts updating the same data.

    • @SunnyGuptaTech
      @SunnyGuptaTech 5 years ago +1

      In this case, you need to use a token that is sent along while you are updating the data. That token ensures the update will not be inconsistent (sketch below). martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
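      A minimal sketch of the fencing-token idea from that post: the lock service hands out a monotonically increasing token with each grant, and the storage side rejects writes carrying a stale one (the class here is illustrative, not a real API):

          class FencedStorage:
              """Storage that rejects writes from holders of an expired lock."""
              def __init__(self):
                  self.highest_token = -1
                  self.data = {}

              def write(self, key, value, token):
                  # A token lower than one already seen must come from a client
                  # whose lock expired; reject it instead of corrupting data.
                  if token < self.highest_token:
                      raise PermissionError(f"stale fencing token {token}")
                  self.highest_token = token
                  self.data[key] = value

          # Lock-manager side: every grant returns (lock, token) with the token
          # incremented, e.g. token 33 to client 1, then 34 to client 2 after the
          # TTL fires, so client 1's late write with token 33 is refused.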

  • @weiluntan8008
    @weiluntan8008 2 years ago +1

    At around 17:10, you mentioned I2 releasing the lock that was previously acquired by I1. Locks are supposed to have ownership and shouldn't allow what you described, but a (binary) semaphore would achieve a similar use case.

  • @nameredacted6926
    @nameredacted6926 5 years ago +6

    1) The unique ID problem is not solved (distributed clock problem)
    2) You cannot use a timestamp as an ID in a production system
    3) The original problem of dividing work in a mutually exclusive way is not solved

    • @kayeshparvez
      @kayeshparvez 5 years ago

      I agree

    • @hangsu5724
      @hangsu5724 4 years ago +1

      @@kayeshparvez Agreed. Two entities want to lock the same thing, but the naming algorithm in this video will return two locks, which cannot guarantee mutual exclusion.

    • @rajeshkishore7171
      @rajeshkishore7171 3 years ago

      I totally disagree with this video's content; it doesn't solve real mutual exclusion. In fact, ZooKeeper's lock explanation is better.

  • @gitanshgarg3146
    @gitanshgarg3146 3 years ago

    Thank you for making such content, learning a lot of new stuff from these.. 🙌

  • @ayusharora2019
    @ayusharora2019 4 years ago

    Understood the concept on the first watch. Really helpful!!

  • @InshuMussu
    @InshuMussu 2 years ago

    Helpful, thanks

  • @naval041
    @naval041 4 years ago

    Hi Narendra,
    I want to understand two points here.
    1. At time 20:00 there is an elapsed time of 2 seconds during which the lock has been handed over to I2. If that is the case, what sort of information was I1 updating for those 2 seconds? Don't you think that when the lock manager itself releases the lock, the server should get intimation of the same?
    2. For the lock ID we take a timestamp to create a unique ID. Take a scenario where we have multiple copies of the lock manager and the server sends the request to n/2+1 of them: there will always be a difference in time across lock managers when the request gets served. If each lock manager takes the timestamp at the moment it serves the request, each lock manager will end up with a different ID for the same request.

  • @yashsingla8356
    @yashsingla8356 4 years ago

    # After adding TTL it looks possible I1 is writing to 1.out file and i2 may start writing as well which may corrupt the 1.out file (from Narendra Kumar Gupta's comment)
    - The idea of a TTL would work better if, after the timeout, the lock manager not only invalidated the specific lock in the cache but also notified instance-1, which had acquired the lock, of the expiry, i.e. stopped instance-1 from continuing to process the input file on which the lock was acquired. As a result 1.out would not be corrupted (integrity is maintained).

  • @Manishsharma-tj4nn
    @Manishsharma-tj4nn 4 years ago +1

    Boss, please produce more videos.

  • @shubhamqweasd
    @shubhamqweasd 4 years ago +2

    Can we use an implementation (maybe slightly modified) of Paxos to elect one of the nodes to take the lock, so that writes are managed only by the elected node and no other node is allowed?

  • @EwertonSilveiraAuckland

    Man you're so cool

  • @ravitiwari2160
    @ravitiwari2160 3 years ago

    Hey, thank you so much for all your knowledge sharing. I am able to perform very well in all my interviews. Keep up the good work. More power to you.
    Keep rocking!!!

  • @karthikkb3817
    @karthikkb3817 2 months ago

    The first half was good, but the second half was disappointing. Thanks for the other videos; do you have any other video on the same topic?

  • @sidhantshubham
    @sidhantshubham a year ago

    For the timestamp issue across different lock masters, the server that wants to acquire the lock can send timestamp + a 5-digit random key as a parameter, and all master lock nodes should use that same key for the lock.

  • @sandeepbatchu487
    @sandeepbatchu487 5 years ago +5

    Great work Naren. I learnt a lot from your videos. You mentioned multiple lock managers and that the workers need to get (n/2+1) locks. But how can the workers requesting the lock know the value of N? Do they need to make an additional call to get the value of N? If so, to which lock manager? (See the sketch below.)
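    For reference, the majority-acquire step looks roughly like this; in Redlock-style designs the client is simply configured with the full list of lock managers, so N is static client configuration rather than something discovered at runtime (try_lock/unlock are assumed APIs):

        # Majority acquire against a fixed, configured set of lock managers.
        def acquire_majority(managers, resource, lock_id, ttl_ms):
            quorum = len(managers) // 2 + 1
            granted = [m for m in managers if m.try_lock(resource, lock_id, ttl_ms)]
            if len(granted) >= quorum:
                return True
            # Quorum not reached: release whatever was granted and report failure.
            for m in granted:
                m.unlock(resource, lock_id)
            return False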

  • @bowang1825
    @bowang1825 4 years ago

    Using majority agreement to avoid a single point of failure is a good approach. Can you
    a) explain the main difference from Chubby?
    b) explain why you can't key the lock by resource ID, but need to bother creating a lock ID from a timestamp?

  • @parwana1000
    @parwana1000 3 years ago +1

    Once I get the lock, I can still write to that file after 5 sec... so what is the meaning of giving the lock to another?

  • @1qwertyuiop1000
    @1qwertyuiop1000 3 years ago

    I have one doubt... what happened to the lovely CAP?? lol. Thanks for sharing information about locks :-)

  • @NikolaMilovic-yo9fr
    @NikolaMilovic-yo9fr a year ago +1

    But if the lock expires while I1 still thinks it holds it, I2 can now get the lock on the shared resource and cause race conditions and undefined behavior. This doesn't seem like the solution.

  • @rohanagarwal5512
    @rohanagarwal5512 3 years ago

    Redlock explained :D But Martin K won't be happy 😜

  • @andybhat5988
    @andybhat5988 2 years ago

    etcd CAS is the best method for distributed locks (sketch below)
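    A sketch of that compare-and-swap route, assuming the python-etcd3 client (treat the exact API as indicative, not authoritative):

        import etcd3

        client = etcd3.client(host='localhost', port=2379)
        lease = client.lease(ttl=10)  # lock auto-expires if the holder dies

        # CAS: create the lock key only if it does not exist yet.
        acquired, _ = client.transaction(
            compare=[client.transactions.version('/locks/file1') == 0],
            success=[client.transactions.put('/locks/file1', 'worker-1', lease)],
            failure=[],
        )
        if acquired:
            try:
                ...  # process file1, calling lease.refresh() periodically
            finally:
                client.delete('/locks/file1')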

  • @gabrielpedro8774
    @gabrielpedro8774 2 years ago

    Thank you

  • @sachin_getsgoin
    @sachin_getsgoin 3 years ago

    Good description of the problem, but the explanations are not quite there.
    There are so many flaws in the unique ID, the clock mismatch (from using time), and the setup of the distributed lock manager.
    Time to go back to "Designing Data-Intensive Applications": Chapter 7, Transactions (topics: No Dirty Writes, Two-Phase Locking) and Chapter 8.

  • @YoyoMoneyRing
    @YoyoMoneyRing 4 months ago

    So, you are saying, I'll use ZooKeeper to create ZooKeeper distributed locking. 😆

  • @TheKievsash
    @TheKievsash 2 years ago

    Very good explanation! But please use a radio microphone for better sound.

  • @himanshupoddar1395
    @himanshupoddar1395 4 years ago

    My question is: how different is a lock in a distributed system? Why can't we use locks/mutexes in a distributed system the way we do for a critical section, to prevent deadlocks? A distributed lock exists to prevent files from being accessed by multiple processes at the same time, so why can't we use semaphores or any deadlock-prevention technique (e.g. mutexes, semaphores, locks) to handle those files? We could treat them as a critical section and allow only one process at a time to access it.

  • @alexbur7512
    @alexbur7512 4 years ago +11

    Wow, that's quite an amazing example of how NOT to design distributed locks. So many baseless and simply wrong statements (21:34 "our lock is kind of safe, orrr.... is it safe???" - ROFL).
    It also seems like this guy overheard something about split-brain problems and the Raft consensus algorithm, but the way he applies and explains them here makes zero sense.

    • @HimanshuSingh_92
      @HimanshuSingh_92 4 years ago +2

      Why don't you help us understand, then: what's a good way to design locks?

  • @vaybhavshawify
    @vaybhavshawify 3 years ago

    Rather than having a quorum agree on acquiring and releasing locks, why not have a master-slave architecture? That way you have a single master node that generates the unique lock ID. Moreover, we would have a single point generating the timestamp, which would be copied to the slaves; this closes another avenue where time inconsistencies could creep in.
    I would also suggest using a GUID generator.

  • @yp5387
    @yp5387 2 years ago

    Why make things complicated?
    If it can take hours to process one single file, then a node can only pick up another file after a few hours. So let's assume you have 10 nodes to process those large files.
    1. The 10 nodes come online.
    2. All of them read the locks from the database and pick one file each, based on the availability recorded in the shared database -> (10 writes and 10 reads).
    3. The 10 nodes start processing those files and come back a few hours later to pick new files (again 10 writes and 10 reads).
    4. This repeats, say, 10 times a day (assuming some files are smaller). That is still only 100 reads and 100 writes per day. I think this is efficient; see the sketch below.
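    A sketch of that shared-database approach, using SQLite for illustration (any ACID store with a unique constraint behaves the same way):

        import sqlite3

        db = sqlite3.connect("locks.db")
        db.execute("CREATE TABLE IF NOT EXISTS file_locks"
                   " (filename TEXT PRIMARY KEY, owner TEXT)")

        def claim_file(owner, filename):
            """Atomically claim a file; the PRIMARY KEY forbids double-claims."""
            try:
                with db:  # one transaction
                    db.execute("INSERT INTO file_locks VALUES (?, ?)",
                               (filename, owner))
                return True
            except sqlite3.IntegrityError:
                return False  # someone else already claimed it

        # Each node calls claim_file(...) before processing; at ~100 writes/day
        # a single database is nowhere near a bottleneck.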

  • @hoangtrunghaipham5999
    @hoangtrunghaipham5999 5 years ago +3

    Firstly, thank you so much for your really nice video.
    Secondly, I am confused on one point: after the TTL, if I1 has not finished processing the file and I2 is allowed to process it as well, the data in that file will then be corrupted, won't it?
    Could you please clarify?

    • @zhouchong90
      @zhouchong90 5 years ago +2

      When I1 has finished processing the file and is about to write the output, it can check again whether the lock is still held by I1. If so, it first extends the TTL of the lock, then saves the output, then deletes/archives the file and unlocks (sketch below). But if your TTL is simply not enough time for the worker to finish the work, this won't help in any way.
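      A sketch of that check-extend-commit step; the lock-manager API and helpers are hypothetical, and note it is still check-then-act, so it narrows the race rather than closing it (fencing tokens, discussed elsewhere in this thread, close it):

          def finish_and_commit(lock_manager, lock_id, filename, output):
              # Re-check ownership just before the critical write.
              if not lock_manager.still_held(lock_id):
                  return False  # lock expired: discard, let the new owner redo it
              lock_manager.extend_ttl(lock_id, extra_ms=5000)  # buy time to write
              write_output(filename + ".out", output)  # hypothetical helpers
              archive(filename)
              lock_manager.release(lock_id)
              return True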

    • @VinayRachapalli
      @VinayRachapalli 3 years ago

      @@zhouchong90 What if I1 acquired the lock in order to write? It will be writing into the file, the TTL will expire, and I2 will acquire the lock and start writing too.

  • @kokoreply
    @kokoreply 10 months ago

    nice video, but I still don't get how it solves the original problem of locking the files, to avoid having multiple machines processing the same file

  • @amlanch
    @amlanch 5 years ago +3

    Excellent tutorial! Could you please do one on ZooKeeper and Paxos?

  • @monishchhadwa777
    @monishchhadwa777 8 months ago

    Anyone else felt like saying... "Speak louder!"?

  • @vamsikrishna_sampangi
    @vamsikrishna_sampangi 5 years ago +2

    Thank you so much for all the effort you put into producing these high-quality videos. I have benefited greatly from them. :-)

  • @manaligadre7809
    @manaligadre7809 5 years ago +4

    When you release the lock after 5 sec (because the job is taking more time to finish) and it becomes available to I2, the lock issue looks fixed. But you said I1 finishes the job after 7 sec; then what's the point of having a lock? Or, on the other hand, if you say I2 is locking the same file, then how did I1 finish the job at 7 sec? I1 is still writing to the file, and now the file will be corrupt.

    • @abraham12385
      @abraham12385 5 years ago

      When another instance acquires the lock, will the system just stop the old process and page its state out of RAM, or simply kill the old one? I am not sure; can anyone correct me?

    • @zhouchong90
      @zhouchong90 5 years ago

      @@abraham12385 It's a distributed system; you cannot know exactly what step the process handling the file is at (for example, halfway through writing the output). Only that machine itself knows.

    • @zhouchong90
      @zhouchong90 5 years ago

      What you can do is, before you generate the output, try to acquire/extend the same lock again. If you are able to do so, then you can write the output. And you don't have to make the lock a boolean; you can have states, like ToProcess, Processing, Completed.

    • @AbhishekSingh-op2tr
      @AbhishekSingh-op2tr 4 years ago +1

      Maybe also keep the TTL record locally on the machine that acquired the lock, so that once the TTL expires, both the machine and the lock manager do the cleanup. That solves this problem and also saves the machine a network call to the lock manager to release the lock. The assumption is that either the clocks are in sync, or you handle that as well.

    • @AbhishekSingh-op2tr
      @AbhishekSingh-op2tr 4 years ago +1

      Only go to the lock manager if you need the same lock again. If you get it, update the TTL; otherwise do the cleanup. But that is no different from the original request.

  • @jiguification
    @jiguification 5 years ago

    Assume there are 5 nodes and we acquire the lock from 3 of them. Now consider: if 2 of the nodes that took note of the lock go down, we are left with 1 node holding the lock and 2 nodes without it, which would indicate the resource is not locked, since majority wins. Isn't that an issue?

  • @paulpayne3555
    @paulpayne3555 3 years ago

    I'd just change the filename. The first process to successfully change the filename gets to process the file. In other words, use an atomic rename on your resource server as your locking mechanism (sketch below).
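    A sketch of that rename-as-lock trick; on POSIX, os.rename within a single filesystem is atomic, so exactly one claimant can win:

        import os

        def claim(filename, worker_id):
            """First worker to rename the file wins the right to process it."""
            try:
                os.rename(filename, f"{filename}.{worker_id}")
                return True
            except FileNotFoundError:
                return False  # another worker already renamed (claimed) it

        # Usage: if claim("1.log", "worker-a") returns True, the file now exists
        # only under worker-a's name, so no separate lock service is needed for
        # the claim step.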

  • @adamhughes9938
    @adamhughes9938 4 years ago

    Consul by HashiCorp is also a good service; it has a nice locking API.

  • @fictionstudios6876
    @fictionstudios6876 5 years ago

    Sir, I have had a doubt for a long time: what is the tech stack behind Indian internet banking technologies such as UPI and IMPS? How secure are they? Do they use Java for the backend (server side) and security? What database do they use? Do they use any framework? Do they use a cloud service or their own servers?
    Please, sir, it would be helpful for clearing up many beginners' doubts about security. If the answer would be long, you could make a video on it.
    I think you would be the best person to answer my question.

    • @TechDummiesNarendraL
      @TechDummiesNarendraL  5 years ago

      Sure, I will try to get that info

    • @shivamaggarwal152
      @shivamaggarwal152 3 years ago

      I worked on a similar project in the past (around 4 years back), where we handled the backend APIs for banking transactions. Most of the services were built in Java, and the database varied from MySQL to Oracle (I didn't see NoSQL used anywhere during my time). The applications ran on the bank's own infrastructure and were not using any cloud at that time. The servers were huge and could be vertically scaled.

  • @dillon9347
    @dillon9347 2 years ago

    Could we use something like consistent hashing instead of replicating locks on x number of machines randomly?

  • @wy100101
    @wy100101 4 years ago

    You should just use ZooKeeper, because it solves the quorum problem for you. Having a TTL is the right call, but you need to implement fencing to stop a process holding an expired lock from totally wrecking the system and defeating the purpose. In the end, you will have problems unless either the datastore implements the locking directly, or it has a mechanism to take a client-provided lock and validate it against the distributed lock service. (ZooKeeper sketch below.)
    Also, using time is a TERRIBLE idea for a few reasons:
    - keeping multiple servers in sync at the nanosecond level is a nightmare;
    - even if you solve that problem, you can still have conflicts in high-throughput systems.
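    A sketch of the ZooKeeper route using the kazoo client's lock recipe; quorum and session expiry are handled by the ZooKeeper ensemble, and the hosts and paths are placeholders:

        from kazoo.client import KazooClient

        zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
        zk.start()

        lock = zk.Lock("/locks/file1", identifier="worker-1")
        with lock:  # blocks until acquired; released on exit or session loss
            ...     # process file1; writes should still carry a fencing token

        zk.stop()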

  • @afrozalam5389
    @afrozalam5389 3 years ago

    1. America writes to America_1.out
    2. Asia writes to Asia_1.out
    3. America symlinks 1.out -> America_1.out
    4. Asia symlinks 1.out -> Asia_1.out
    No file corruption if you follow this pattern.

  • @GirishDeshpande
    @GirishDeshpande 4 years ago

    Boss, you rock! The flow in your videos makes them really digestible for viewers. Thanks!

  • @MdZiaulHaqueOlive
    @MdZiaulHaqueOlive 2 years ago

    At 8:32, why would a single point of failure cause an integrity problem? In that case the other processes won't be able to get any lock, since the lock manager is down, hence no one writes to any output file. The integrity of the system is maintained, right?

  • @santoshprakash9737
    @santoshprakash9737 2 years ago

    What will happen if the lock manager is not able to connect to the cache, or the cache is down? How can we handle this situation?

  • @rgupta608
    @rgupta608 4 years ago

    In the lock-manager case, if we have 6 machines, at least 4 locks are required. If 2 machines go down, how can we ensure that the machines that went down are the ones that held the locks (will we check for 4 locks or 3)? If the machine count is 5, we need 3 locks; similarly, if we add 2 more machines, what's the behavior in that case? Do we have to discard the data and process it again?

  • @anubhamandal
    @anubhamandal 5 years ago

    hi, thanks for the videos, they are always so easy to follow. Can you also please give a session on payment systems such as Square Cash, Venmo or PayPal? thanks :)

  • @pradeepkaravadi5360
    @pradeepkaravadi5360 5 years ago

    Please add a video on redBus and VLC player

  • @jordixboy
    @jordixboy a year ago

    But then you can't automatically scale the lock service. Say at first you have 2 lock services, so to acquire a lock you need at least 1 approval, and you get it. Now you scale to 8 lock services; to acquire a lock you now need the consensus of 5 services, and you could get approval by asking 5 services, none of which are among the initial 2 that actually hold the lock.

  • @stalera
    @stalera 3 years ago +2

    Well explained, but I feel the explanation was pretty slow and lengthy. If you could pick up the pace and skip what's obvious, the session would be much more engaging. Thanks for hearing me out :)

  • @klavier0x99
    @klavier0x99 3 years ago

    Do you have book suggestions for learning system design?

  • @somakkamos
    @somakkamos 2 years ago

    I am still not convinced that using a database with ACID guarantees isn't a better solution than what was suggested.

  • @AshuKumar-nv9zc
    @AshuKumar-nv9zc 5 years ago +1

    Sir, can you please explain the system design of Fitbit?

  • @RajeshYadav-ly4gq
    @RajeshYadav-ly4gq 2 years ago

    Once the 5-second TTL has elapsed while instance 1 is still working on file1.data, the cache entry is removed from the cache manager, and if instance 2 then requests a lock on file1.data, it gets it. During that period both instance 1 and instance 2 are working on file1.data, so we again have the same integrity and efficiency issues. How do we handle this?

  • @ravindrabhatt
    @ravindrabhatt 2 years ago

    At 20:14, how are you ensuring data corruption is not going to happen while I1 is still working on the same doc or object as I2?

  • @kabooby0
    @kabooby0 3 years ago

    I like the way you present, but I think there are some holes in the design. Also, using a timestamp seems unnecessary.

  • @MrTutu143
    @MrTutu143 5 years ago +1

    Why did you erase the green lock on the board.. It was beautiful..
    .
    .
    .
    .
    Nice explanation BTW

  • @berlynmenaka1509
    @berlynmenaka1509 5 years ago

    Why do we need lock_timeInMillis? We could just use a hash of the requesting instance's IP address to uniquely identify it.

  • @vamsikrishna_sampangi
    @vamsikrishna_sampangi 5 years ago

    For each particular resource we could make a single instance the master, so as to reduce the n/2+1 consent requirement.

    • @TechDummiesNarendraL
      @TechDummiesNarendraL  5 years ago +1

      But what happens when the master goes down?

    • @vamsikrishna_sampangi
      @vamsikrishna_sampangi 5 years ago

      When an instance goes down, all the resources mastered on that instance are remastered to other live instances. Such information is present in the resource table on all instances.

    • @TechDummiesNarendraL
      @TechDummiesNarendraL  5 years ago

      @@vamsikrishna_sampangi If a machine dies because of a power failure, all the unreplicated data is lost. There is no way you can remaster; the new master will have fresh data. The old data is gone!!

  • @amitkshuklaro
    @amitkshuklaro 2 years ago

    Why haven't you put on your cap today?

  • @avinashyadagere4744
    @avinashyadagere4744 5 years ago

    Why do we need to hold the lock until the complete file is processed? My understanding is that we only need to ensure that different processes don't pick the same log file. So wouldn't it be better to hold a lock only until we make an entry in some distributed cache (say filename - log1) and then release the lock immediately? When another process comes along, it sees the cache entry, finds out that file log1 is already picked, and goes for another file. Do you see any flaw here? (Sketch below.)
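    That claim-then-release pattern is essentially what a single Redis key gives you; a sketch with redis-py (SET ... NX is atomic, and the key name is a placeholder):

        import redis

        r = redis.Redis()

        def claim_file(filename, worker_id, ttl_ms=60000):
            # NX: set only if the key does not exist; PX: expire in case we crash.
            return bool(r.set(f"claimed:{filename}", worker_id, nx=True, px=ttl_ms))

        # A worker that gets False just moves on to the next file; the lock
        # protects only the claim, not the whole processing run, as suggested above.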

    • @mukulchakravarty9381
      @mukulchakravarty9381 5 years ago

      Avinash Yadagere I guess it's because you can use this locking procedure anywhere in your system. Different components will use the locking system for different purposes; if you tightly couple it to filenames, it will be difficult to use elsewhere. Also, we need to make sure the client does not hold on to the file indefinitely.

  • @xfabiosoft
    @xfabiosoft 5 years ago +2

    Amazing as always! I'm learning so much from your videos... do you have anything about IFTTT and its architecture?

  • @chickentikkasauce1301
    @chickentikkasauce1301 5 years ago +1

    Timestamps aren’t a great solution due to clock skew. Use at your own peril.

  • @ashivyas420
    @ashivyas420 4 years ago

    KL Rahul is teaching distributed locks while Naren is playing in the IPL :-)

  • @anchaldubey4217
    @anchaldubey4217 5 years ago

    Could you please make a video on chess game design?

  • @jcflorezr
    @jcflorezr 5 years ago

    Wouldn't it make sense in these kinds of situations to implement an actor that regulates file access?

  • @RajeshYadav-ly4gq
    @RajeshYadav-ly4gq 2 years ago

    Please write down all the references you used to create this video.

  • @pratikdandare9865
    @pratikdandare9865 5 years ago

    Can you help me design the system architecture of Elasticsearch with any database?

  • @pbhatsgw
    @pbhatsgw 2 years ago

    Thank you Narendra, always great to hear from you and learn from you. I do have a question: what if instance 1 is working on some task and it takes more than 5 seconds (the TTL), so the LM releases lock1, and instance 2 comes in and gets lock2 assigned? For this scenario I understand that lock1 is released by the LM, but how does that acknowledgement work? How does the LM notify instance 1 to stop whatever work it's doing? Do we use TCP / WebSockets / UDP for that acknowledgement part?

  • @kevinyu1552
    @kevinyu1552 5 years ago

    A question: how does using milliseconds in the lock ID prevent issues related to clock skew? My understanding is that we should always use a proper fencing mechanism instead.

    • @y5it056
      @y5it056 5 years ago

      If possible, it's better to create a unique ID from a property of the requestor that is itself unique, along with the timestamp.

  • @prasad4you57s2
    @prasad4you57s2 5 years ago +1

    Bro, can you please explain the IRCTC system design?

  • @satindersingh6380
    @satindersingh6380 5 years ago

    Narendra, please make a video on search engines

  • @gauravsharma8642
    @gauravsharma8642 4 years ago

    I'm so used to seeing him in a cap that my attention kept going to his head for the first 5-10 minutes of this video.

  • @abrarisme
    @abrarisme 5 years ago

    Hype, another video

  • @howellPan
    @howellPan 5 years ago

    1. How would I1 even know when to call the lock manager to extend the lock? And with multiple lock managers, does I1 need to call all (or a majority of) the LMs to extend the time?
    2. I don't think we can depend on synchronized clocks between all the LMs, right? Using a time tolerance seems error-prone... why not a GUID generator (that's also fault-tolerant)?

    • @howellPan
      @howellPan 5 years ago +1

      Uber's solution: a continuous heartbeat from the client to the lock manager. If the heartbeat stops, the lock manager can assume the client is dead and release the lock; otherwise the lock manager uses an algorithm to extend the lock.

  • @ibrahimshaikh3642
    @ibrahimshaikh3642 5 years ago

    Very informative, keep it up

  • @lmxqlmxq
    @lmxqlmxq 5 years ago

    Great video!

  • @billyliul8778
    @billyliul8778 5 years ago

    Sir, great video compared with others on this topic! I do have a few questions:
    1. Suppose we have 5 LM nodes and 3 of them are down. I1 acquires a lock and 2 LMs grant it; this lock is considered valid since there are two active LMs (2 >= 2/2 + 1). Later the 3 LM nodes come back online, and I2 acquires the same lock; the other 3 LMs grant it, and this lock is also valid since there are now 5 active LMs (3 >= 5/2 + 1). So there will be two valid locks in the system!
    2. You mentioned checking how many LMs are available. How can you get the number of active LMs in the system? You can send periodic pings or heartbeats, but the status can change while you are acquiring the lock, so you will get an unreliable active-LM count.
    3. Same question as @Kapil Karnwal: I1 got a lock for 5 seconds, and I1 MUST stop working and renew the lock before the expiry time, but a few things can happen: a. I1 knows the lock is going to expire in 5 seconds, but the system clocks on the LM and I1 will be slightly different (NTP), so how does I1 know exactly when to renew? (A straightforward/safe way is to stop working and renew at 4 seconds, but that slows the whole system down by 20%, as each lock only gets 4 seconds of work.) b. I1 issues a shared-storage write request at second 4 (the lock is still valid, so I1 can still perform the long-running task), but the write takes 3 seconds to complete on the remote server; during this, I2 acquires the lock and issues the same write request, and the data on the shared storage gets corrupted?

    • @zhouchong90
      @zhouchong90 5 years ago

      Not really. In this design, when 3 of the nodes are down, you won't get the lock, because as a client you don't really know how many lock servers are alive; that's not the client's responsibility. All you need to know is that, by design, there are 5 servers in total. When 3 of them go down, you'll simply receive 3 failures when acquiring the lock. Of course, in that case the lock system is down; that's why you need to deploy at least 3 servers. The likelihood of 3 servers in different regions going down at once is quite low if you're using mature cloud providers.

  • @rishabhgoel1877
    @rishabhgoel1877 5 years ago

    Nice video.. Can we use a UUID instead of time_millisecond?

    • @maksimmikheev5896
      @maksimmikheev5896 5 years ago +2

      I think a UUID does it even better than the timestamp. You can generate it on the client, so you don't need to memorize every lock name across the different lock managers (sketch below).
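      A tiny sketch of client-side UUID lock IDs; a v4 UUID needs no clock synchronization, and the collision probability is negligible:

          import uuid

          # Generated once on the client and presented to every lock manager.
          lock_id = f"lock_{uuid.uuid4().hex}"
          print(lock_id)  # e.g. lock_3f2a9c0e5b6d4e21a7c8d90b12f34a56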

  • @somil47
    @somil47 4 years ago

    Acquiring multiple locks from the LMs will lead to race conditions again.

  • @rupayansamsung
    @rupayansamsung 5 years ago

    thanks for all your uploads, they really helped me understand distributed design. Can you please do a session on a payment gateway like PayPal?

  • @sumonmal009
    @sumonmal009 3 years ago

    THIS COMMENT IS FOR MY PERSONAL REFERENCE. TO UNDERSTAND PROPERLY, WATCH THE FULL VIDEO.
    why we need a distributed lock 6:51
    solution to the single-point failure 9:07
    eventual consistency 10:51 13:29
    spinning lock 15:01
    TTL 17:05
    distribution idea 22:45
    majority lock (the final solution for the distributed lock) 24:25

  • @atlasplato5111
    @atlasplato5111 3 years ago +2

    As an engineer @ Google, I would say this is garbage.

  • @brijeshgupta.official
    @brijeshgupta.official 5 years ago

    What if two machines get the majority number of locks?

    • @LovyGupta007
      @LovyGupta007 5 years ago

      I don't think that's possible with the same lock. Can you specify a scenario where this happens?

  • @IbnIbrahem
    @IbnIbrahem 4 years ago

    I'm sorry, but there are too many flaws in this design.
    1. Adding a TTL opens a window for corruption: instance 1 takes the lock; after the TTL, instance 2 takes the lock (while instance 1 is still holding it) => race condition.
    2. Keeping epoch time in sync between multiple machines is impossible, and using some tolerance defeats the point of using epoch time in the first place.
    3. What happens to machines that die, once they come back? How do they catch up and become healthy again?

  • @murali1790able
    @murali1790able 2 years ago

    A waste of time. You have a lock manager, but you still want the client machine to make 5 calls to acquire a lock. Why should the client make 5 calls?

  • @amitsing
    @amitsing 5 years ago

    Great work, mate!
    Question:
    1. At czcams.com/video/v7x75aN9liM/video.html : if instance 1 was still processing past 5 seconds (TTL elapsed) when instance 2 picked up processing, won't that lead to corruption? The same argument was given in the earlier case.

  • @chengwl2008
    @chengwl2008 3 years ago +1

    Full of errors, wasted my 28 mins