Distributed Systems 2.4: Fault tolerance

Martin Kleppmann

37,463 views

Added on 27 July 2024
Accompanying lecture notes: www.cl.cam.ac.uk/teaching/212...
Full lecture series: • Distributed Systems le...
This video is part of an 8-lecture series on distributed systems, given as part of the undergraduate computer science course at the University of Cambridge. It is preceded by an 8-lecture course on concurrent systems for which videos are not publicly available, but slides can be found on the course web page: www.cl.cam.ac.uk/teaching/212...

Comments • 12

@eyadkhayat 8 months ago
Watching this course while brushing up my system design skills. Very useful. Thank you
@bermick 1 month ago
brilliant! thanks a lot for the content Martin!
@dmytrozaporizkiy3599 2 years ago
Brilliant!
@ascyrax8507 2 years ago
nice content. thanks a lot.
@mantistoboggan537 3 years ago ⁺⁵
So wait, how does the eventual failure detection get implemented? Don't we still fundamentally have the same problem if we have asynchronous timings? How would I know that my node has failed, as opposed to just going through a huge garbage collection protocol, or thrashing, or anything else?
@AZAssazin 3 years ago ⁺⁵
I think the idea is that *eventually* may mean a very long time, e.g. if you don't get a response in a few weeks, the node crashed. Alternatively, you could probably enforce (maybe via an SLA) what a failed node will look like, especially if the service you're calling is another service your company owns. "If we don't respond within 1 minute, then even if we were just stalled due to garbage collection, we'll discard the message and consider the node faulty."
@yogeshedekar6078 3 years ago ⁺⁴
You can simply send a heartbeat signal to every node, usually called a liveness probe in cloud terminology. If the node does not reply to the heartbeat, say, 3 times in a row, you treat the node as failed and can trigger an automatic restart. If the restart does not fix the issue either, you take that node out of rotation and put another node in its place.
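A minimal sketch of the counting rule described in this comment, assuming a threshold of 3 consecutive missed heartbeats. The class name `HeartbeatMonitor` and its fields are hypothetical, and actually sending probes, restarting nodes, and rotating them out are left out. Note that a node flagged this way is only suspected: a slow node looks the same as a crashed one.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Counts consecutive missed heartbeats for one node (hypothetical helper)."""

    max_misses: int = 3  # threshold taken from the comment; an assumption, not a standard
    misses: int = 0
    suspected: bool = False

    def record(self, replied: bool) -> bool:
        """Record one probe result and return whether the node is now suspected."""
        if replied:
            # Any reply resets the counter and clears the suspicion.
            self.misses = 0
            self.suspected = False
        else:
            self.misses += 1
            if self.misses >= self.max_misses:
                self.suspected = True
        return self.suspected


monitor = HeartbeatMonitor()
# One reply, two misses, a reply (counter resets), then three misses in a row.
probe_results = [True, False, False, True, False, False, False]
results = [monitor.record(replied) for replied in probe_results]
```

Only the last probe in this run pushes the counter to the threshold, so only the final entry of `results` is `True`.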
@allyourcode 2 years ago ⁺³
I think the answer is in the title of the slide: a PARTIALLY SYNCHRONOUS model is being considered, not async.
@kleppmann 2 years ago ⁺¹⁷
That's exactly the point: if you don't get a reply from some node within some timeout, it might be that the node crashed, but it could also be that the node or the network is just temporarily being slow. And we can't definitively distinguish between crash and slowness. However, if slowness is only temporary, then eventually the node will start responding again if it's not crashed. The problem is that in an asynchronous or partially synchronous system, we don't know how long that might take.
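One common way to cope with not knowing how long slowness lasts is to suspect a node on timeout, then withdraw the suspicion and lengthen the timeout whenever a late reply shows the suspicion was wrong. The sketch below illustrates that idea only; it is not taken from the reply above, and the class name `AdaptiveTimeoutDetector`, the initial timeout, and the doubling factor are all assumptions.

```python
class AdaptiveTimeoutDetector:
    """Timeout-based failure detector that grows its timeout after a false suspicion."""

    def __init__(self, initial_timeout: float = 1.0, backoff: float = 2.0) -> None:
        self.timeout = initial_timeout  # seconds to wait for a reply (assumed value)
        self.backoff = backoff          # growth factor after a false suspicion (assumed)
        self.suspected = False

    def on_timeout(self) -> None:
        # No reply within self.timeout: the node may have crashed or may just
        # be slow. We cannot tell which, so we only *suspect* it.
        self.suspected = True

    def on_reply(self) -> None:
        if self.suspected:
            # A reply from a suspected node proves the suspicion was wrong:
            # trust the node again and be more patient next time.
            self.suspected = False
            self.timeout *= self.backoff


detector = AdaptiveTimeoutDetector(initial_timeout=1.0)
detector.on_timeout()   # node is slow: suspected
detector.on_reply()     # late reply: unsuspected, timeout doubles to 2.0
detector.on_timeout()   # slow again
detector.on_reply()     # timeout doubles to 4.0
```

If delays are eventually bounded, the timeout eventually grows past that bound and false suspicions stop, while a node that really crashed never replies and stays suspected.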
@sarathkumarmutnuru1177 2 years ago
At 6:51, how can any fault detector label a node as correct if it has actually crashed? The fault detector labels a node correct only if it receives an acknowledgment of some sort, and a crashed node cannot acknowledge anything.
Unless the node crashed in between the fault detector's probe intervals.
@khaldrogo9451 2 years ago ⁺¹
Well one example is to think of the time in between messages being passed. A sends a message to B, asking if B is still up. B responds by saying "yes, I'm good", and crashes right away. Now, A will get a message saying that B is up, but in reality B has actually crashed. So, until A goes around and asks B for its status again, it will never know and will have marked it as correct.
@GooseBerry390 1 year ago ⁺¹
@@khaldrogo9451 Excellent response. Note that there is the timeout period itself as well, so even after A has asked B, it will wait for a particular length of time until it decides that a timeout has actually occurred.
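The two delays mentioned in this exchange add up. As a rough sketch, assume A probes B at a fixed interval, B crashes immediately after answering one probe, and network latency is ignored. The function name and the example values are hypothetical.

```python
def worst_case_detection_delay(probe_interval: float, timeout: float) -> float:
    """Seconds during which A may still label a crashed B as correct.

    Assumes B crashes right after replying, so A only notices after the
    next probe is sent (probe_interval later) and then times out.
    """
    return probe_interval + timeout


# Example: probe every 5 seconds, wait 2 seconds for each reply.
delay = worst_case_detection_delay(probe_interval=5.0, timeout=2.0)
```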
