What is MTTR? - Key Incident Recovery Metrics to Reduce Downtime

  • Added on 14 Aug 2024
  • What is MTTR?
    In this video, we cover the key incident recovery metrics you need to reduce downtime.
    Learn more at pagertree.com/...
    This video was brought to you by PagerTree. On-Call. Simplified.
    Start your free trial today at pagertree.com
    ***************************************************************
    Transcript:
    Uptime - MTTR - MTTA - and MTBF
    So many acronyms! But it all starts with downtime.
    Downtime is time your solutions are out of action and unavailable for use.
    It’s costly to fix, and ultimately breaks customer trust. Simply put, downtime is expensive!
    So let’s take a look at how these key metrics help us manage downtime.
    The first metric is Uptime.
    Uptime is the percentage of time during which a company’s solutions are in action and available for use.
    It’s calculated as “Total time less downtime, divided by total time”
    It’s a measure of availability, and in today’s connected world, you want to target at least 99.99% uptime.
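
As a rough illustration of the uptime formula, the following Python sketch computes uptime and the downtime that a given target leaves over a year. The function names, the sample figures, and the 365-day year are assumptions made for this example and do not come from the video.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, 8,760 hours


def uptime_percent(total_hours: float, downtime_hours: float) -> float:
    """Uptime = (total time - downtime) / total time, as a percentage."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return (total_hours - downtime_hours) / total_hours * 100


def downtime_budget_minutes(target_percent: float,
                            total_hours: float = HOURS_PER_YEAR) -> float:
    """Minutes of downtime allowed in the window while still meeting the target."""
    return (1 - target_percent / 100) * total_hours * 60


if __name__ == "__main__":
    # Hypothetical figures: 1 hour of downtime in a 1,000-hour window.
    print(f"{uptime_percent(1000, 1):.2f}%")
    # A 99.99% target over a year leaves less than an hour of downtime.
    print(f"{downtime_budget_minutes(99.99):.1f} minutes")
```

Under these assumptions, a 99.99% target allows roughly 52.6 minutes of downtime per year.
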
    Next, we have MTTR.
    Mean Time to Resolution is the average time it takes to resolve an outage and restore services to end-users.
    It’s calculated as “Total downtime divided by number of incidents”
    MTTR is a measure of resilience. Incidents are bound to happen, but healthy systems bounce back quickly.
    Our next metric is MTTA.
    Mean Time to Acknowledge is the average time it takes for a new incident to be acknowledged.
    It’s calculated as “Total time to acknowledge divided by number of incidents”
    MTTA is a measure of responsiveness. The faster you respond, the sooner you can start working towards a solution.
    Lastly, we have MTBF.
    Mean Time Between Failures is the average time from one incident to the next.
    It’s calculated as “Total time less downtime, divided by the number of incidents”
    MTBF is a measure of reliability, and answers the question: How often do our systems go down?
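
The three incident-based formulas can be sketched in Python as follows. The function names and the sample week are hypothetical and chosen only for illustration; the formulas themselves are the ones stated in the transcript.

```python
def mttr_hours(total_downtime_hours: float, incidents: int) -> float:
    """MTTR = total downtime / number of incidents."""
    return total_downtime_hours / incidents


def mtta_minutes(total_ack_minutes: float, incidents: int) -> float:
    """MTTA = total time to acknowledge / number of incidents."""
    return total_ack_minutes / incidents


def mtbf_hours(total_hours: float, total_downtime_hours: float,
               incidents: int) -> float:
    """MTBF = (total time - downtime) / number of incidents."""
    return (total_hours - total_downtime_hours) / incidents


if __name__ == "__main__":
    # Hypothetical week: 168 hours, 2 incidents, 4 hours of downtime,
    # 30 minutes in total to acknowledge.
    print(mttr_hours(4, 2))       # 2.0
    print(mtta_minutes(30, 2))    # 15.0
    print(mtbf_hours(168, 4, 2))  # 82.0
```

All three divide by the incident count, so a window with zero incidents has no defined value; a caller would need to decide how to report that case.
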
    Now that we’ve covered our key metrics, let’s test our knowledge with some real numbers.
    Over a 30 day period, you have 5 incidents, 10 hours of total downtime, and 180 total minutes to acknowledge.
    Can you calculate your uptime, MTTR, MTTA, and MTBF?
    Let’s start with uptime.
    Remember, uptime is calculated as “total time less downtime divided by total time”.
    In this case, 30 days is 720 hours, so we have 720 hours less 10 hours of downtime, divided by 720 hours.
    This gives us an uptime of 98.61%
    Ouch! At 1.39% downtime, that translates to about 121.8 hours out of the 8,760 hours in a year. Not good at all.
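
One way to arrive at the annual figure, assuming a 365-day year (8,760 hours) and starting from the rounded uptime of 98.61%, is sketched below. The variable names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, 8,760 hours

uptime = (720 - 10) / 720                        # 30 days = 720 hours
rounded_uptime_percent = round(uptime * 100, 2)  # 98.61

# Annual downtime implied by the rounded uptime figure.
annual_downtime_rounded = (100 - rounded_uptime_percent) / 100 * HOURS_PER_YEAR

# The same calculation without rounding the uptime first.
annual_downtime_exact = (1 - uptime) * HOURS_PER_YEAR

print(f"{rounded_uptime_percent}%")
print(f"{annual_downtime_rounded:.1f} hours")
print(f"{annual_downtime_exact:.1f} hours")
```

Starting from the rounded percentage gives about 121.8 hours; carrying the unrounded value through gives about 121.7 hours.
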
    We need to drill in on the other metrics to see where we can improve.
    Our mean time to resolution is 10 hours of downtime divided by 5 incidents, which gives us an MTTR of 2 hours per incident.
    That’s not horrible, but it’s not great either. We’ll want to break this down by incident severity in order to better understand what’s going on. Are we consistently hovering at 2 hours, or do we have some wild variations in time to resolution?
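
A breakdown by severity might look like the sketch below. The incident log is entirely hypothetical: the video gives only the totals (5 incidents, 10 hours), so the individual durations and severity labels are invented to match those totals.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical incident log: (severity, hours to resolve).
# Chosen so that the totals match the example: 5 incidents, 10 hours.
incidents = [
    ("high", 6.0),
    ("high", 2.0),
    ("low", 1.0),
    ("low", 0.5),
    ("low", 0.5),
]

# Group resolution times by severity.
by_severity = defaultdict(list)
for severity, hours in incidents:
    by_severity[severity].append(hours)

overall_mttr = mean(hours for _, hours in incidents)
print(f"overall: {overall_mttr:.1f} h")
for severity, durations in by_severity.items():
    print(f"{severity}: mean {mean(durations):.2f} h, "
          f"range {min(durations)}-{max(durations)} h")
```

With data like this, the same 2-hour overall MTTR hides a 4-hour average for high-severity incidents and well under an hour for low-severity ones.
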
    Next, let’s take a look at mean time to acknowledge. Our MTTA is 180 minutes divided by 5 incidents for an average of 36 minutes per incident.
    That’s way too slow, and there’s some low-hanging fruit here. Speeding up response times can be the quickest way to improve MTTR and reduce downtime.
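
To see how much of the resolution time goes to acknowledgement, the two averages can be compared directly. This sketch assumes that time to acknowledge is counted as part of time to resolution, which the video does not state explicitly.

```python
mtta_minutes = 180 / 5        # 36 minutes per incident
mttr_minutes = (10 * 60) / 5  # 120 minutes per incident

# Assumption: the MTTR clock starts before acknowledgement,
# so time to acknowledge is part of time to resolution.
ack_share = mtta_minutes / mttr_minutes

print(f"Acknowledgement takes {ack_share:.0%} of the average resolution time")
```

Under that assumption, nearly a third of each outage passes before anyone has acknowledged it.
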
    Lastly, our mean time between failures is calculated in the following manner: 720 total hours less 10 hours of downtime divided by 5 incidents.
    This gives us an MTBF of 142 hours. That translates to more than 1 outage a week! We’ll definitely want to look at our logged incident trends and see if we can’t identify some root causes.
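
The step from 142 hours to "more than 1 outage a week" follows from comparing the MTBF with the 168 hours in a week, as in this short sketch (variable names are illustrative):

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours

mtbf_hours = (720 - 10) / 5  # 142 hours between failures
outages_per_week = HOURS_PER_WEEK / mtbf_hours

print(f"MTBF: {mtbf_hours:.0f} hours")
print(f"About {outages_per_week:.2f} outages per week")
```

Because 142 hours is shorter than a week, the ratio comes out slightly above 1.
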
    So there you have it. Uptime, MTTR, MTTA, and MTBF. These four key metrics keep a pulse on your IT operations, and help you ask the right questions to take action.
    Start monitoring these key metrics today and get ahead of downtime!

Comments • 15

  • @theweegit
    5 months ago

    First metric equation: total time - downtime % total time, voiceover: total time plus downtime % total time, good start

  • @SagarRane-cf1hr
    6 months ago

    Very nice video; it makes the parameters easy to understand.

  • @DeangeloHaggen
    1 day ago

    How did we get 121.8 hours of downtime each year at the 2:10 timestamp?

  • @hercules1345
    1 year ago

    Very informative video. Thank you.

  • @renegmuniz
    4 years ago +1

    This is great! I love it! But what about MTTD (Mean Time to Detection)? It affects MTTR: the sooner you detect an incident, the sooner you can start working on it and the better your MTTR.

  • @asifsiddiqui5794
    3 years ago

    Good information, thanks!

  • @unkownwalker8853
    5 months ago

    Very informative, but the problems they fix aren't easy and personnel are in short supply, which affects their evaluation.

  • @risav4u
    3 years ago

    Very crisp

  • @philneromelcuizon8170
    3 years ago

    What is the basis of these formulas for uptime, etc.?

  • @shawkattuffaha7318
    6 months ago

    How did you translate 142 hours to 1 outage per week?

  • @RahulKumar-je3mg
    5 years ago +1

    Thanks for that quick information.

  • @dineshdilli
    4 years ago

    Thank you

  • @happsong
    3 years ago +1

    Where do you get 720?

  • @isoexperiencebytideerach8676

    Thank you.