One of the standard metrics for measuring the performance of our incident management practice has always been mean time to repair (MTTR). But I wonder if this metric really gives you actionable information—information that will help you provide a better service to your business.
Think of it this way: Do your customers really care how long it takes you to resolve a low-priority incident that may be impacting one person, or that may have a perfectly acceptable workaround? I would venture that the answer is ‘no’. Business users care about critical incidents: those that impair their ability to serve their customers and that hurt the bottom line through lost profits or damaged reputation.
In fact, I find that another metric, the critical incident response time (CIRT), is a far better way to measure performance.
Mean time to repair reporting generally looks at all incidents that come to the service desk, giving the average time it takes to resolve these. This number will include:
In some organizations, your MTTR number also includes automatically logged events that are resolved in seconds.
Another point to remember: MTTR only looks at incidents that have been resolved; it gives no recognition to long-standing incidents that are languishing in your queue. While they stay open, they are invisible to the metric. When they are finally resolved, their very long durations pull the average sharply upward.
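To make that skew concrete, here is a minimal sketch of the arithmetic. The resolution times are invented for illustration, and MTTR is assumed to be the plain average of resolution times for closed incidents; your tooling may calculate it differently.

```python
from statistics import mean

# Hypothetical resolution times, in hours, for incidents closed this month.
# The two near-zero values stand in for automatically logged events.
resolved_hours = [0.01, 0.01, 0.5, 2.0, 4.0, 6.0]


def mttr(durations):
    """Mean time to repair: the plain average of resolution times."""
    return mean(durations)


baseline = mttr(resolved_hours)

# A ticket that sat in the queue for 90 days (2,160 hours) is finally closed.
# It contributed nothing to MTTR while open, and dominates it once resolved.
with_stale_ticket = mttr(resolved_hours + [2160.0])

print(f"MTTR without the stale ticket: {baseline:.2f} hours")
print(f"MTTR with the stale ticket:    {with_stale_ticket:.2f} hours")
```

One old ticket moves the average from roughly two hours to several hundred, even though nothing about day-to-day performance has changed.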
In my opinion, all this extra noise makes MTTR virtually meaningless. It makes little sense to provide this metric to the business, as it’s neither actionable nor relevant. If you give the business an MTTR measurement that includes incidents that do not concern them, you’re giving an inaccurate impression of how your incident practice is performing.
Critical incident response time (CIRT) is a far more accurate way to evaluate how well your incident practice is performing because it concentrates on incidents that have actual impact on the business.
CIRT is interested in how long it takes for you to respond to business-impacting incidents, rather than how long it takes for you to resolve them. CIRT measurements remove the extraneous noise from your reporting. This can be accomplished by taking a few, relatively easy steps:
Applying these rules to the incidents that are included in your CIRT measurements ensures that you:
The important measures for business-critical incidents relate to how long it takes to identify, locate, engage, and initiate your response team. The quicker this can be done, the better.
Unfortunately, a recent report from Fintech Futures indicates that it takes, on average, over an hour for companies to assemble all the appropriate resources to start work on a critical incident. Only 30% were able to assemble the personnel and resources in less than an hour.
These numbers matter to the business. Ultimately, reducing response times will improve your downtime statistics and improve customer perception of your critical incident processes.
Rather than giving an average time to resolve for these incidents, which varies dramatically due to complex IT environments, it is more important to measure process efficiency by examining how long it takes for your team to initiate response. You have complete control over this response time, and you can take actionable steps to improve it.
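As a rough sketch of what such a measurement could look like, the example below averages the time between an incident being logged and the response team being engaged, for critical incidents only. The record layout, the field names (`priority`, `opened`, `team_engaged`) and the choice of those two timestamps are assumptions made for illustration; they are not a standard definition of CIRT.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records; field names are illustrative only.
incidents = [
    {"priority": "critical",
     "opened": datetime(2024, 1, 5, 9, 0),
     "team_engaged": datetime(2024, 1, 5, 9, 45)},
    {"priority": "low",
     "opened": datetime(2024, 1, 8, 10, 0),
     "team_engaged": datetime(2024, 1, 9, 16, 0)},
    {"priority": "critical",
     "opened": datetime(2024, 1, 12, 14, 0),
     "team_engaged": datetime(2024, 1, 12, 15, 15)},
]


def cirt_minutes(records):
    """Average minutes from logging to team engagement, critical incidents only.

    Returns None when there are no critical incidents to measure.
    """
    response_times = [
        (r["team_engaged"] - r["opened"]).total_seconds() / 60
        for r in records
        if r["priority"] == "critical"
    ]
    return mean(response_times) if response_times else None


print(cirt_minutes(incidents))
```

The low-priority incident is ignored entirely, so the result reflects only how quickly the team mobilized for the two critical incidents.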
Historically, ITSM professionals have loved reports. We like to see how well we are doing, we want to measure improvements, and we must present our accomplishments to our management. Too often, however, we grab the low-hanging fruit of reporting options, using these metrics to justify our value to the business.
But a word of caution: before you change what you report to the business, sit down and talk to the business units. Ensure that you understand what is truly important to them. Don’t simply send them reports to make yourself feel better! Don’t report on something that you aren’t going to act on or have no ability to change. If your report won’t lead to improvement, it is pointless, and everyone’s time is wasted: yours in producing the report, and theirs in trying to understand it.
My view is that the CIRT report has far greater value to the business than a standard MTTR report but, first, talk to your business. Explain what this new report tells them. If, and only if, they agree they would find value in it, you can start reporting this to them.
Once you do start tracking CIRT, make sure you have a plan to improve your metrics. How can you bring your teams together faster? Where are the bottlenecks in the process? What isn’t working? Always use your reporting to create real improvements, not just as a token you send to the business each month.