When it comes to business disruptions, the adage is that only the paranoid survive. That, of course, raises the question: when it comes to disaster recovery, are we covered?
That’s the question tackled by a recent webinar hosted by ITWC CIO Jim Love and sponsored by Rogers. The webinar offered a deep dive on disaster recovery, along with tips and advice to help IT professionals put a solid plan in place, developed hand in hand with line-of-business leaders.
While disaster recovery is the commonly used term in the industry, Love said he actually prefers the term business resumption. What’s really important isn’t just getting the servers back up and the computers back on, but getting employees back to work and the business operating again. That goes beyond technology, and beyond just the IT department.
“The difference between disaster recovery and business resumption is in the little details,” said Love.
The importance of business resumption becomes clear when you consider how long your business can last, and how long your customers will wait, if you’re not up and running. A 2012 study by the Business Continuity Institute found that 74 per cent of organizations view unplanned IT and telecom outages as the top threat to their business.
When asked how much downtime their business could sustain, the largest group of respondents, 40 per cent, said just one to two days, and 22 per cent said they could last only hours. Customers aren’t going to be patient.
“Customers just won’t wait that long. It’s that simple. Even those polite old Canadian customers aren’t going to take what they did in the past,” said Love. “Someone is just a click away from taking that customer from you, and someone is already spending lots of money trying to bring them over. A disaster creates that opportunity.”
Even if you’re only down for minutes, Love said customers will leave you, and in a social media world they’re not afraid to let everyone know. So it’s not really how long you can survive an outage, but how long customers will wait. It’s those questions that should inform your business resumption planning.
We can now put a cost on downtime. An Aberdeen Group study pegged the cost of downtime at $700,000 per hour for large companies, and even small companies lost over $8,000 per hour they were down. When you’re trying to get buy-in from the business for disaster recovery planning, helping them understand the cost of downtime is the first step.
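To make that conversation concrete, a back-of-the-envelope estimate simply multiplies the expected outage length by an hourly cost. The sketch below is illustrative only: it assumes cost accrues linearly per hour (real outages rarely behave that way), reuses the Aberdeen per-hour figures quoted above as sample inputs, and the helper name `estimate_downtime_cost` is hypothetical.

```python
# Hypothetical helper: a rough outage cost, ASSUMING cost accrues linearly per hour.
ABERDEEN_LARGE_PER_HOUR = 700_000  # illustrative figure for large companies
ABERDEEN_SMALL_PER_HOUR = 8_000    # "over $8,000" for small companies, used as a floor


def estimate_downtime_cost(outage_hours: float, cost_per_hour: float) -> float:
    """Return a linear estimate of what an outage of the given length costs."""
    if outage_hours < 0 or cost_per_hour < 0:
        raise ValueError("hours and cost per hour must be non-negative")
    return outage_hours * cost_per_hour


if __name__ == "__main__":
    # A two-day outage: the upper end of what the largest group of BCI respondents said they could sustain.
    hours = 48
    print(f"Large company: ${estimate_downtime_cost(hours, ABERDEEN_LARGE_PER_HOUR):,.0f}")
    print(f"Small company: ${estimate_downtime_cost(hours, ABERDEEN_SMALL_PER_HOUR):,.0f}")
```

Even this crude model puts a two-day outage at roughly $33.6 million for a large company, which is usually enough to start the conversation with the business.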
“People are optimistic by nature and we suck at risk assessment,” said Love. “Our minds just don’t work well around risk assessment and problems down the road.”
The list of what could go wrong is a long one, from extreme weather and geological disasters to cyber-attacks, vandalism and even human error. In Canada alone, we’ve seen the forest fires this year in Western Canada cause business disruption, and the Toronto ice storm in 2013 is still fresh in the minds of many.
And it does seem to be on the agenda for CIOs. When asked their top spending priorities for the year ahead, 49 per cent of technology leaders had disaster recovery as a priority, and 59 per cent cited the broader category of risk, compliance and security.
“If you’re a CIO and this isn’t on your list you need to ask why not,” said Love. “If you think it’s handled and you don’t have hard data to prove it, you’re taking a pretty big risk.”
What’s in a plan?
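A disaster recovery plan focuses on protecting and recovering the technology stack, and it is built around three key concepts: the recovery time objective (RTO), the recovery point objective (RPO) and the business impact analysis (BIA). The RPO answers the question of how much data you can afford to lose, measured as the time between the last good copy of your data and the disruption. The RTO sets how quickly systems must be restored after a disruption. The BIA identifies which business functions are critical and what it costs when they stop.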
“Look at the maximum tolerable downtime and ask when do we get to serious damage, and try to figure out what your recovery time should be. Your RTO should allow a lot of time between its execution and your maximum tolerable downtime,” said Love. “You need some breathing room because things go wrong. This is more than just a backup. You need to recover your data, possibly your infrastructure, and your apps, and test it.”
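Love’s rule of thumb, that the RTO should sit well inside the maximum tolerable downtime (MTD), can be expressed as a simple consistency check. The sketch below is a hypothetical illustration, not a method from the webinar: the `RecoveryTargets` class, the `check_targets` function and the 50 per cent headroom threshold are all assumptions. It also flags a backup interval longer than the RPO, since with periodic backups the worst-case data loss is about one backup interval.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryTargets:
    """Hypothetical recovery targets for one business function."""
    name: str
    rto: timedelta                     # how quickly the function must be restored
    rpo: timedelta                     # how much data (measured in time) you can afford to lose
    max_tolerable_downtime: timedelta  # the point where serious damage sets in
    backup_interval: timedelta         # time between backups or snapshots


def check_targets(t: RecoveryTargets, min_headroom: float = 0.5) -> list[str]:
    """Return a list of problems; an empty list means the targets look consistent.

    min_headroom is an ASSUMED policy: the RTO should use at most
    (1 - min_headroom) of the maximum tolerable downtime, leaving breathing room.
    """
    problems = []
    if t.rto >= t.max_tolerable_downtime:
        problems.append(f"{t.name}: RTO is not below maximum tolerable downtime")
    elif t.rto > t.max_tolerable_downtime * (1 - min_headroom):
        problems.append(f"{t.name}: RTO leaves less than {min_headroom:.0%} breathing room")
    # With periodic backups, worst-case data loss is roughly one backup interval.
    if t.backup_interval > t.rpo:
        problems.append(f"{t.name}: backup interval exceeds RPO")
    return problems


if __name__ == "__main__":
    # Hypothetical example: a 20-hour RTO against a 24-hour MTD, hourly backups, 15-minute RPO.
    orders = RecoveryTargets(
        name="order entry",
        rto=timedelta(hours=20),
        rpo=timedelta(minutes=15),
        max_tolerable_downtime=timedelta(hours=24),
        backup_interval=timedelta(hours=1),
    )
    for problem in check_targets(orders):
        print(problem)
```

The example flags both problems: a 20-hour RTO uses most of a 24-hour window, and hourly backups cannot meet a 15-minute RPO. As Love notes, a check like this only describes the plan; the recovery itself still has to be tested.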
BIA recognises that there’s more to this than disaster recovery, which is only the first step. If your systems are up but people aren’t working, you need more steps in place. For starters, you need an up-to-date crisis communications plan with all the contact information required to reach everyone and get them working again, and it needs to be stored somewhere other than the system that went down.
“The real test is can you operate your critical business functions, and if you don’t know what those are you need to find out,” said Love. “You need support from the top, and strategies to authorize, enable and encourage regular updates and testing of your plan. And it’s not a documentation exercise; it’s an engagement exercise.”
Something else to consider is whether cloud computing and the hybrid enterprise cloud should be part of your business resumption planning. Cloud has made recovery ubiquitous and cheap, but Love said you still have to think about it, test it, and make sure it works. Check service agreements carefully, and remember that just because you have snapshots doesn’t mean you can roll back.
It’s also important to remember that disaster recovery is a process that you need to execute and test regularly. Love regularly gives his team scenarios, telling them to pretend a system has just gone down and to see how quickly they can recover it, documenting both the process and how long it takes.
“If people expect it at any time they stay sharp,” said Love. “Testing without lessons learned is a waste of time.”