Robert Parker has an interesting way to get companies to break the ice around the weighty discussion of disaster recovery. Parker, a partner with Deloitte’s enterprise risk assessment division in Toronto, suggests taking a collection of newspaper clippings documenting natural and man-made disasters and putting them in front of business unit leaders with the following question:
“What would we do if that happened to us?”
“You could have an outbreak of flu… things in which the employees can’t come to work such as with the SARS (Severe Acute Respiratory Syndrome) outbreak of last winter. It refocuses the subject because when you are harping on the business continuity or disaster recovery message time and time again people get tired of hearing it. But if you get them to go around the room and say, ‘What would the impact be to us if this happened and are we ready to address it?’ the discussion picks up.”
Good governance of the plan is critical
According to a study from Contingency Planning Research, power outages and surges are the leading cause of computer downtime at 31 per cent, followed by storm damage at 20 per cent, floods and burst pipes at 16 per cent and fires and bombs at nine per cent.
During SARS, Parker says, a number of companies, including a financial services trading organization, split their critical divisions across two physical locations: IT was divided between two sites, as was the trading division, along with the people who support them. “The people supporting those groups were also split into two separate areas and if one got quarantined the other half were available to work.”
Governance is also a critical issue around DR, says Parker. It should include not only technology staff but also the key business units, “because that’s why you’re in business. Yes, you need technology to support them but the business has to operate. What you don’t want is to bring the technology up but not have the business ready or vice versa. Or if the business is waiting for applications A, B or C and you’re bringing up entirely different applications.”
And while 9/11 was a wake-up call for many companies in the fall of 2001, incidents such as the power blackout of last August and pending concerns about terror incidents this summer have really given organizations cause to dust off their current DR plans.
Shivering through recovery
Cold recovery, which takes place over days and weeks, may be suitable for a small company that is not dependent on the Internet.
Cold recovery involves building systems after the disaster and acquiring the necessary hardware after the fact. Warm recovery takes hours or days after an event; it requires building a system before the disaster, but that system does not hold live data. Data is stored offsite and brought online after the event, and customers are then pointed to the new system.
Hot recovery, or instant retrieval of systems, is the option that leaves only seconds or minutes between a system failure and recovery.
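As a rough illustration of how the three tiers relate to the downtime a business can tolerate, the Python sketch below maps a tolerable outage to a tier. The cut-off values (one hour and two days) and the function name are assumptions chosen for the example; the descriptions above give overlapping ranges, not exact boundaries.

```python
from datetime import timedelta

# Illustrative cut-offs only (assumptions, not figures from any standard):
# hot covers seconds to minutes, warm covers hours to a couple of days,
# cold covers anything longer.
HOT_LIMIT = timedelta(hours=1)
WARM_LIMIT = timedelta(days=2)


def recovery_tier(tolerable_downtime: timedelta) -> str:
    """Return the least demanding tier that still fits the tolerable downtime."""
    if tolerable_downtime < HOT_LIMIT:
        return "hot"
    if tolerable_downtime <= WARM_LIMIT:
        return "warm"
    return "cold"


if __name__ == "__main__":
    for downtime in (timedelta(minutes=5), timedelta(hours=12), timedelta(weeks=1)):
        print(downtime, "->", recovery_tier(downtime))
```

A business that can tolerate only a few minutes of downtime lands in the hot tier, while one that can wait a week can consider cold recovery.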
Afilias, a domain-name registry with facilities in Toronto, Philadelphia, Germany, London and Ireland, contracts with Q9 Networks for disaster recovery services. Like many organizations, it began considering the impact of larger scale disaster recovery issues after 9/11.
“Most DR plans I think are limited to local geographical disasters or issues. People don’t necessarily think, ‘Wow, what if an office building isn’t available to me for six months? How do I continue my business?’” says Michael Young, director of IT with Afilias in Toronto.
Afilias provides advanced back-end domain name registry services and a wide range of advanced capabilities essential to the smooth and efficient operation of any Internet domain name registry.
Afilias launched its registry service in July 2001 with .info. Today, the company runs the back-end systems and registry services for .info, .org and several country code TLDs (ccTLDs), together supporting more than four million domain names.
Afilias is required to document and present DR plans through its regulatory boards. “We have a very transparent approach to that,” says Young. “We’re a heavily regulated industry in that sense.”
“Most people think about scenarios such as, ‘If I have a major server failure do I have good backups? Are the backups offsite?’ But it’s very hard for people to conceptualize when they’ve been in a building for 30 years that they may not be able to enter the building one day,” says Young. “Subtle disasters are also not considered, such as the health department shutting down a building for mould.”
Afilias found its DR plan put to the test during last summer’s blackout. The company has its technical support centre in Toronto and other systems in New Jersey.
“I think one of the lessons we took away from the power blackout was that, while we were quite pleased with the way our DR plan did work (it operated as our dry runs indicated it would and was considered successful), DR is an evolution of finer details, and Afilias has a position in the marketplace of being a reliable, stable and fast-performing registry system. As a result we feel secondary services such as 24-hour technical support should be treated with the same type of urgency and concern.”
For Afilias, phone systems emerged as a clear weak link during the blackout.
“PBX-based phone systems on dedicated T1s are always a concern when you have a large area without power for an extended period of time. Fortunately our providers at the time did maintain backup generation but it leaves you with the consideration that your vendors must be as capable and committed to DR as you are.”
Keeping the business going
“One of the things we did following (the blackout) was make the decision to move to an IP phone system, and the phone system no longer lives on site at our operation centre but is located in a Telus data centre because they are the telco that provides that service for us,” says Young.
Planning for DR should also involve taking a critical look at the requirements of your business to continue effectively in the event of an emergency situation.
“I think sitting down and considering the nature of your business and what services and products you offer that are time-sensitive, that are mission-critical to your business and customers and be careful in that assessment. There are a lot of services people do not think of in terms of their disaster recovery plan. For example, really thinking through phone systems and communications when power is out is a big thing,” says Young.
There are varying degrees of what is referred to as disaster recovery, depending on the urgency of the business. More companies are choosing hot recovery with service providers such as Q9 Networks, which supports companies such as Afilias. Hot recovery involves two identical systems built before a disaster and kept in sync in both data and applications.
“If you design it properly you can really split the infrastructure rather than replicating it,” says Arafat.
Afilias approaches DR plans based on a joint capability centre.
“With a data centre we may use it as a Tier 1 or primary centre for services and have a failover scenario at another centre. We also have primary services running on that second data centre that go to the first data centre. So it’s not like having a single primary data centre and a secondary centre that does nothing but DR. Each data centre is a mixture of primary and secondary failover services,” says Young.
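The arrangement Young describes, in which each data centre is primary for some services and the failover site for others, can be sketched in a few lines of Python. The service names, site names and the `active_site` helper below are hypothetical and chosen only for illustration; they do not describe Afilias’s actual systems.

```python
from typing import Optional, Set

# Hypothetical placement table: each service lists (primary site, failover site).
# Note that neither site is purely a standby; each is primary for something.
PLACEMENT = {
    "registry": ("site_a", "site_b"),
    "whois": ("site_b", "site_a"),
    "support_phones": ("site_a", "site_b"),
}


def active_site(service: str, down_sites: Set[str]) -> Optional[str]:
    """Return the site that should serve the service, or None if both are down."""
    primary, failover = PLACEMENT[service]
    for site in (primary, failover):
        if site not in down_sites:
            return site
    return None


if __name__ == "__main__":
    outage = {"site_a"}
    for name in PLACEMENT:
        print(name, "->", active_site(name, outage))
```

With `site_a` out of service, the registry and phone services move to `site_b`, while the service already running primarily on `site_b` is unaffected.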
It’s also critical to document the plan properly. “It’s difficult enough to implement a plan during a crisis; it’s very difficult to develop one,” says Parker. “The instructions have to provide enough information so that someone else can bring the system up and implement the technology and so forth.”
An assessment team is also critical, as is making sure the right people are on it. That team should include not only IT professionals, but leaders from each business unit.
For smaller businesses it’s important to think creatively when the budget does not allow for an elaborate DR plan.
“I know business owners in the technology industry that have shared joint efforts for DR and made arrangements to failover certain services to one another’s locations,” says Young.
When it comes to cost, if your business relies on electronic transactions there may be more of a requirement to ensure constant up-time.
“When your company is involved in customer-facing activities you are going to spend more on DR. If it is largely e-commerce-driven over the Internet it’s important to recover as quickly as possible.
“We’re still classified as a small business (a $20 million a year business) and we’re a technology infrastructure business so the brunt of our operational costs is primary and failover services,” says Young.
Another common weakness is that organizations create a DR plan and then let three or four years go by, so it becomes out of date and doesn’t scale with the growth of the business. For Afilias, DR is an ongoing operational procedure. At weekly manager meetings the company reviews the status of all environments and systems, including DR systems, confirming their availability and that they are fully enabled.
“If an organization has the budget it is certainly well-spent money to have a consultant come in and look the plan over,” suggests Young.
He advises doing a cost-benefit analysis. In some smaller businesses hard decisions have to be made.
“You want to have every core component of your business identified. You want to make sure you have a failover point for every component. With Q9 we have services we run out of the States and Q9 represents a fallback point for us and vice versa.”