Dodging data-centre disasters
Published: 17 Jul 2007 12:34 BST
...spending a few million on a shadow data centre will pay for itself in just one short power outage. For this industry, leaving services unavailable for as long as it would take to rebuild the data centre is unthinkable.
The number of companies that can afford this type of backup facility is limited, though. For the rest of us, the cost isn't justified by the risk faced: a second data centre will never pay for itself. Coats says EMC's non-finance clients usually don't adopt this approach. "About 50 to 60 percent [of our clients] are in the financial sector. Of those not in the financial sector, about 20 percent have a second data centre," he explains.
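Whether a second site "pays for itself" comes down to simple arithmetic: divide the yearly cost of running it by what an hour of downtime costs the business. The Python sketch below uses invented figures for both, and the function name `break_even_hours` is purely illustrative. It shows why the same facility can pay off after one short outage for a trading firm but never for a business with a much lower cost of downtime.

```python
# Hypothetical break-even check for a shadow data centre.
# All figures are illustrative assumptions, not data from the article.

def break_even_hours(annual_site_cost: float, loss_per_hour: float) -> float:
    """Hours of avoided downtime per year needed for a second site to pay for itself."""
    if loss_per_hour <= 0:
        raise ValueError("loss_per_hour must be positive")
    return annual_site_cost / loss_per_hour


shadow_site_cost = 3_000_000  # assumed yearly cost of running the second site

# Assumed revenue lost per hour of downtime for two very different businesses.
for name, loss in [("trading firm", 2_000_000), ("retailer", 20_000)]:
    hours = break_even_hours(shadow_site_cost, loss)
    print(f"{name}: break-even after {hours:.1f} hours of avoided downtime per year")
```

With these assumed numbers the trading firm breaks even after 1.5 hours of avoided downtime a year, while the retailer would need 150 hours, far more than most companies ever lose.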
There are alternative, less costly strategies, though they mean longer downtime, and a short recovery time may not be necessary for every system within an organisation. According to Coats, it may make sense to guarantee a short recovery time for "only one or two critical systems that need shorter downtime". Gartner claims around 10 percent of applications are currently considered critical, a figure expected to rise to 25 percent by 2010 as businesses become more dependent on their information systems.
Restoring these crucial business services is far easier than trying to recreate the whole data centre at once, and keeping just enough spare hardware available (in secure storage, for example) becomes more cost-effective. For less critical services, a company may be able to tolerate downtime long enough for new machines to be delivered; if so, it pays to check hardware vendors' typical lead times and investigate any fast-delivery services they offer. Virtualisation also makes restoring individual applications easier, as virtual machines can be deployed on whatever hardware can be made available soonest.
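One way to apply this tiered approach is to record, for each system, how long the business can live without it, then pick the cheapest recovery option that restores it within that window. The Python sketch below assumes hypothetical recovery times and costs; none of the option names, figures or functions come from EMC, Gartner or any vendor.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecoveryOption:
    name: str
    recovery_hours: float  # assumed worst-case time to restore service
    annual_cost: float     # assumed yearly cost of keeping the option ready


@dataclass
class System:
    name: str
    max_downtime_hours: float  # how long the business can tolerate losing it


# Illustrative options, from most to least expensive (all figures hypothetical).
OPTIONS = [
    RecoveryOption("second data centre", 1, 2_000_000),
    RecoveryOption("spare hardware in secure storage", 24, 50_000),
    RecoveryOption("vendor fast-delivery service", 72, 10_000),
    RecoveryOption("standard vendor lead time", 240, 0),
]


def cheapest_option(system: System,
                    options: List[RecoveryOption] = OPTIONS) -> Optional[RecoveryOption]:
    """Return the cheapest option that restores the system within its tolerated downtime."""
    viable = [o for o in options if o.recovery_hours <= system.max_downtime_hours]
    if not viable:
        return None  # nothing on the list is fast enough
    return min(viable, key=lambda o: o.annual_cost)


systems = [
    System("payments", 2),
    System("email", 48),
    System("intranet wiki", 500),
]
for s in systems:
    choice = cheapest_option(s)
    print(f"{s.name}: {choice.name if choice else 'no viable option'}")
```

Here the payments system ends up needing the second data centre, email can rely on spare hardware, and the wiki can wait for a normal delivery. The sketch costs each option per system; in practice a second site or a pool of spare machines is shared, so a real assessment would cost the options across the whole estate.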
Doing nothing is sometimes an option. For some companies, once the risk calculations are done and the cost of disaster mitigation is known, it simply may not be worth spending the money for what it would achieve. The board's responsibility to shareholders is to ensure the maximum return on their investment, not necessarily to keep the business going at all costs. If that means winding up the company and selling off any remaining assets after a disaster, then that is what the board must do; as with any decision of that nature, it is the board's to make.
Even if this is the company's plan, a certain amount of work still needs to be done to ensure that the receivers of the company have enough information to do their job properly: financial records need to survive the disaster, even if the business doesn't.
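The risk calculations behind a decision to do nothing are often framed as annualised loss expectancy (ALE): the cost of a single incident multiplied by how often it is expected to happen each year. The Python sketch below applies that formula with invented figures to show how doing nothing can come out ahead; it is one possible way to frame the decision, not a method the article's sources describe, and the function names are hypothetical.

```python
def annualised_loss(single_loss: float, yearly_probability: float) -> float:
    """Annualised loss expectancy: cost of one incident times its expected yearly rate."""
    return single_loss * yearly_probability


def mitigation_worthwhile(single_loss: float, yearly_probability: float,
                          residual_loss: float, annual_mitigation_cost: float) -> bool:
    """True if the expected yearly loss avoided exceeds the mitigation's yearly cost.

    residual_loss is the assumed cost of the same incident with the mitigation in place.
    """
    avoided = (annualised_loss(single_loss, yearly_probability)
               - annualised_loss(residual_loss, yearly_probability))
    return avoided > annual_mitigation_cost


# Hypothetical: losing the data centre would cost 5,000,000 and is expected once in
# 50 years (2 percent a year); a recovery plan would cut that loss to 1,000,000 but
# costs 150,000 a year to maintain.
print(mitigation_worthwhile(5_000_000, 0.02, 1_000_000, 150_000))
```

With these assumptions the plan avoids about 80,000 of expected loss a year but costs 150,000, so the function returns `False`: on paper, doing nothing is the cheaper choice.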
Rehearsals are key to successful recovery
Whatever your company's size and disaster-recovery strategy, you must plan and budget for one extra factor: rehearsals. Regular testing of your plan will not only ensure that everyone involved knows how the strategy will work in practice, but will also allow you to make changes in the plan to reflect changes in the business environment since it was first written.
IBM's Redbook on disaster-recovery planning recommends testing your procedures at least once a year, over and above regular testing of backups and spare hardware. It also points out that you'll never get a truly realistic test of your strategy, since in a real event some of your staff may be unavailable, and those who are available are likely to be distracted and under greater stress than during a test.
The Redbook also recommends changing staff roles during tests — making your database administrator deal with network configuration, for example — to reflect what may happen in a real disaster, but also to give an idea of how much of your plan is truly documented and how much is held only in the heads of your staff. Document everything that goes wrong during your test, then adjust the plan accordingly.
Disaster-recovery strategy is as much about finance and business relationships as it is about technology. Not having a strategy is certainly negligent, but spending too much time and money on trying to plan for highly unlikely events doesn't do much for shareholder value either. A good data-centre recovery plan doesn't treat the data centre as an isolated, monolithic lump, but takes into account what services it provides to the rest of the business, and how the users of those services are likely to be affected both by the downtime and by its cause. Combining a sensible assessment of the likelihood of a disaster with a realistic set of targets should ensure your disaster-recovery plan doesn't become a burden and stays in proportion to the risks your company faces.