Dodging data-centre disasters
Published: 17 Jul 2007 12:34 BST
If you have a data centre, by definition it's critical to your business. If you've done your job properly, the sort of small-scale mishaps all companies encounter from time to time shouldn't result in any real disruption to your business. Fault-tolerant, high-availability servers, redundant storage, duplicated communications lines into the data centre and backup power systems can all ensure continuity of service.
Accidental data loss through human error can also be planned for, and the data restored in minutes once the problem becomes known. Malicious actions by disgruntled employees are harder to deal with quickly, particularly if they have extensive administrative rights, but it's possible to contain the damage. Company termination procedures should be designed so that, if someone is being shown the door, their access to all systems is suspended as part of the process.
Occasionally, however, a rare but truly disruptive event will come along that threatens normal resiliency plans: in other words, a disaster. A disaster is sometimes defined as an event that can be predicted but not prevented, at least as far as an individual organisation is concerned. Once you've established that a disaster can happen, you need some idea of how likely it is to occur. Losing an entire data centre, functionally if not physically, is a very rare event: we have few earthquakes in the UK, and they're rarely strong enough to disrupt utilities, let alone damage buildings. Terrorism, while high on the news agenda at the moment, is still very unlikely to affect you, especially if you're not located in central London. Research last year by Gartner showed that few companies are interested in high-level disaster planning of the kind that's needed to cope with this level of event.
Timothy Coats, business continuity practice lead for EMC Infrastructure Consulting, says that this kind of complete disaster is highly unlikely: "Generally, the loss of an entire data centre is a rare event. In many instances, a data centre will cease functioning. That's a more common occurrence."
Planning for the worst-case scenario
Disaster-recovery planning for a data centre cannot take place in isolation: it has to be part of an overall business continuity plan for the whole company. Coats believes the rest of the business needs to be involved in the decision-making process: "The responsibility of the chief information officer is to make known to the stakeholders the risk. The business should know where its vulnerabilities are. Unless you know, you're rolling the dice and closing your eyes."
If an event serious enough to take an entire data centre offline occurs, the chances are the business has been affected in other ways as well. There's no point in restoring a business-support function, like a data centre, if there's no business left to support, particularly if your data centre is co-sited with your core operations: a manufacturing company's ability to produce products may well be disrupted or destroyed in a disaster; service companies could have no viable office space left for people to work in and no way of acquiring any in a reasonable period of time. Worst of all, a true disaster may involve loss of life. Including this thinking in any disaster-recovery plan may be horrific, but it's necessary.
Sometimes the disaster may not even affect an organisation's own operations. "You need to plan for the recovery of the loss of other business services, such as a major supplier," says Coats. While IT systems have their part to play in enabling a quick switch of suppliers, this kind of event falls outside the scope of a data-centre disaster-recovery plan.
Decisions about disaster avoidance and recovery strategy are therefore based on economics, not on what's technically possible. Choosing the right technology to help recovery from any problem is vital, of course. But since the options range from simple data-recovery tools to a parallel-computing facility, the crucial step is deciding what's reasonable and prudent for the organisation to invest in.
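To make the economics concrete, here is a rough sketch in Python of one common way to frame the decision: compare the expected yearly loss from an outage with the yearly cost of the measure that would shorten it. Every figure and function name below is hypothetical, chosen only to show the arithmetic, and the model deliberately ignores anything that can't be priced by the hour, such as reputation or staff safety.

```python
def annualised_loss(probability_per_year, outage_hours, cost_per_hour):
    """Expected yearly loss from one kind of outage."""
    return probability_per_year * outage_hours * cost_per_hour


def worth_investing(loss_without, loss_with, annual_cost):
    """True when the expected yearly saving exceeds the yearly cost of the measure."""
    return (loss_without - loss_with) > annual_cost


# All figures are hypothetical, not drawn from any real organisation.
# A 1-in-50-year event, costing 50,000 pounds for every hour of downtime.
loss_unprotected = annualised_loss(0.02, 72, 50_000)   # three days offline
loss_with_standby = annualised_loss(0.02, 4, 50_000)   # four hours offline
standby_cost = 100_000                                 # yearly cost of a standby site

print(round(loss_unprotected), round(loss_with_standby))
print(worth_investing(loss_unprotected, loss_with_standby, standby_cost))
```

With these made-up numbers the standby site saves less each year than it costs, so it would be hard to justify on downtime alone. Raise the hourly cost of downtime far enough and the same calculation points the other way.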
The more a business relies on its data centre, the easier decisions become. For a company in the finance sector, downtime means losing millions of pounds an hour. In that case...






