Second source, not open source, is the key

Rupert Goodwins ZDNet.co.uk

Published: 16 Jun 2004 14:50 BST

As Steve Jobs took to the stage yesterday to launch iTunes Europe, the whole concept temporarily failed. The company website, the machine that turns music into money, was pushed off the Internet by a denial-of-service attack. It wasn't alone: Microsoft, Google and Yahoo were all hit. Exactly what happened is still foggy, but it looks as if Akamai -- the company that handles the DNS translation of those Web site names to their numeric Internet addresses -- was the focus of attention.

At the other end of the Web content spectrum, thousands of bloggers were also left staring at blank screens. Dave Winer, the chap behind the free hosting service weblogs.com, had unilaterally decided that the service was costing him too much time and money and had thrown three thousand blogs into the void without warning. Short of flying the hammer and sickle at a Republican Party convention, it's hard to know how to create more antagonism in such short order.

Again and again, the seismic events small and large that shake the online world have one thing in common -- the existence of single points of failure. With viruses and worms, the factor is usually Microsoft: not that its code is necessarily worse than anyone else's, but that its ubiquity amplifies a single small fault into global vulnerability. With the Internet infrastructure, latent problems in Cisco routers or the reliance on one company running top-level domains build weak points into a system that rests on a superbly fault-tolerant protocol.

It's wrong to think of this as primarily a technical issue. Air-accident investigators know that pilot error or a mechanical malfunction is almost never the root cause of a crash: instead, a chain of events leads to the denouement, and at any point along that chain the disaster could have been averted. The weather is bad, so a rapid change in altitude is needed. The pilot is tired and distracted by nearby thunderstorms, so overshoots the new level. Air traffic control is overworked, so the mistake goes unnoticed. Another aircraft on an adjacent level has a faulty collision warning system. Result: calamity. Change any one of those factors and the story is dramatically different.

The onus on anyone with responsibility for producing a technical system of importance, whether it's a public Web service or an internal corporate IT system, is to understand these chains of events and to engineer not just the technologies but the skein of factors that surround them. At one level, this is obvious and expected: you wouldn't run a server farm without having a fail-over system in place, or have your comms room reliant on mains electricity without a UPS backup. But as you go up layers of abstraction, the importance of diversity becomes discounted. This isn't because it's any less vital to have alternative strategies in place, but because we have learned to think that it's too difficult.
