It's the end of your data as you know it
Published: 23 Apr 2007 15:49 BST
...that underlies any long-term storage project. The British Library, for instance, found that storage-industry concepts such as information lifecycle management (ILM) were poorly suited to the type of archive it is establishing with DOM. ILM sets out practices for migrating data from fast, high-performance storage to lower-performance media as the value and use of the information decline, but "this view of storage is at odds with our own view", wrote the British Library's Richard Masters in a white paper. That's because the Library doesn't judge the value of its objects, and never intends to delete them.
British Library
The Library has made several attempts at building a long-term digital archive since the late 1990s, including calling in IBM to build a complete system to its specifications (the approach used by the KB, although on a smaller scale). None of the projects came to fruition.
"Then we realised as an organisation that the big-bang approach was never going to work. Nobody knows what the requirements are," says Masters, programme manager for the Library's Digital Object Management (DOM) scheme. "That's why we are building DOM in a component fashion and learning as we go. That way we don't have a huge risk: we aren't building an expensive application that doesn't meet our needs."
The library's key requirements are for its digital objects to be available forever, but at a very low, though undetermined, rate of access. That means the system has to be durable, flexible and affordable to maintain, but doesn't have to offer the high speed required by enterprise storage systems.
The initial system is being built at two redundant sites, each growing to about 300 terabytes, using commodity magnetic disk drives based on the relatively new Serial ATA (SATA) standard. This keeps the hardware independent of any one vendor: the library plans simply to replace drives with newer ones as they reach the end of their warranty. The initial tender went to VSPL, which proposed a solution using JetStor disk arrays. The software layer is designed to be independent of the technical properties of the physical storage itself.
Aside from the two main sites, there will also be a third "dark archive", intended as a way to recover if both main sites fail completely. The details are still being worked out, but the idea is for it to be a completely separate repository using a totally different technology.
The British Library's choices in some key areas underscore the degree to which the field is divided over best practices. For instance, the library has decided that the migration approach — translating from old formats to new formats — is most appropriate for its archive. "Emulation versus migration is one of those religious wars in the archives community," Masters concedes.
The Library is also involved in work to turn Microsoft's Office Open XML file formats into an open standard, which has brought it into conflict with supporters of the rival OpenDocument Format (ODF) and with those who believe Office Open XML will extend Microsoft's control over the creation of documents. "There are billions of Word documents out there, and, if those were opened up, it would be a huge resource," says Masters.
He argues that the only thing organisations really know about digital preservation at the moment is how little they know. "It's a learning curve. We've put together the best thing we can for now, and we'll run with it for a time and accept we're going to make changes," he says. "Openness is important on this — we've got to learn from our experiences and share that with others. Experience is the only thing that will get us moving forward on this."
While few companies have to deal with the issues the library is tackling today, they are likely to have to do so at some point, adds Masters.
"This will become mainstream. The technologies we are developing may end up being built into some storage products as standard. A lot of tools will be made available through the work that's going on now," he says. "If there's one thing that's certain, it's that digital records will keep increasing. They aren't going away."