It's the end of your data as you know it
Published: 23 Apr 2007 15:49 BST
...translation can be difficult and expensive, and may deliver something substantially different from the original — as with old static databases that had to be redesigned to fit the relational database model.
In effect, preserving digital objects by migrating them to current formats isn't really preserving them at all, Rothenberg argues — it could be compared to translating a poem into a different language, and then destroying the original.
"Translation is attractive because it avoids the need to retain knowledge of the text's original language, yet few scholars would praise their ancestors for taking this approach," Rothenberg wrote. "Not only does each translation lose information, but translation makes it impossible to determine whether information has been lost, because the original is discarded."
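Rothenberg's point about discarded originals can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in rather than anything from his paper: the sample text, the `migrate_to_ascii` conversion (chosen only because it is visibly lossy) and the `information_lost` helper.

```python
from typing import Optional

# Hypothetical "original" document, stored in an older encoding-rich format.
original = "Café naïve résumé".encode("utf-8")


def migrate_to_ascii(data: bytes) -> bytes:
    """Illustrative lossy migration: unsupported characters become '?'."""
    return data.decode("utf-8").encode("ascii", errors="replace")


def information_lost(migrated: bytes, original: Optional[bytes]) -> Optional[bool]:
    """Return True/False if loss can be checked, None if it cannot."""
    if original is None:
        # The original was discarded: a '?' might be damage or might be
        # a genuine question mark, so there is nothing to compare against.
        return None
    return migrated.decode("ascii") != original.decode("utf-8")


migrated = migrate_to_ascii(original)

print(migrated.decode("ascii"))            # the migrated text
print(information_lost(migrated, original))  # original kept: loss is detectable
print(information_lost(migrated, None))      # original destroyed: undecidable
```

The migrated file is perfectly valid in its new format, which is the trap: only a retained original makes the damage measurable.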
An alternative is emulation, in which the hardware, operating system and applications needed to view an original document are all simulated using current technology. Rothenberg favours this approach, which was pioneered in practice by fans of obsolete video games in the 1990s. It has its own complications, but at least it keeps documents accessible in their original state.
Physical degradation
Besides the issues around obsolescence of file formats, applications, operating systems and hardware, there is the more basic question of how to deal with the fact that media physically degrade or become obsolete.
How long will various media types last? There's considerable controversy around the issue, with Kodak claiming in one report that its writeable CDs would last 217 years under certain conditions, while others observe that such media start to degrade after only a couple of years. Rothenberg estimates that optical media have a practical physical lifetime of five to 59 years, digital tape two to 30 years and magnetic disk five to 10 years.
There's just one problem with such estimates, though — they're all academic, because, with the fast pace of change in the IT industry, any given medium will be obsolete in about five years. Even if it continues to function, modern hardware may not be able to read its contents or even connect to it.
"Digital information lasts forever — or five years, whichever comes first," Rothenberg quipped.
That means any organisation that wants to keep its data accessible will have to look forward to an unbroken chain of migrations within a time cycle short enough to prevent the media from becoming physically unreadable or obsolete before they are copied. "A single break in this chain can render digital information inaccessible — short of heroic effort," Rothenberg wrote.
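The arithmetic behind that chain can be sketched in a few lines of Python. The lifetime ranges are the ones quoted above; the function names, the fixed five-year obsolescence window and the 100-year retention horizon are illustrative assumptions, and taking the low end of each range is a deliberately pessimistic choice.

```python
import math

# Physical lifetime ranges in years (low, high), as quoted in the article.
PHYSICAL_LIFETIME_YEARS = {
    "optical media": (5, 59),
    "digital tape": (2, 30),
    "magnetic disk": (5, 10),
}

# Assumption: any given medium is obsolete after about five years.
OBSOLESCENCE_YEARS = 5


def safe_migration_interval(medium: str) -> int:
    """Longest copy interval that beats both decay and obsolescence."""
    worst_case_lifetime, _best_case = PHYSICAL_LIFETIME_YEARS[medium]
    return min(worst_case_lifetime, OBSOLESCENCE_YEARS)


def migrations_needed(medium: str, horizon_years: int) -> int:
    """Number of copies to fresh media needed to span the horizon."""
    interval = safe_migration_interval(medium)
    # The first medium covers one interval; each further interval
    # requires one more migration.
    return math.ceil(horizon_years / interval) - 1


for medium in PHYSICAL_LIFETIME_YEARS:
    print(medium, safe_migration_interval(medium), migrations_needed(medium, 100))
```

Under these assumptions, a 100-year archive needs 19 migrations on optical media or magnetic disk and 49 on digital tape, and every one of them has to succeed.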
Taskforces
Things look quite different from the point of view of the archivists who deal with questions of preservation on a practical level. The daunting prospect of future paradigm shifts, for instance, is nothing new — archivists and records managers are trained with the understanding that future generations may well disagree with their choices about what to keep and what not to keep, and how objects are preserved. "As a records manager, you have to accept that whatever you do will be wrong," says Anna Riggs, an archivist with Birmingham City Council.
A number of institutions are now putting long-term digital preservation programmes into place, including the British Library, the Library of Congress, the National Library of the Netherlands (the KB) and the California Digital Library, among others.
Other organisations are working on infrastructure and standards designed to back up such programmes. The EU-funded Planets (Preservation and Long-term Access through Networked Services) project, for instance, is co-ordinating European national libraries and archives, research institutions and IT companies to address digital preservation issues. The Digital Preservation Coalition is doing similar work at a UK level. Meanwhile the Storage Networking Industry Association (SNIA) has established the 100 Year Archive Task Force, which is aiming to come up with best practices for long-term data retention.
The SNIA is also working with the storage industry on the Extensible Access Method (XAM), which is expected to produce interfaces between applications and storage systems. These interfaces would co-ordinate metadata to achieve interoperability, storage transparency and automation for what's known as information lifecycle management (ILM), sometimes called data lifecycle management.
This all sounds very organised, but it masks the absolute uncertainty...









