Defending against the digital dark age
Published: 19 Jul 2007 13:25 BST
...argue that open source offers a fundamentally more future-proof data format because of the sheer number of organisations involved and the information-sharing that results.
"We are not, ourselves, involved in that area. Clearly there is a debate about which open document format people choose, and frankly we are agnostic about that. We will take what people give us but we are not wedded to one format," he says. "We take the files that people send us and it's not for us to get involved in debates about where the best way forward is. In a way, the market will decide, whatever the rights and wrongs are. It's like VHS and Betamax — whatever the best technology is, the market will decide. I don't think us expressing strong views either way is relevant."
But the controversy around translating documents to open file formats doesn't end there. Beyond the open source versus proprietary debate, the archive community also disagrees about whether documents should be translated at all. Some archiving purists claim that translation is a crude approach to preservation, likening it to translating a poem into a different language and then destroying the original. Computer scientists such as the RAND Corporation's Jeff Rothenberg argue instead that the emulation approach, such as the NA's Virtual PC 2007 project, is a much more sensitive strategy.
"Not only does each translation lose information, but translation makes it impossible to determine whether information has been lost, because the original is discarded," wrote Rothenberg in a 1995 Scientific American article entitled Ensuring the Longevity of Digital Documents.
However, Thomas argues that for the huge volumes of information the NA has to deal with, translation is the only practical approach. "Generally in the digital-preservation world, I think it is accepted that for bulk operations, migration is the most practical, the most cheap and the most robust approach, and also, crucially, it means you can read the stuff at home," he says. "If you want to read an old WordStar document, we can migrate it to the latest version of Word or whatever and you can read it on your browser at home."
As well as electronic and digitised paper documents, a growing share of the NA's work involves storing video, which, according to Thomas, poses its own problems. "We get odd video formats and they have to be a bit more hand-crafted. There are a few oddities where we have to figure out what to do with them," he says. "It is still a very tiny percentage of the documents we have to deal with, but we have some very tricky problems. Public inquiries into the loss of ships involve building these 3D virtual-reality models of how the ship sank, and they pose quite a problem for us."
The challenge of changing formats
The question of how to deal with awkward formats such as video is part of a bigger challenge facing the NA. Because the organisation only really gets its hands on some electronic documents after 30 years, there is a long period during which important documents are out of its control, explains Thomas.
"The big issue that is facing government records is not how long they will survive in the National Archive, because that is pretty well-managed, but how long they will survive in government departments before they even come to the archive," he says. "The issues government departments have is that they generate a vast bulk of electronic records and after two years or so you don't have to consult them on a day-to-day basis, but there are things that you may need to consult some day down the road, and if you don't have some way of preserving them you won't be able to read them."
The police service is a good example of how crucial data-preservation is to some areas of government. If there is an unsolved crime, the police have to...












