Defending against the digital dark age
Published: 19 Jul 2007 13:25 BST
...keep the records for 75 years, and if it's a serious crime where someone is convicted, they have to keep the records for the life of that person. "Back in the days when that was a few paper files and a bit of DNA, that was OK. Now the police have discovered video cameras, and they have video cameras by the side of the road, in patrol cars, and even police dogs have their own video cameras now — and that is a huge preservation problem," says Thomas.
To answer this problem of storing intermediate documents, the NA has been allocated some money from government to set up a shared service for digital preservation across government, not run by the NA but contracted out. What the business model will be and how the system will work exactly is still unclear, but Thomas claims departments will have to be judicious about what they choose to archive.
"The vast bulk [of documents] will not survive because we are not interested in video films of the M1, although they might prove to be the most interesting in the long term — police car chases and dogs etc might be the most interesting," he quips.
As well as the pressure to keep up with video and the huge swathes of electronic documents, the NA is also charged with archiving government websites, a massive challenge on its own given the explosion in online public services in the last few years.
To take the burden off Thomas's relatively small IS team of around 50 staff, the NA has a contract with an organisation called the European Archive to capture about 65 government websites. Some are sampled every six months, others every month, and a snapshot is always taken of any government website that is due to be shut down.
A daunting task
But even with outside help, Thomas concedes that keeping up with the sheer volume of information being added to government sites is daunting.
"There are lots of problems with websites and we are working very hard to deal with them. One problem, and it's a conceptual one, is that years ago you had government departments and they created paper records and we would select some of those and they would come here, but now you have government departments and outside of that you have things that are government funded — things like Theyworkforyou.com — the website about how MPs behave. The space in which we operate has expanded — it's not as clear cut as it used to be in the old days," he claims.
The flexible and interactive nature of the web makes sites easy to update, but that has repercussions when it comes to data preservation, says Thomas. "Things get lost on websites, URLs change, and people delete things and move things in a casual and random fashion. It's not a UK government problem, it's a worldwide problem that things get deleted," he says.
To give it a fighting chance, the NA works with the European Archive, a branch of the Internet Archive, which was founded by MIT graduate Brewster Kahle as part of his plan to capture all human knowledge. "They started in 1996 and sort of expanded since then. What is scary is that very little has survived since before 1996. When it was the tenth anniversary of our website I thought we would do a little exhibition, but we have lost the first three or four years of our website," says Thomas.
Aside from simply capturing websites, the next big thing in web archiving is being able to search them, says Thomas. "At the moment you have to know the URL and what year you want, but what we are going to do next year some time is use some kind of search tool to search this archive of government websites — maybe we will use Autonomy, or maybe we will use Google but what we have to do is begin to search them," he says.
Top tips for data preservation
What advice would Thomas give to other organisations facing up to their own data preservation challenges? He offers three steps as a good starting point.
"Firstly, metadata is crucial — you have to have metadata that identifies the documents that you want to keep. Selection is also very important. You need to put your preservation resources into a small number of things. You should only try and keep what you want to keep," he says. Finally he claims that a live approach to preserving documents has to underpin the entire strategy. "It is much better to keep things on live systems on servers that are backed up, than it is to put things on CDs and put CDs in drawers."
The National Archives is well funded (though maybe not to the same degree as its US equivalent, the National Archives in Washington), but Thomas seems confident about the challenges ahead. The next big step for the organisation is to fully embrace the web and deliver a 99.9999 per cent service level to the information-curious public perusing its site. This is a massive undertaking and will include setting up a mirrored off-site hosting centre next year to ensure continuity of its web operations.
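To put that service level in perspective, the short sketch below works out how much downtime such a target would allow. It assumes the 99.9999 figure is a percentage availability measured over a 365-day year; the article does not say how the NA defines or measures it, and the function name is purely illustrative.

```python
# Assumption: "99.9999" means 99.9999 percent availability over a non-leap year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def allowed_downtime_seconds(availability_percent: float,
                             period_seconds: int = SECONDS_PER_YEAR) -> float:
    """Return the downtime budget, in seconds, for an availability target."""
    return period_seconds * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Compare the quoted target with two more common ones.
    for target in (99.9, 99.99, 99.9999):
        seconds = allowed_downtime_seconds(target)
        print(f"{target}% availability allows {seconds:.1f} seconds of downtime a year")
```

On that reading, the target leaves roughly 31.5 seconds of downtime a year, compared with almost nine hours at 99.9 per cent, which helps explain why a mirrored off-site hosting centre is part of the plan.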
Archiving the nation's most important documents has occupied most of Thomas's career to date, but it's only in the past three to four years that digital documents have begun to take the majority of his time. And given the predictions for the growth in digital media over the next decade, he isn't likely to be out of work any time soon.












