After the first-timers lunch (a great opportunity to chat to some fellow newbies!) I went to a session on access to UK government information. This is a topic I’m quite passionate about, so I was interested to hear about the current initiatives to make public information more accessible. There were two speakers in this session: Jennie Grimshaw from the British Library, talking about the Magic (Maintaining Access to Government Information Collaboratively) project; followed by Edward Wood from the House of Commons talking about digitising Hansard.
Jennie started her section by explaining that she’d hoped to be able to tell us about the BL’s implementation of the Magic project, but that as their funding from HEFCE had not yet been secured she could only talk us through the preliminary research. This was certainly interesting stuff in itself though! Jennie explained the problems in maintaining access to government information. Moving from print to web leads to preservation issues: government websites are frequently updated, reorganised and even removed without any consideration given to what happens to the information they contain, so a lot of it is simply lost. She also talked about the types of material that researchers use: mainly grey literature, such as research reports and summaries, statistics and expert reviews. As this type of material is usually only found on government websites, not commercial sites, the lack of preservation obviously causes great problems for researchers. The research found that 78% of researchers primarily used government websites to find their information, but complained that the poor design, structure and search facilities of these sites made research very difficult. Many people also rely on general search engines to tell them what information is available, but the lack of quality metadata and the poor design of most government websites mean that search engines don’t index them very well, so a lot of important information is effectively invisible to the average Internet user.
Jennie went on to describe some of the initiatives already underway to combat these problems, such as the National Archives project to comprehensively archive all government websites, which has been underway since November 2008, and the BL Digital Donations scheme, which is in place until Legal Deposit legislation covering e-materials is implemented. Jennie talked a little about gathering information under this scheme, which relies on the co-operation of various government departments and agencies – I sensed a little bitterness in her voice at that point! She explained that the current process – gathering content via RSS feeds where available, and going into sites manually where not – is obviously unsustainable, and that the BL were looking for a push-technology solution.
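To make that gathering process a bit more concrete, here is a rough sketch of what "RSS where available, manual where not" might look like in Python. This is purely my own illustration, not the BL's actual workflow: the sample feed, the example.org URLs and the function names `new_items` and `plan_harvest` are all made up, and a real harvester would fetch the feeds over HTTP rather than read them from a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed standing in for a departmental publications feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Department publications</title>
    <item>
      <title>Research report A</title>
      <link>https://example.org/reports/a.pdf</link>
    </item>
    <item>
      <title>Statistics release B</title>
      <link>https://example.org/stats/b.pdf</link>
    </item>
  </channel>
</rss>"""


def new_items(feed_xml, already_archived):
    """Return (title, link) pairs for feed items whose link is not yet archived."""
    root = ET.fromstring(feed_xml)
    found = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        if link and link not in already_archived:
            found.append((title, link))
    return found


def plan_harvest(sources, already_archived):
    """Split sources into documents to fetch and sites needing a manual visit.

    `sources` maps a (hypothetical) source name to its feed XML,
    or to None when the site offers no feed.
    """
    to_fetch = []
    manual_check = []
    for name, feed_xml in sources.items():
        if feed_xml is None:
            manual_check.append(name)  # no feed: someone has to visit the site
        else:
            to_fetch.extend(new_items(feed_xml, already_archived))
    return to_fetch, manual_check
```

Even in this toy version the weak point is obvious: every source without a feed ends up on the manual list, which is exactly the part that doesn't scale.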
To round up, Jennie pointed out that the research had probably raised more questions than answers. Some issues raised:
- Solving the permissions dilemma
- Archive documents or snapshots of full websites?
- Access from library catalogues or repositories?
- Desperate need for cross-archive federated search: currently separate archives for England, Scotland and Wales
- Avoid duplication, or support LOCKSS (Lots of Copies Keep Stuff Safe)? Present policy is that it is not cost effective to duplicate content/effort – but can we guarantee continuing access if there is only one copy?
Edward Wood then took over to talk about the hows and whys of the Hansard digitisation project. He described the benefits of digitisation: it increases access and usability, frees up physical storage space, and enables preservation without the costs involved in restoring paper copies. He also gave an overview of some of the technical processes involved: data capture was automated but with some manual intervention to ensure quality control, and OCR was used to ensure that data was provided, not just documents (cf. Richard Wallis’ thoughts on the semantic web!). He then discussed the development of the web interface, which was designed to have an experimental, Web 2.0 feel about it. The interface was developed in close consultation with users – it is open source, and there is a public development issues log, to allow users to report directly on how they would like to be able to use it and any problems they face.
Edward described the project’s aims of creating something that was fully “Google-able”: most people won’t realise that what they are looking for is in Hansard, so they won’t come directly to the site. The information is broken down into chunks to aid searching, and the team have also created a Wikipedia template for Hansard content.
I was really impressed by the enthusiasm that Edward obviously had for making this information (and data) publicly accessible. There was a lot of emphasis on allowing people to take the data and do what they like with it – for example, the Wikipedia template – which is an attitude I don’t tend to associate with government information. This seems like a really great step forward; I hope it indicates a trend in this direction that other providers of official information may follow.