Martin Belam, July 2007
BBC Archives: Beyond the programme download
Beyond the programme download
Using archive clips instead of programmes - Case studies
Examples of potential granular BBC archive re-use
Recipe Finder plus
Blue Peter makes plus
Seasonal topic aggregation
Supporting the nation's health
Individual School History
Local Authority Use
Travel agents and destination guides
Adding value to review sites
Comic Relief / Children In Need video-on-demand fundraising
BBC TV Trivial Pursuit
Assisting people to take the Life in the UK Test
Lifestyle Makeovers
The Day / Month / Year You Were Born
National Archive Government paper packages
Make archive programming available for signing
Ultra-Local Archive
Artist Career Archive
Open Subtitling Project
Media and Social Studies Resources
The BBC's Written Archives
Archiving the BBC's Websites
Guidelines for future website archiving
Digital storage strategy
Facilitating retrieval from a fully digital archive
Speech to text conversion
Shot boundary detection
Using clustered search retrieval
Expanding clustered results beyond television programmes
The majority of BBC strategies for dealing with the immense archive of British cultural heritage that the Corporation possesses revolve around the distribution and exploitation of complete and discrete programme entities.
However, there is a vast wealth of information archived by the BBC that does not take the form of programmes, for example the Corporation's web assets and written archive. Additionally, programme material can be broken down into a much more granular level for reuse, distribution and, where appropriate, re-sale.
This paper outlines potential ways that the BBC could exploit the cultural and commercial value of the archive material. This entails looking beyond merely moving BBC Worldwide's existing DVD, VHS and Audio CD business model into digital distribution channels.
Breaking programming into component parts would allow the BBC to offer a greater range of uses of the material to academics, schools, small and large businesses, community groups and the British public as a whole.
This is a once-in-a-generation opportunity to unlock the real cultural worth that the archive represents to the UK as a nation.
Beyond the programme download
It is normal within the BBC to regard the vast array of complete programming held in the Corporation's archive as a tangible asset. In the current context of a lower-than-expected Licence Fee settlement, and the increased ease of digital distribution, it is increasingly easy to see it as a saleable one.
However, it is important for the BBC to realise that the value of the archive is much more than simply a historical collection of programmes.
For a start, the BBC possesses archives beyond television programming, including written and photographic stills archives, audio archives and digital archives of external web and internal intranet assets.
Typically, strategies to digitise this archive and make it fit for the 21st century have focussed on the exploitation of complete programme material. However, programme material can be broken into far more granular pieces, allowing more flexible and more widespread use of BBC material. This is closer in spirit to the original aspirations of Director-General Greg Dyke when he announced the Creative Archive initiative in 2003.1
The existing model for archive exploitation simply moves into the digital domain the DVD, VHS, audio CD and cassette sales of complete programmes that have been the business model since the days of BBC Enterprises in the 1980s. The proposals within this paper suggest other ways to exploit the granular value of the archive.
Breaking out of the constraint of whole programme downloads would allow the BBC to offer from the archive a variety of formats like clips, transcripts, audio soundtracks, still images, documents and web assets. The audience could consume these on a much wider range of devices than those that can handle full programme downloads.
The clip format is better suited to portable devices like the iPod and smart phones. Its lower bandwidth requirements also make it more suitable for Licence Fee payers who do not yet have access to, or have not yet taken up, broadband internet access.
Increased distribution channels, and an increased array of products developed from the BBC's archive, should increase audience reach for the content of the archive.
Using archive clips instead of programmes - Case studies
If it is the case that any sufficiently advanced technology is indistinguishable from magic2, then sufficiently ancient technology looks like background radiation - it is completely taken for granted.
If you are 35 now, then growing up in the UK you would have seen the rise of home video recording, the introduction of the CD and Channel 4, the arrival of brick-sized mobile phones and now the internet. Colour television, live satellite broadcasting, man on the moon and winning World War II are all established facts.
If you are fifteen now, then not only are all those things listed above simply 'a given', but multi-channel TV, omnipresent mobile phones, internet based social networking, and downloading bite-sized clips of programmes from YouTube are equally just facts of life.
And for the generation below, those now aged 0-5, whatever technological advances or changes in the media are made over the next 10 years will be the cultural background radiation of their lives - their established norm.
The rise of 'clip culture' doesn't mean that the long-form of the movie is dead, or that people won't watch any full length programmes. In the same way, radio news didn't destroy the newspaper industry, and the advent of television didn't mean the end of radio.
However, it does mean that a large proportion of the BBC's young and future audiences are much more used to discovering, sampling and consuming their content in bite-sized chunks on demand, rather than in the linear television schedule where the shortest available slot is usually at least 5 minutes long.
Liberating clips from the full-length BBC programmes in which they are currently locked within the archive would greatly increase their desirability and the potential for their use and re-use.
People are unlikely to want to sit through an entire half-hour edition of seventies network staples like Nationwide or That's Life!, but would still be entertained by watching a short clip of a duck on a skateboard3 or a dog apparently saying "sausages" - as the latter's presence on YouTube demonstrates4.
More importantly, breaking these types of long-running shows into their component clips means that, although they may not be watched in their entirety, users could easily discover the content that means something to them.
Perhaps the time their hometown was on Nationwide, or when their local 4th Division team was featured on Match Of The Day playing Arsenal in the F.A. Cup Third Round.
The archive should also facilitate the retrieval of things that have a personal meaning for people, like the time their grandmother was in the congregation of Songs Of Praise or their grandfather appeared on One Man And His Dog, or when they were in the audience for Juke Box Jury or Question Time.
Breaking the archive into a series of discoverable clips and assets, beyond the offering of full programmes alone, opens up a whole range of potential new products and services for the BBC to power and deliver. These could work on a variety of business models: free, commercially exploitable, or delivered by third parties using BBC material with BBC permission.
The following section gives just some examples of the kinds of service that could be created, enhancing the public value that the BBC's archive gives back to the nation that has paid for its creation.
Examples of potential granular BBC archive re-use
Recipe Finder plus
Currently the BBC offers a recipe finder5 which allows users to input up to three ingredients, and then produces a set of text results listing recipes.
If the archive is broken into clips, it should be possible to make a greatly enhanced version of this service, whereby the user can enter a few ingredients, and then watch clips of food being prepared by BBC chefs from throughout the history of the BBC's food programming.
This greatly widens the appeal and usefulness of the service, which could be used both at home and in classrooms. The clips could also be made available for download onto portable video devices like an iPod or smart phones, to make it easier to access the clips whilst in the kitchen.
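As a minimal sketch of the matching step, assuming each archived clip carries a list of ingredients as metadata (the `Clip` type, `find_clips` function and the sample records below are hypothetical, not an existing BBC schema), the enhanced finder amounts to returning clips whose ingredient tags include everything the user has entered:

```python
from dataclasses import dataclass, field


@dataclass
class Clip:
    """A hypothetical archive clip tagged with the ingredients it uses."""
    title: str
    year: int
    ingredients: set[str] = field(default_factory=set)


def find_clips(clips: list[Clip], wanted: list[str], max_terms: int = 3) -> list[Clip]:
    """Return clips that use every requested ingredient, newest first.

    The three-ingredient cap mirrors the existing text-only Recipe Finder.
    """
    if not 1 <= len(wanted) <= max_terms:
        raise ValueError(f"enter between 1 and {max_terms} ingredients")
    query = {w.strip().lower() for w in wanted}
    # A clip matches when the query is a subset of its ingredient tags.
    matches = [c for c in clips if query <= c.ingredients]
    return sorted(matches, key=lambda c: c.year, reverse=True)


# Invented sample records, for illustration only.
archive = [
    Clip("Omelette demonstration", 1976, {"eggs", "butter", "cheese"}),
    Clip("Quick pancakes", 1998, {"eggs", "flour", "milk"}),
    Clip("Cheese scones", 2004, {"flour", "cheese", "butter", "milk"}),
]

for clip in find_clips(archive, ["Eggs", "milk"]):
    print(clip.year, clip.title)  # prints: 1998 Quick pancakes
```

Requiring every ingredient (a subset test) keeps results tight; a looser "any ingredient" search would simply swap the subset check for a non-empty intersection.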
Blue Peter makes plus
The BBC's Blue Peter site currently offers "Things To Do" for its young viewers online6, and the defunct Cult Classic TV site offered a variety of 'makes' from the series' history, alongside the fact sheets from the era7.
Using the same principle as Recipe Finder plus, with clips available it should be possible for the user to enter some objects they have to hand (glitter, egg boxes, the ubiquitous sticky-backed plastic, etc.) and get a selection of clips, from down the ages, of Konnie & co making things to keep children busy during the school holidays.
Seasonal topic aggregation
At key points in the year, such as Christmas, Easter and Valentine's Day, the BBC produces a great deal of programming and web-based material devoted to the occasion. However, this material very often forms small segments within programmes (e.g. lighting the Advent candles within Blue Peter, a Christmas feature within Good Morning with Anne and Nick, or a choir singing a specific carol on Songs of Praise).
At present there is no way at all for the BBC to aggregate this kind of content in its current web offering or programming. However, a segmented archive with the correct metadata would, for the first time, allow viewers to browse, and the BBC to promote, decades of specially made Christmas or Easter content in one place.
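To make "a segmented archive with the correct metadata" concrete, here is a minimal sketch assuming that each segment records its parent programme, broadcast year, in and out points, and a set of topic tags. The `Segment` type, `aggregate_by_tag` function, and all years and timings below are invented for illustration:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A hypothetical slice of a programme, located by in and out points."""
    programme: str
    broadcast_year: int
    start_s: int  # offset into the programme, in seconds
    end_s: int
    tags: frozenset[str]

    @property
    def duration_s(self) -> int:
        return self.end_s - self.start_s


def aggregate_by_tag(segments: list[Segment]) -> dict[str, list[Segment]]:
    """Index segments under every tag they carry, oldest broadcast first."""
    index: dict[str, list[Segment]] = defaultdict(list)
    for seg in segments:
        for tag in seg.tags:
            index[tag].append(seg)
    for tag_segments in index.values():
        tag_segments.sort(key=lambda s: s.broadcast_year)
    return dict(index)


# Illustrative values only; years and timings are invented.
segments = [
    Segment("Blue Peter", 1985, 300, 480, frozenset({"christmas", "advent"})),
    Segment("Songs of Praise", 1972, 1200, 1410, frozenset({"christmas", "carols"})),
    Segment("Good Morning with Anne and Nick", 1993, 2400, 2700, frozenset({"easter"})),
]

for seg in aggregate_by_tag(segments)["christmas"]:
    print(seg.broadcast_year, seg.programme, f"{seg.duration_s}s")
```

Because a segment points into its parent programme rather than copying it, the same footage can appear under several seasonal tags without duplication.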
Supporting the nation's health
The BBC has hours of serious footage about medical issues, from scientists discussing new treatments on programme strands like Horizon or Tomorrow's World, to following everyday stories of hospitals in the UK on programmes like City Hospital.
With these programmes divided into sections that focus on specific treatments, medical procedures and diseases, it would be possible for the BBC to provide supporting material to services like NHS Direct, or to local authority health trusts.
Enhancing the online offerings of these kinds of public services could help the public understand what will happen to them in hospital and what their treatment options are, and reassure them that people do recover from the illnesses they are suffering from.
Individual School History
The BBC archives contain a wealth of educational material broadcast over the years - but they also contain a great deal of material that illustrates the history of educational establishments in Britain over the last few decades.
Schools ought to be able to retrieve from the archive the occasions when their pupils represented the school on "We Are The Champions" or "Cheggers Plays Pop", appeared on "Screen Test", or had their letters read out on "Junior Points Of View".
Local Authority Use
Material from the BBC archives has a potential role to play in enhancing civic life in the UK.
Local Authorities in the UK spend a great deal of money on building websites and producing publicity material highlighting local services, health and safety issues and community issues.
Many of these issues will have corresponding BBC television coverage, whether it is about recycling, the different types of local schools available to residents, dealing with problem neighbours, the local judicial process and so on.
If this content were available for re-use by local authorities within their websites, it would enhance them as a multimedia proposition to engage the local community, without massively increasing costs for the authority involved.
Additionally, the BBC has hours of footage putting into context local politics and the local election process, that could be made available to local authorities for re-use in their attempts to engage the public with the local democratic process.
Travel agents and destination guides
From Michael Palin's globe-trotting adventures, to the early days of the BBC 'Holiday' show, the BBC's archive contains hours and hours of material about travel destinations. Radio 4's "A World In Your Ear" joins together thematic selections of radio material from around the globe.
As whole programmes, these may rarely be listened to or viewed again in their entirety, with the possible exception of Michael Palin's: sections from his shows have, for example, already appeared on the screens on London buses.
However, if this material were segmented from the original programme into discrete sections for re-use about specific destinations, countries or cultures, it would be of great value to the travel industry.
This is all material that could be used by non-profit sites like WikiTravel, by tourist information boards around the globe, or by travel agencies based in the UK.
Adding value to review sites
Top Gear may be mostly tongue-in-cheek and about performing stunts these days, but the ability to flick through back episodes of the programme, organised by make and model of car, would be a valuable asset to car enthusiasts, dealers and manufacturers.
Provided, of course, the programme showed the model in a good light and that Jeremy Clarkson didn't smear the sexuality of the car in question.8
Top Gear is just one example where BBC television and radio reviews could be used to complement online content about a particular product, whether published by the manufacturers or producers themselves, or by comparison sites.
Elsewhere, for example, the UK edition of IMDb could include clips of Barry Norman or Jonathan Ross reviewing films or interviewing their stars.
Similarly, bands could be free to incorporate their interviews on Radio 6Music or Radio 1 into their official websites or MySpace pages.
Comic Relief / Children In Need video-on-demand fundraising
Since the BBC first made an appeal to raise funds for children in 1927, whether through newsreaders dancing or specially filmed comic skits of well-known programmes, the BBC has helped the British public raise millions of pounds for good causes. A great deal of production effort goes into the fund-raising telethons broadcast on the BBC.
However, with the exception of the odd special released on VHS or DVD (Comic Relief's "Doctor Who and The Curse of Fatal Death", or the Little Britain special, for example), very little of this content is ever shown again or exploited a second time for fund-raising purposes.
Breaking down these marathons into their component clips would allow them to be used in a fund-raising video-on-demand service supporting both charities, obtaining donations from users in return for downloading or streaming video clips.
BBC TV Trivial Pursuit
The BBC's archive includes hours of quiz format footage, whether it is The Weakest Link, Ask The Family, Screen Test, A Question Of Sport, University Challenge or Mastermind.
With access to clips of the questions and the corresponding answers, it would be possible to make an online or DVD quiz game that gathered together the disparate topics and questioning styles of those programmes into one complete BBC quiz experience.
In a more light-hearted vein, shows like "Have I Got News For You?", "QI", "I’m Sorry I Haven’t A Clue", "Never Mind The Buzzcocks", "Blankety Blank" and "Fighting Talk" could also be thrown into the mix. But maybe "Ask The Family with Dick and Dom" can be left out.
Users could narrow the range of questions to their field of interest, and the game would throw a random selection of questions from the shows at them.
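The selection step described above can be sketched as a filter followed by a random draw without replacement. This assumes each question clip is tagged with a topic; the `Question` type, the `draw_questions` function and the clip identifiers are hypothetical:

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Question:
    """A hypothetical quiz question cut from an archived programme."""
    show: str
    topic: str
    clip_id: str  # invented identifier for the clip holding question and answer


def draw_questions(pool: list[Question], topics: list[str], n: int,
                   rng: Optional[random.Random] = None) -> list[Question]:
    """Pick up to n distinct random questions from the user's chosen topics."""
    if rng is None:
        rng = random.Random()
    chosen = {t.lower() for t in topics}
    eligible = [q for q in pool if q.topic.lower() in chosen]
    # sample() draws without replacement, so no question repeats in a round.
    return rng.sample(eligible, min(n, len(eligible)))


pool = [
    Question("Mastermind", "History", "mm-0001"),
    Question("A Question Of Sport", "Sport", "qos-0001"),
    Question("University Challenge", "Science", "uc-0001"),
    Question("University Challenge", "History", "uc-0002"),
]

# A seeded generator makes a round reproducible, e.g. for a printed DVD edition.
for q in draw_questions(pool, ["history"], 5, random.Random(2007)):
    print(q.show, q.clip_id)
```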
The demand for re-watching an episode of Mastermind from the 1970s might be low, but there is still an entertainment and educational value to be derived from the questions contained within it.
Assisting people to take the Life in the UK Test
One of the major components of political discourse in the UK over the last few years has been the integration and non-integration of various sections of the community into British society.
This area has become highly politically charged. Government policy is that immigrants applying for citizenship of the UK should pass a 'Life in the UK' test, taught mainly through a book.9
The BBC, through its archive, is able to supply a large amount of footage that could support this process, by either supplying official course material leading to the qualification, or by making available bundles of clips to voluntary community groups assisting their members in making the transition to British citizenship.
In this regard, the BBC probably has the biggest repository in the world of video illustrating British culture, social customs and the diversity of communities within Britain, and could provide a genuine service to the nation.
Lifestyle Makeovers
At the most granular level, programmes can be broken down into a series of stills. The BBC's lifestyle and makeover programming for homes, clothes and gardens could all be broken into smaller clips focussing on one aspect of a design or makeover.
The finished results could be browsed on the web as still images, grouped by concepts like "Art deco", or colours, or type of garden and so forth. Users could browse the stills of how a makeover ended up, then choose to watch the clip of the transformation they most want to see.
The Day / Month / Year You Were Born
The BBC has already experimented with concepts like "The Time When…", gathering user-generated content around a specific date.
It also has the successful "On This Day" site exploiting the BBC News archive for landmark events organised around the theme of the calendar.
With the archive it should be possible to bundle together news clips and programming from both radio and television that were broadcast on any specific day, month or year, depending on which time period proves most practical.
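Choosing the "time measure" amounts to deciding how coarsely to compare broadcast dates. A minimal sketch, assuming each catalogue entry records its broadcast date; the `bundle_for` function and the catalogue entries are hypothetical:

```python
from datetime import date

# How each granularity reduces a date to a comparable key.
_KEYS = {
    "day": lambda d: (d.year, d.month, d.day),
    "month": lambda d: (d.year, d.month),
    "year": lambda d: (d.year,),
}


def bundle_for(items: list[dict], born: date, granularity: str = "day") -> list[dict]:
    """Return archive items broadcast in the same day, month or year as `born`."""
    try:
        key = _KEYS[granularity]
    except KeyError:
        raise ValueError("granularity must be 'day', 'month' or 'year'") from None
    target = key(born)
    return [item for item in items if key(item["broadcast"]) == target]


# Invented catalogue entries for illustration.
catalogue = [
    {"title": "Evening news bulletin", "broadcast": date(1972, 3, 14)},
    {"title": "Radio documentary", "broadcast": date(1972, 3, 2)},
    {"title": "Children's drama episode", "broadcast": date(1972, 11, 5)},
]

born = date(1972, 3, 14)
for g in ("day", "month", "year"):
    print(g, [item["title"] for item in bundle_for(catalogue, born, g)])
```

A coarser granularity trades specificity for a fuller bundle, which is one practical way to decide which measure suits a given birthday.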
The BBC could distinguish itself from the "Born on this day" DVDs and CDs that already exist on the market by including a greater range of interactive content, and the kind of user-generated content collected by The People's War project.
National Archive Government paper packages
Every year the National Archives release a package of Government papers that reveal the background story of the issues and decisions of the day 30 years ago.
Every year, the media scrutinise this content for one day around Christmas, and make a big splash of it.
And equally, every year, a large section of the general public finds it difficult to remember why events that now seem so distant were significant, to make sense of events that happened before they were born or took an interest in politics and Government, or to understand how the events of 30 years ago shape the politics of today.
In the future, using a news and current affairs clip archive with good metadata, the BBC could simultaneously produce a package illustrating the issues within the released papers, utilising contemporary footage, interviews and supporting documentation from the BBC's written archive.
This could be made available not just to the public via the BBC.co.uk website or on-demand services like the iPlayer, but could also be distributed to the British press, allowing them to add substantial context to their own coverage of the 30-year rule papers.
Make archive programming available for signing
At present, the BBC is one of the leading broadcasters in the world for making available current output in late-night editions that are signed for the deaf.
However, it is only in recent years that this has been the case.
Although archive programmes that gain a VHS or DVD release via BBC Worldwide and its subsidiaries are treated with subtitles and sometimes signing, the majority of the content of the BBC archive remains inaccessible to the deaf, and would stay so even if it were made freely available to the public.
The BBC could consider entering into partnerships with charities like the RNID, committing to sign specific areas of the archive, or to sign within a specified timeframe any programme or programme clip requested in that format.
This would demonstrate a commitment to best accessibility practice on the part of the BBC, and set a standard for other public service broadcasters around the globe to follow.
Ultra-Local Archive
For local history groups, local councils or, indeed, anyone with a keen interest in their area, the ability to pull out of the BBC Archive any clips that mention their hometown would be a tremendous facility.
A search for 'Walthamstow' should retrieve all the BBC's clips relating to it, from disparaging jokey references in 1980s episodes of Only Fools and Horses, to a documentary about William Morris, to news coverage of a murder at Wood Street Station.
How this could improve the current offering from the BBC is demonstrated by a case study from the team at Tower Bridge.
In 2005 the bridge was preparing an exhibition on the history of the building, and wanted to include appearances of the bridge in media, advertising and so forth.
They were aware that a BBC clip existed showing the bridge being raised as Queen Elizabeth II returned from her post-Coronation tour of the Commonwealth in 1954. They had seen the clip on the BBC's On This Day website10.
However, the BBC archives were not able to locate a version of the clip in a format that the bridge could use.
Firstly, of course, if the BBC website offered direct downloads, or a YouTube-like facility to embed BBC content in third party sites, Tower Bridge could have utilised the digital version from BBC.co.uk.
However, the bridge would have been interested in any clip the BBC held which featured the London landmark, whether it was runners in the first London Marathon, reportage of the Number 54 bus that got stuck halfway across when the bridge was opening, or some other unknown-to-them radio feature.
They were unable, though, to access any sort of comprehensive list of BBC clips that might be available on the topic, and without that list they could only make a vague request like "Everything featuring Tower Bridge from the BBC", which the BBC archive cannot at present fulfil.
However, with a digitised ultra-local archive based around clips that came complete with location metadata, the bridge would have been able to evaluate and purchase clips for use in the exhibition.
These kinds of ultra-local clips could be priced by type of user, in the same way that some software licences charge different types of user different rates. For example, the BBC could charge a commercial organisation like the Tower Bridge Exhibition for re-use, whilst making the same clips available free for educational and non-profit uses.
Additionally, discounted rates might be offered to sections of the business community, such as local newspapers, that might otherwise be worried about the commercial impact of making ultra-local content available.
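The two ideas above, retrieval by location metadata and pricing by type of user, can be sketched together. Everything here is an illustrative assumption: the `LocalClip` type, the functions, the licensee categories and the fee shares are hypothetical, not proposed BBC rates:

```python
from dataclasses import dataclass

# Hypothetical share of the full fee paid by each type of licensee, in percent.
FEE_SHARE_PERCENT = {
    "commercial": 100,
    "local_newspaper": 50,
    "education": 0,
    "non_profit": 0,
}


@dataclass(frozen=True)
class LocalClip:
    """A hypothetical clip carrying location metadata as lower-case place names."""
    title: str
    places: frozenset[str]


def clips_for_place(clips: list[LocalClip], place: str) -> list[LocalClip]:
    """Return every clip tagged with the given place."""
    wanted = place.strip().lower()
    return [c for c in clips if wanted in c.places]


def quote_pence(base_fee_pence: int, licensee: str) -> int:
    """Price one clip for a licensee type, working in whole pence."""
    try:
        share = FEE_SHARE_PERCENT[licensee]
    except KeyError:
        raise ValueError(f"unknown licensee type: {licensee!r}") from None
    return base_fee_pence * share // 100


clips = [
    LocalClip("Bridge raised for the Queen's return, 1954", frozenset({"tower bridge", "london"})),
    LocalClip("First London Marathon", frozenset({"tower bridge", "london", "greenwich"})),
    LocalClip("William Morris documentary", frozenset({"walthamstow"})),
]

for clip in clips_for_place(clips, "Tower Bridge"):
    print(clip.title, quote_pence(10_000, "commercial"), quote_pence(10_000, "education"))
```

With location tags in place, a request like "everything featuring Tower Bridge" becomes a simple lookup, and the same list can be quoted at different rates depending on who is asking.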
Artist Career Archive
For students of art or literature, the ability to pull out of the archive every BBC appearance by leading British artists like Francis Bacon or Damien Hirst would be invaluable. The BBC already makes a selection of this kind of content available via the BBC Four Interviews archive.
However, this can only represent a fraction of the interviews, features and reviews on programmes such as Arena, Today, Newsnight Review, and older arts strands.
This would form a fantastic primary research resource for the arts and academic community within the UK.
Open Subtitling Project
Related to the earlier idea of ensuring that a significant proportion of the archive becomes accessible to the deaf through signing, the BBC could institute an Open Subtitling Project.
This would allow interested parties to select material from the BBC archive, and then subtitle the programme in the language of their choice. This would be a way for the BBC to increase the international reach of their programming without incurring undue costs.
Media and Social Studies Resources
In a recent BBC Editors blog entry11 about the future of innovation in News, Adrian Van Klaveren made the point that:
"And finally (as one now-departed news format used to say), there is the question of formats. In many ways this is the most difficult. We know that audiences value innovation but reject gimmickry. It is not enough to do something in a different way simply because we can. But when we launch something new and get it right, the impact is huge. Radio Five Live has been a long-term success through consciously achieving a sound different from Radio 4. Television presentation has been transformed – if you get a chance, just look at a programme from 15 or 20 years ago and see how formal it feels. Our uses of studios, live location reporting and interactivity have all gone through a revolution."
Of course, fifteen or twenty years ago, back in 1987, the BBC was no doubt equally being accused of 'dumbing down' news presentation because, following the example of Angela Rippon, the glamour girls of news-reading, Moira Stuart and Anna Ford, were fronting news bulletins, rather than the traditional stern-faced white men in jackets and ties.
And let us not forget that when BBC television news was first introduced, the newsreader was kept off camera lest they should distract the viewer from the information being imparted.
The BBC's archive unwittingly constitutes a huge resource for researchers into Britain's cultural, social and media history since the 1920s, when the BBC first began broadcasting. It records, for example:
The gradual shift from dinner-suited presenters enunciating every word of the Queen's English to modern presentation teams and styles that more closely match the behaviour of the breadth of the BBC's audience.
The change in tone of language on television, from an era when the Sex Pistols swearing on ITV allegedly caused someone to be so angry that they kicked in their television set, to an era when BBC Three can broadcast "F*** Off I'm Ginger".
The gradual appearance of ethnic minorities on the screen, at first often as the butt of what would now be considered racist jokes and stereotyping, and then forcing their way onto a screen that has become more representative of the ethnic make-up of the nation as a whole.
The change in political interview techniques from the deferential fifties to the confrontational Paxmans and Humphrys of today.
All of these shifts in the BBC's behaviour reflect changes in society as a whole, and all are represented in the content of the archive.
The BBC's archive material represents a goldmine for academic study of trends in the social, political and cultural history of Britain in the 20th and 21st centuries.
The BBC's Written Archives
In his book "This is London, Good Evening - The story of the Greek Section of the BBC 1939-1957", George Angeloglou paints a wonderful picture of a BBC buzzing with secretaries frantically typing up programme scripts, memos and letters.
In addition to the BBC's archive of programming, the Corporation possesses a vast written archive, containing mountains of the typed output of generations of BBC secretaries.
This content covers roughly five areas - programme production documentation, documentation about the running of the BBC, correspondence with the public, third party publications, and more recently, internal and external electronic communications.
Programme production material may only be of interest to the public in specialist cases, for example cult programmes like Doctor Who, where the BBC.co.uk site already makes available PDF files of programme production notes from the 1960s.12
Documentation about the running of the BBC would no doubt contain sensitive information in places. Rather like the release of cabinet and Government papers under the thirty-year rule, though, it is likely that there would be great public interest and value in understanding how the BBC came to make decisions like the launch of BBC Two in colour, or the initial response of BBC management to the launch of ITV.
Recent electronic communications would be difficult to make available to the public due to the fact that all outgoing BBC emails pledge that the contents are confidential, and the BBC is bound by legislation such as the Data Protection Act.
However, correspondence with the public would also be of great interest. The "Dear Television" programme, shown in 2005, neatly illustrated how the tone and content of the letters sent to the BBC over the previous 85 years reflected changes in society. Of course, the BBC might face issues because this material was never intended for publication.
The BBC's collections of third party publications like books, magazines and newspapers should in theory replicate content held by the UK's copyright libraries, and as such should not be a high priority for the BBC to exploit.
Unlike vast programming archives, large written archives have a good track record of being digitised and made available to the public.
The National Library of Scotland, for example, has been a leading proponent in making fragile archive material available digitally for all. As part of the Newsplan initiative, they host an electronic index of local newspapers from Scotland that have been preserved in local libraries across the UK.
They have also made specialist collections available online, like 1,800 'broadsides' - the one-page forerunners of newspapers that were popular from the 1600s onwards.
Also in Scotland, The Scotsman newspaper has digitised its archive of newspapers from 1817 to 1950. A free full-text search is available across the editions, although users have to subscribe in order to access the articles.
This service is supported by a digital archiving specialist, Olive Software, who provide the service for newspapers and libraries in both the U.S.A. and the UK.
If the BBC wishes to make its written archives available, the best option would seem to be striking a deal with an existing specialist in this area, or working with an existing archiving institution like the British Library.
The BBC would need to determine what could be made available in the public domain, which should probably include back issues of Ariel, back issues of Radio Times, and those areas of programme production notes and BBC correspondence not thought to be sensitive.
Archiving the BBC's Websites
As the BBC has moved from the 20th to the 21st century, it has increasingly had to focus on three media outputs, with internet content joining television and radio as the main channels of communication.
The BBC's website has increasingly played a role in the way that the BBC helps to weave the story of Britain over the years.
Landmark events, like the July 7th bombings of London in 2005 are not just the preserve of rolling news footage and BBC One specials, but also an area where the BBC takes a lead in online story-telling and the imparting of information.
A lot of the web material published by the BBC on July 7th 2005 was only captured and displayed by interested amateurs13, rather than by the Corporation itself.
In many ways this echoes the early days of television, where some of the earliest broadcasts were only captured by viewers with reel-to-reel tape recorders14 or primitive off-air recording methods15.
The BBC has made some moves to redress this, with an internal strategy of preserving the files that are being moved between servers, and with initiatives like the backstage.bbc.co.uk prototype BBC Home Archive16, which captures the changing content of the BBC's homepage.
However, elsewhere on the BBC's website, the preservation of archive material is patchy, with some examples of good practice, and some examples of very poor practice.
The Cult website is one example of good practice. The site was closed following the publication of the Graf report into the BBC's online services in 2005. The team behind the site put up a new front page announcing the closure, and the rest of the site remains untouched, with the same persistent URLs as when the site was live.
Likewise, Legacies, another site closed at the same time, retains all of the content produced while the site was live.
This is one approach: close a site, and then leave it untouched.
There are also examples of sites on BBC.co.uk where the archiving of the content was a conscious decision, and this is conveyed to the user.
The People's War site, collecting first-hand accounts of the Second World War, has a new homepage allowing access to the content that was published, but with a clear indication that the site is now an archive and not an ongoing concern.
This is an excellent approach, although there may be some concern that all of the content is housed within the BBC's DNA community site platform. Future changes to that platform could threaten the viability of the archive; conversely, the need to keep the archive working could burden future development of the platform with legacy content.
A second site that has visibly been archived is the BBC Governors' site, which was replaced by the BBC Trust site in 2007 when the Trust took over regulation of the BBC.
Preserving all of the Governors' content, research and publications was clearly the correct thing to do, and the site's design has been adjusted to tell users that for current regulatory issues they need to visit the BBC Trust site.
One significant concern here, however, is that when the site was archived, its address was changed from www.bbcgovernors.co.uk to www.bbcgovernorsarchive.co.uk.
Whilst this makes it explicit that this is archive material, it also broke every incoming link to the old BBC Governors' site across the internet.
Whether the link was on the DCMS site or on a media hobbyist's blog, every internet reference to a document or press release issued by the BBC Governors during the lifetime of the site now leads the user to the BBC Trust homepage, rather than to the detailed information they were looking for.
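Moving an archived site to a new domain need not break incoming links, provided the old domain answers each request with a permanent redirect (HTTP status 301, Moved Permanently) to the same path on the new domain, rather than sending everyone to a single homepage. The sketch below shows that mapping. The hostnames come from the example above, but the lookup table and the `redirect_for` function are hypothetical illustrations, not a description of how the BBC's servers were actually configured.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

# Hostnames from the Governors example; the lookup table itself is hypothetical.
DOMAIN_MOVES = {
    "www.bbcgovernors.co.uk": "www.bbcgovernorsarchive.co.uk",
}

def redirect_for(url: str) -> Optional[Tuple[int, str]]:
    """Return (status, location) for a URL on a moved domain, or None if it has not moved.

    The path and query string are carried over unchanged, so a deep link to a
    press release lands on the archived copy of that press release rather than
    on a homepage.
    """
    parts = urlsplit(url)
    new_host = DOMAIN_MOVES.get(parts.netloc.lower())
    if new_host is None:
        return None
    location = urlunsplit((parts.scheme, new_host, parts.path, parts.query, ""))
    # 301 tells browsers and search engines that the move is permanent.
    return 301, location
```

In practice the same rule would more likely live in the web server's rewrite configuration than in application code; what matters is that the path survives the change of domain.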
Perhaps the poorest examples of archiving on BBC.co.uk are the defunct Pure Soap and Money sites.
Users entering the URL bbc.co.uk/money do not get re-directed to BBC News Business coverage, or the page for The Money Programme, or any other suitable page. Instead they are given an error message, as if they have done something wrong.
Another site that was closed after the Graf report assessed the BBC's impact on the online market was the Pure Soap site, and it is now completely unavailable. Despite the finding that an ongoing soaps website could adversely affect the market, it does not seem to have been a requirement that all of the content previously produced with Licence Fee money should be deleted.
In a further worsening of that situation, anyone typing in the old bbc.co.uk/puresoap or bbc.co.uk/soaps URLs gets redirected to the Drama messageboard.
Or rather, they would be, had that messageboard not since been closed as well.
As it now stands, users looking for content about BBC soaps via the previously much-advertised URLs end up at a page listing a whole series of unrelated message boards.
This is a very unsatisfactory user experience, and one that would have been avoided if the content had been left on the server, with the site closed but still available to view.
Guidelines for future website archiving
Having looked at a mix of good and bad practice in the BBC's current archiving of websites, it is useful to outline some guiding principles that ought to be applied to the future archiving of online content.
Wherever possible, the content should remain available to the public. The cost of storing static content, and of bandwidth for closed sites, will be negligible, and far outweighed by the benefit of having a complete record of the BBC's internet output.
Content should remain, wherever possible, available at the same permanent unique URL.
A visual indication should be given to the user on each page that they are viewing archive content that will not be maintained - possibly by use of the banner area or within the BBC.co.uk grey toolbar area.
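One way to apply the banner guideline across thousands of static pages at once is to insert a notice immediately after each page's opening body tag at the moment a site is archived. This is a minimal sketch assuming static HTML files; the banner wording, the `archive-notice` class name and the `add_archive_banner` function are hypothetical.

```python
import re

# Hypothetical banner markup; the real wording and styling would be an editorial decision.
ARCHIVE_BANNER = (
    '<div class="archive-notice">'
    "This page is part of an archived BBC site and is no longer updated."
    "</div>"
)

# Matches the opening <body> tag, with or without attributes, in any letter case.
_BODY_TAG = re.compile(r"<body\b[^>]*>", re.IGNORECASE)

def add_archive_banner(html: str, banner: str = ARCHIVE_BANNER) -> str:
    """Insert the banner straight after <body>, leaving the rest of the page untouched.

    Pages that already carry the banner are returned unchanged, so the
    function is safe to run more than once over the same archive.
    """
    if banner in html:
        return html
    match = _BODY_TAG.search(html)
    if match is None:
        # No <body> tag found: prepend the banner rather than silently skipping the page.
        return banner + html
    return html[: match.end()] + banner + html[match.end():]
```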
Handling dynamic content
Content rendered via dynamic applications needs to be treated as a special case. Where possible, it would be advisable to flatten the content and serve it from then on as static content, making it independent of the application that originally generated it.
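Flattening can be as simple as requesting every page the application can generate and saving each response as a file whose path mirrors the original URL, so that a plain web server can later serve the same addresses with no application behind it. The sketch below covers only the URL-to-file mapping step; the layout (one `index.html` per path, with any query string folded into the filename) and the `static_path_for` name are assumptions, and fetching the pages is left out.

```python
from pathlib import PurePosixPath
from urllib.parse import quote, urlsplit

def static_path_for(url: str) -> str:
    """Map a dynamic URL to a relative file path for a static copy of the page.

    /dna/ww2/A1234    -> dna/ww2/A1234/index.html
    /dna/ww2/?page=2  -> dna/ww2/index_page=2.html
    """
    parts = urlsplit(url)
    path = PurePosixPath(parts.path.lstrip("/"))
    if ".." in path.parts:
        raise ValueError(f"refusing to map a path outside the archive root: {url}")
    stem = "index"
    if parts.query:
        # Percent-encode the query so each distinct query gets its own safe filename.
        stem += "_" + quote(parts.query, safe="=")
    return str(path / f"{stem}.html")
```

Keeping the mapping deterministic means the web server can apply the same rule in reverse, so the original URLs continue to resolve after the application is switched off.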
Holding pages for removed content
If for legal or policy reasons content has to be removed, a holding page should be put in place at the top-level folder of the area. This should indicate that the site is closed and direct users to suitable alternatives. These pages should list alternative destinations, both on the BBC site and on the wider internet, and should be kept up to date.
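Serving the holding page for every address beneath a removed area, not just its top-level folder, means that deep links also land somewhere useful. A longest-prefix lookup over the closed folders is one way to do this. In this sketch the folder names echo the examples above, but the table, the `HoldingPage` type and the `holding_page_for` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class HoldingPage:
    message: str
    alternatives: tuple[str, ...]  # suggested destinations, from the BBC and beyond

# Hypothetical table of removed areas. It needs editorial upkeep so that the
# suggested alternatives stay current.
CLOSED_AREAS = {
    "/money/": HoldingPage(
        "The BBC Money site has closed.",
        ("BBC News Business coverage", "The Money Programme"),
    ),
    # Alternatives left empty here; they would be filled in editorially.
    "/puresoap/": HoldingPage("The BBC Pure Soap site has closed.", ()),
}

def holding_page_for(path: str) -> Optional[HoldingPage]:
    """Return the holding page for the most specific closed area containing path, if any."""
    best: Optional[str] = None
    for prefix in CLOSED_AREAS:
        folder = prefix.rstrip("/")
        # Match the folder itself or anything beneath it, but not e.g. "/moneyfacts".
        if path == folder or path.startswith(folder + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return CLOSED_AREAS[best] if best is not None else None
```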
Digital storage strategy
Throughout the history of the BBC, the amount of archive material retained by the Corporation has depended, to a large extent, on two variables - the likelihood that material would be required for re-use, and the cost of the space to store material.
A renewed digital storage strategy for the 21st century can, to a certain extent, remove the second part of that equation.
Over the years the BBC has had to develop its own systems and processes for storing and retrieving programme data from original film recordings, then video recordings, and now the digital cassette format onto which much of the programme archive has been transferred.
Converting this to a fully digital storage solution should in the long term be much cheaper for the BBC, and require less specialisation within the Corporation. At the moment very few organisations in the world face the conservation issues that the BBC faces with the analogue and digital cassette archive it possesses.
However, many businesses specialise in large-scale data warehousing and digital archiving of material. A fully digitised archive of BBC material would no longer require storage space on BBC premises.
Instead, the BBC ought to be able to rent digital server storage space from third parties, have copies of the archive material in multiple places for redundancy and security, and have access internally to all archived material provided sufficient bandwidth is available for data transfer.
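Keeping copies in several places only helps if the BBC can tell which copies are still intact. A common approach is to record a cryptographic checksum (SHA-256 is assumed here) when material is ingested, verify against it on retrieval, and fall back to another replica if one copy has been corrupted. In the sketch below each storage provider is modelled as an in-memory dictionary purely for illustration; a real system would use each provider's own storage interface, which is not specified here, and the `ReplicatedStore` class is hypothetical.

```python
import hashlib

class ReplicatedStore:
    """Toy replicated archive: every object is written to all replicas and
    verified against a SHA-256 digest recorded at ingest time."""

    def __init__(self, replica_count: int = 3) -> None:
        # Each dict stands in for a separate third-party storage location.
        self.replicas: list[dict[str, bytes]] = [{} for _ in range(replica_count)]
        self.digests: dict[str, str] = {}

    def put(self, key: str, data: bytes) -> None:
        self.digests[key] = hashlib.sha256(data).hexdigest()
        for replica in self.replicas:
            replica[key] = data

    def _intact(self, key: str, data) -> bool:
        return data is not None and hashlib.sha256(data).hexdigest() == self.digests[key]

    def get(self, key: str) -> bytes:
        """Return the first copy whose checksum still matches the ingest digest."""
        for replica in self.replicas:
            data = replica.get(key)
            if self._intact(key, data):
                return data
        raise IOError(f"no intact copy of {key!r} in any replica")

    def repair(self, key: str) -> int:
        """Overwrite missing or corrupted copies with a verified one; return how many were fixed."""
        good = self.get(key)
        fixed = 0
        for replica in self.replicas:
            if not self._intact(key, replica.get(key)):
                replica[key] = good
                fixed += 1
        return fixed
```

Running `repair` periodically over the whole archive is what turns redundancy into preservation: silent corruption in one location is detected and corrected from another before a second copy can fail.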
Facilitating retrieval from a fully digital archive
One of the greatest obstacles to getting value for money from any proposal to digitise and use the BBC's archive is the difficulty of making the material easy to retrieve, whether for BBC production staff, the public or commercial enterprises.
As the online version of the BBC's INFAX catalogue makes clear, the metadata associated with programming down the years can be patchy, and varies in quality and depth depending on the type of programme and the era during which it was archived.
The cost of re-evaluating all the material within the archive for in-depth classification is prohibitive.
One potential method of overcoming this expense is to involve the public.
Sites like Flickr and YouTube have relied on their members to tag, categorise and classify content according to their own emergent 'folksonomies'.
People create an organic classification structure around what is useful to them. A photograph of a flower on Flickr, for example, might be tagged with the location where it was photographed, its common name, and its Latin name.
This approach may be one that could be utilised to enhance the metadata around archive BBC programming, however it would not be satisfactory as an exclusive means of tagging and classifying the content.
The same flower picture could equally have been tagged by someone else with words like 'purple', 'petals', 'nature', 'pretty', and hundreds of other words that describe the subject of the picture or the mood it evokes.
A researcher wanting clips of Margaret Thatcher in her days as Prime Minister will want to be sure that when she searches for "Margaret Thatcher" she gets all the relevant clips, not a selection of results from the archive that excludes all the programming elements that have been tagged by alternative monikers like "Maggie", "Thatcher", "Lady Thatcher", "PM" and so forth.
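One way to combine public tagging with reliable retrieval is to map free-text tags onto an authority list of canonical names maintained by BBC information staff, and to expand searches through that list. The sketch below assumes such an authority file exists; its entries and the `normalise_tag` and `search` functions are illustrative only.

```python
# Hypothetical authority file: canonical name -> variant forms seen in public tags.
AUTHORITY = {
    "Margaret Thatcher": {"maggie", "thatcher", "lady thatcher", "mrs thatcher"},
}

# Reverse lookup from any normalised variant (including the canonical name itself).
_VARIANT_TO_CANONICAL = {
    variant: canonical
    for canonical, variants in AUTHORITY.items()
    for variant in variants | {canonical.lower()}
}

def normalise_tag(tag: str) -> str:
    """Return the canonical name for a tag, or the cleaned tag if it is not in the authority file."""
    cleaned = " ".join(tag.lower().split())
    return _VARIANT_TO_CANONICAL.get(cleaned, cleaned)

def search(clips: dict[str, set[str]], query: str) -> list[str]:
    """Return ids of clips whose tags resolve to the same canonical name as the query."""
    target = normalise_tag(query)
    return sorted(
        clip_id for clip_id, tags in clips.items()
        if target in {normalise_tag(t) for t in tags}
    )
```

Ambiguous tags such as "PM" are deliberately left out of the authority file, since they could refer to any Prime Minister; resolving them needs human judgement or context from the programme itself.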
The BBC should seek to establish a baseline level of classification that makes retrieval possible at a reasonable cost. The technology to do this is improving all the time, and is already at a stage where it could meet the BBC's requirements.
Here is just one proposal of how an automated classification and editing system could do much of the work required to break down the BBC programming archive into a greater level of granularity.
When a programme is digitally archived it should not just be digitally encoded for viewing over IP distribution. The material should also have two other processes applied to it - automated speech-to-text conversion, and shot boundary detection.
Speech to text conversion
Accurate speech-to-text conversion, and the voice-operated computers of Star Trek, have been a futurologist's dream for many years, but are increasingly becoming a viable technology.
Speech-to-text conversion has already become commodity software for PCs and Macs, with programs available for as little as $39 that will do a passable job of converting a spoken memo into a marked-up text file.
Reasonably inexpensive speech-to-text conversion can therefore provide a workable transcript of a programme.
Of course, it will not be 100% accurate.
Current technological solutions perform less well where there are strong regional accents, loud background music, unusual words, or hurried and confused speech.
However, for news, documentaries and current affairs programmes the system should produce very usable transcripts - provided Question Time doesn't get too heated of course!
And the further back into the BBC's archive one goes, the more likely one is to find presenters using standard, well-pronounced 'BBC English', which makes it easier for speech-to-text converters to work.
The BBC should also be able to take advantage of the fact that it has a close relationship with Red Bee Media, who are able to provide infrastructure and expertise in subtitling.
Effectively, producing transcripts of programmes as they are digitised for the archive will be similar to making a set of subtitles for every show. The difference is that the subtitles would also be exported in a format that can be used outside the realm of programme playback.
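Subtitle files already pair each line of text with start and end times, so exporting them for use outside playback is largely a matter of parsing those cues into a neutral, searchable structure. The sketch below reads the widely used SubRip (.srt) layout as an example input; whether Red Bee Media's subtitles would arrive in that format is an assumption, and the `Cue` type and `parse_srt` function are hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Cue:
    start: float  # seconds from the start of the programme
    end: float
    text: str

# SubRip timing line, e.g. "00:01:02,000 --> 00:01:05,250".
_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def parse_srt(srt: str) -> list[Cue]:
    """Parse SubRip text into timecoded cues, joining multi-line subtitles with spaces."""
    cues = []
    # Cues are separated by blank lines; normalise Windows line endings first.
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = _TIMING.search(line)
            if match:
                g = match.groups()
                text = " ".join(ln.strip() for ln in lines[i + 1:] if ln.strip())
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues
```

Once in this form, the cues can be written out as plain text, XML or any other format alongside the encoded programme, keeping the timecodes that link each line back to the video.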
Shot boundary detection
Shot boundary detection automatically detects significant changes between the frames of a television programme.
Again, the technique is not infallible. News programmes where Huw in studio cuts to Nick in Westminster cuts to Huw in studio cuts to Nick in Westminster cuts to clip of Parliament cuts to Nick in Westminster cuts to Huw in studio would be interpreted by a human as one segment featuring a two-way conversation. Shot boundary analysis would most likely break the clip into much smaller portions than strictly necessary.
Likewise, arty dissolves and fades in drama programmes would not always be picked up.
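A simple and widely described way to detect shot boundaries is to compare the brightness or colour histogram of each frame with that of the previous frame, and to flag a cut where the difference exceeds a threshold. The sketch below works on greyscale frames supplied as flat lists of 0-255 pixel values so that it needs no video library; the bin count, the threshold and the function names are illustrative assumptions that would need tuning against real BBC material, and commercial tools may use quite different methods.

```python
def histogram(frame: list[int], bins: int = 16) -> list[float]:
    """Normalised brightness histogram of a greyscale frame (pixel values 0-255)."""
    counts = [0] * bins
    for value in frame:
        counts[value * bins // 256] += 1
    total = len(frame) or 1  # avoid dividing by zero on an empty frame
    return [c / total for c in counts]

def detect_cuts(frames: list[list[int]], threshold: float = 0.5, bins: int = 16) -> list[int]:
    """Return indices of frames that start a new shot.

    The score is half the L1 distance between consecutive histograms, which
    ranges from 0 (identical distributions) to 1 (no overlap at all).
    """
    if not frames:
        return []
    cuts = []
    previous = histogram(frames[0], bins)
    for index in range(1, len(frames)):
        current = histogram(frames[index], bins)
        score = 0.5 * sum(abs(a - b) for a, b in zip(previous, current))
        if score > threshold:
            cuts.append(index)
        previous = current
    return cuts
```

Because this compares only adjacent frames, a slow dissolve spreads its change over many frames and may never cross the threshold, which is one reason the dissolves and fades mentioned above can be missed.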
Shot boundary detection is not yet as much of a commodity as speech-to-text conversion. Systems such as Cumulus 7 with Video Suite 2 cost around $5,000 for a 10-user licence, but there are some much cheaper plugins available for video editing software on Windows PCs that do the same job.
If both speech-to-text conversion and shot boundary detection techniques are applied at the time of digital transfer, then in addition to a complete digital copy of a programme, the BBC would also possess a reasonable transcription of it, and a reasonable guide to how it breaks down into different shots. With both the transcript and the shot boundaries time-coded, it would be possible to match the right part of a programme precisely to the right part of the transcript.
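Because both outputs are time-coded, matching them is a simple interval lookup: each transcript segment is assigned to the shot during which it begins. The sketch below assumes shot boundaries are given as sorted start times in seconds and transcript segments as (start time, text) pairs; both representations, and the `assign_to_shots` and `clip_for` functions, are hypothetical.

```python
from bisect import bisect_right

def assign_to_shots(shot_starts: list[float],
                    segments: list[tuple[float, str]]) -> dict[int, list[str]]:
    """Group transcript segments by the shot in which each one starts.

    shot_starts must be sorted and begin at 0.0; shot i runs from
    shot_starts[i] up to (but not including) shot_starts[i + 1].
    """
    shots: dict[int, list[str]] = {i: [] for i in range(len(shot_starts))}
    for start, text in segments:
        # bisect_right finds the last shot whose start time is <= the segment start.
        shot = bisect_right(shot_starts, start) - 1
        if shot >= 0:
            shots[shot].append(text)
    return shots

def clip_for(shot_starts: list[float], shot: int, programme_end: float) -> tuple[float, float]:
    """Return (in, out) times in seconds for one shot, for cutting a clip."""
    out = shot_starts[shot + 1] if shot + 1 < len(shot_starts) else programme_end
    return shot_starts[shot], out
```

A search hit in the transcript then resolves to a shot number, and `clip_for` gives the in and out points needed to cut just that clip from the archived programme.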
This then gives the BBC a very powerful tool for the retrieval of partial clips from the archive.
The key is accepting that getting 100% accurate transcripts and 100% accurate shot boundaries would be prohibitively expensive. However, an 80-90% accurate transcript with 80-90% accurate shot boundary detection gives the BBC far more to work with than it has at present - which is either a whole digitally encoded programme or nothing.
Using clustered search retrieval
Successful retrieval of archive material by the BBC, by the public and by commercial organisations relies on picking the best mechanism for the job, knowing that the material being worked from is not 100% accurate.
A search retrieval technique called clustering offers the BBC a viable possibility in this arena.
The principle here is that instead of simply looking for the precise instances of a string of text, as an internet search engine like Google or Yahoo! currently does, the documents being indexed are aggregated into clusters of similar concepts.
The clusters are extracted automatically from the entire document set. Quintura and Clusty are two existing internet search engines that use this technique.
The strength of this approach when dealing with slightly 'fuzzy' transcripts echoes Tolstoy's observation that happy families are all alike, while every unhappy family is unhappy in its own way. When a transcript is accurate ('Wolverhampton railway') it will be accurate in the same way every time, and so form a valid cluster. Where a transcript has erred ('Wolves hamster railway', 'Wolverhampton wail way', 'Wolf air Hampton railway') it will be wrong in many different ways, and so will not form a cluster.
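The effect can be illustrated with a simple document-frequency count: phrases that recur across many independently transcribed programmes are kept as candidate cluster labels, while one-off mistranscriptions fall below the threshold. This is a deliberately simplified stand-in, not the clustering performed by Quintura or Clusty, whose actual algorithms are not described here; the two-word phrase length, the threshold and the function names are arbitrary illustrative choices.

```python
from collections import Counter

def phrases(text: str, n: int = 2) -> set[str]:
    """Distinct n-word phrases in a transcript, lower-cased."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def recurring_phrases(transcripts: list[str], min_docs: int = 2, n: int = 2) -> set[str]:
    """Phrases appearing in at least min_docs different transcripts.

    Counting documents rather than raw occurrences means a single programme
    repeating its own error cannot promote that error to a cluster label.
    """
    doc_freq = Counter()
    for transcript in transcripts:
        doc_freq.update(phrases(transcript, n))
    return {phrase for phrase, count in doc_freq.items() if count >= min_docs}
```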
Getting search results that build clusters of related text from the transcripts of programmes will allow the BBC to automatically build clusters of programme clips.
The clustered search results for "Leeds", for example, will form groups around the city council, the football team, the rugby team and so on.
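A very small version of this idea takes the transcript segments that mention a one-word query and files each one under the word it shares with the most other results, so that "Leeds" splits into groups about the council, the football club and the rugby club. This naive single-label grouping is chosen for readability; it is not the method used by any particular clustering engine, and the stop-word list and `cluster_results` function are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Minimal stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "at", "by", "for", "in", "is", "of", "on", "the", "to", "was", "will"}

def _words(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())

def cluster_results(segments: list[str], query: str, min_size: int = 2) -> dict[str, list[str]]:
    """Group transcript segments that mention a one-word query under a shared label word.

    Each matching segment is filed under whichever of its words appears in the
    most matching segments (ties broken alphabetically). Segments with no word
    shared by at least min_size matches are filed under "other".
    """
    q = query.lower()
    matches = [s for s in segments if q in _words(s)]
    vocab = [set(_words(s)) - STOP_WORDS - {q} for s in matches]
    # How many matching segments each word appears in.
    freq = Counter(w for words in vocab for w in words)
    clusters: dict[str, list[str]] = defaultdict(list)
    for segment, words in zip(matches, vocab):
        candidates = [w for w in words if freq[w] >= min_size]
        label = min(candidates, key=lambda w: (-freq[w], w)) if candidates else "other"
        clusters[label].append(segment)
    return dict(clusters)
```

Single-word labels such as "council", "united" or "rhinos" are cruder than a production clustering engine would aim for, but they show how groups can emerge from the transcripts without any manual classification.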
These transcripts can be linked back to the originating programmes, and because the transcripts and video have time-coding embedded in them, the user can retrieve the relevant clip.
Expanding clustered results beyond television programmes
So far in evaluating this potential approach to digitising the BBC's television programme archive we have not considered audio or the written archive.
One advantage of using transcripts as the basis for aggregating similar elements within programming is that radio and television can now operate on a level playing field. If radio programmes are also transcribed at the same time as they are digitally encoded, those transcripts can be included in the index of the archive.
Thus a search for "Martin Amis" should bring up all mentions of him from both television programmes AND radio programmes.
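Once radio and television transcripts share one format, a single inverted index can hold both, with each entry recording the medium, the programme and the timecode. The sketch below is a minimal in-memory version; the `Posting` fields, the `ArchiveIndex` class and the programme identifiers used are hypothetical, and phrase queries are handled naively by requiring every query word to appear in the same transcript segment.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Posting:
    medium: str         # e.g. "tv", "radio" or "written"
    programme_id: str   # hypothetical catalogue identifier
    timecode: float     # seconds into the programme (0.0 for documents)

def _words(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())

class ArchiveIndex:
    def __init__(self) -> None:
        # word -> every segment (from any medium) containing that word
        self._postings: dict[str, set[Posting]] = defaultdict(set)

    def add_segment(self, posting: Posting, text: str) -> None:
        for word in _words(text):
            self._postings[word].add(posting)

    def search(self, query: str) -> list[Posting]:
        """Segments containing every query word, ordered by medium, programme and time."""
        words = _words(query)
        if not words:
            return []
        hits = set.intersection(*(self._postings.get(w, set()) for w in words))
        return sorted(hits, key=lambda p: (p.medium, p.programme_id, p.timecode))
```

Adding the written archive would only mean indexing its XML text under a third medium; the search itself does not change.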
Throw the indexing of the BBC's written archive into the mix - the software mentioned earlier in this paper for digitising archives produces XML files of the contents of the documents - and pretty soon you would have a comprehensive resource that would automatically be able to group related concepts from television programmes, radio programmes and programme documentation.
The BBC's archive is a unique British national treasure, and with the advent of digital technology, for the first time it seems possible that the BBC can realise the aim of allowing free public access to the wealth stored in the Corporation's archives.
Decisions taken over the coming years will have an impact for generations on how freely and easily accessible this material is to the British public.
The BBC will be able to get increased value from the contents of the archive if it views the material as much more than simply a collection of previously transmitted complete programmes. The real value is in exposing the content for consumption and re-use at a more granular level.
The guiding principles for the BBC, as it makes the archive available to the public who have paid for its creation, should be:
to make material available in as many open, accessible formats as possible
to include written and digital archive assets in plans for the archive
to break programmes down into their component clips and segments
to facilitate collaboration with businesses, community groups, local authorities and individual volunteers