4.2 Why preserve software?
A key question to answer with respect to the preservation of software is why it is a desirable thing to do in the first place. After all, software has a track record of being both very fragile and very disposable.
Software is fragile because it is very sensitive to changes in its environment: changes in hardware, operating system, versions of supporting systems (e.g. programming languages and compilers) and configuration. When the environment changes, software notoriously stops working, crashes and corrupts vital pieces of data, or works but not quite as originally intended, with missing or not quite the same functionality. The last case can be particularly damaging, as the software may seem to operate but actually produce subtly different results. For example, compiling with a different floating point module may produce quite different results in the analysis.
Software is disposable because, in the face of environment change and of the complexity of large-scale systems, developers often throw away the previous software and start again from scratch ("not invented here syndrome"). After all, if you know the problem to be solved, and you have preserved the original data, it may be easier to write new software than to handle legacy code. You may also be able to produce a faster, more user-friendly system which operates in a modern environment, with developers who understand the code to hand rather than long gone from the organisation.
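The floating point case can be illustrated with a minimal sketch. It assumes, purely for illustration, that the "different floating point module" is one that rounds every intermediate result to single precision instead of double precision; the helper names `to_single` and `accumulate` are hypothetical and not taken from any particular system. The same summation, run under the two arithmetics, gives visibly different totals.

```python
import struct


def to_single(x: float) -> float:
    """Round a double-precision Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step: float, count: int, single: bool) -> float:
    """Add `step` to a running total `count` times.

    When `single` is True, the step and every intermediate total are rounded
    to single precision, imitating (as an assumption) an environment whose
    floating point arithmetic is narrower than the original one.
    """
    if single:
        step = to_single(step)
    total = 0.0
    for _ in range(count):
        total = total + step
        if single:
            total = to_single(total)
    return total


# The same "analysis" under two hypothetical floating point environments.
total64 = accumulate(0.1, 100_000, single=False)
total32 = accumulate(0.1, 100_000, single=True)

print(f"double precision total: {total64!r}")
print(f"single precision total: {total32!r}")
print(f"absolute difference:    {abs(total64 - total32)!r}")
```

Neither run crashes or reports an error, which is exactly why this kind of change is hard to detect without a preserved reference result to compare against.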
However, there are also good reasons to preserve software - especially in a research and teaching environment. These reasons include the following.
4.2.1 Museums and Archives
A small but significant constituency for software preservation is museums and archives which specialise in preserving aspects of the history of computing and its influence on the wider course of events. These institutions want to preserve important software artefacts as they were at the time of their creation or use, so that future generations of historians of science (and the general public) can study and appreciate the computers available in a particular period, and trace their development over time. It is also recognised that archives of software may be useful in resolving copyright or patent disputes.
Such museums often concentrate on preserving hardware. For example, Bletchley Park⁷ and the National Museum of Computing⁸ preserve or rebuild historic machines, including early code-breaking machines from WWII, as do the Science Museum⁹, the Museum of Science and Industry in Manchester¹⁰, and the Computer History Museum¹¹ in Silicon Valley in California, USA. These machines are often kept in working order, so there is a need to preserve the software as well.
Other archives are interested in preserving the software alone, typically via a web presence. Examples include the Chilton Computing website¹², which includes the Atlas Basic Language Manual describing the software architecture of the Atlas computer from 1965; the Multics History Project¹³, which preserves the code for the Multics operating system; and Bitsavers¹⁴, which preserves documentation and software for minicomputers and mainframes from the 1950s to the 1980s.
To give more detail on the Multics History Project: it is making a concerted effort to locate and involve the original experts who designed and used the system, in order to capture their knowledge while they are still alive. The project seeks to preserve the binary, to “preserve the bits”, and to document the formats, but it also places an emphasis on capturing the implicit knowledge of the development organization and process, and on creating a “map” of the software describing different ways of approaching it, for example via its source code, its coding interfaces, and its functions. It also wants to capture the development history and as much of the documentation as is available. This is for historical purposes; it seems unlikely that the Multics system itself will be revived, since most of its functionality could be emulated elsewhere and the data generated using it could be processed on different systems. Nevertheless, Multics is an object lesson in software engineering (good and bad), and is undoubtedly valuable for future generations of computing engineers.
Other groups wish to preserve hardware and software as research interests or private enthusiasms. Examples are the Computer Conservation Society¹⁵, a specialist interest group of the BCS; the Software Preservation Group¹⁶, supported by the Computer History Museum; and a number of groups, such as the Software Preservation Society¹⁷, interested in preserving games software for obsolete platforms such as the Sinclair Spectrum, Acorn BBC Micro or Amiga.
In this context, some consideration has been given to how to preserve software. See for example Preserving Software: Why and How by Zabolitsky [3]. This is, however, largely limited to preserving historic software as a unit with the historic hardware, so the major concern is preserving a representation of the code on some physical medium, with appropriate backup and replication strategies. The problem of preserving the usage of the software in a future context is not considered in detail. The proceedings of the Computer History Museum’s workshop “The Attic & the Parlor: A Workshop on Software Collection, Preservation & Access”, May 5, 2006¹⁸, give an overview of approaches to software preservation being undertaken in museums and archives. Major concerns are how to collect important software packages, especially under a variety of licensing constraints, and how to interpret and display them to the public.
The enthusiasts who would like to preserve games software recognise the problem of maintaining the usability of the software, which is the point of preserving old games. They also recognise the problems associated with copyright and copy protection. They adopt preservation strategies which use software emulation of obsolete platforms, and conversion of the binary to universal
4.2.2 Preserve a complete record of work
Software is frequently an output of research. This is particularly the case in Computer Science, where the software itself is an important test-bed of the hypothesis of the research - if you can't implement and demonstrate the advantage of the assertion, in computer science terms the assertion is not proven. However, software as an output of research extends beyond Computer Science, as research projects across all disciplines now frequently have an aspect of computing and programming to demonstrate the hypothesis of the research.
If university archives and libraries are going to maintain a complete record of research, then the software itself should be preserved. In practice, theses frequently do come with appendices of code listings, or with CD-ROMs inserted into the back cover holding the supporting software. However, while the theses are stored on library shelves, the software content is not necessarily preserved against media change (can we read those disks in a few years' time?) or against change in the computing environment which makes the code difficult to run. Research projects likewise frequently produce software, or specialist modifications to existing packages, to support their claims or to carry out special analysis of data, so the results of the project are hard to interpret and evaluate without the software. However, at the end of the project, unless the software is taken up as a community effort or in a subsequent project, there is little incentive or resource to maintain access to the software in a usable form.
Library preservation strategies should accommodate the preservation of software as well as other research outputs.
4.2.3 Preserving the data
Related to the previous point is the reproduction and verification of the results of research which has generated and analysed data, and published the results. For the asserted results of a research project to be verified, they should be reproducible. In many circumstances it may be enough to rerun the analysis on current software if the original data has been preserved. But in other circumstances, testing accuracy or detecting fraud for example, it may be necessary to rerun the original software precisely, to reproduce the exact result. Scientific reputations may be at stake here - and scientists should be judged on the results available to them at the time, using the software as it was available to them, rather than newer software. Newer software may have errors corrected, have higher performance or accuracy characteristics, or else have improved analysis algorithms or visualisation tools. All these factors may lead a later analysis of the data to different conclusions from those originally drawn, but the scientists should nevertheless be judged on the view they were able to take at the time.
A further issue here is the reuse of data. Data which is collected on sophisticated experimental equipment or facilities¹⁹ is expensive; other data which records specific events, such as environmental conditions at particular times and places, is non-reproducible. In these circumstances, it is desirable to preserve and reuse the data in order to maximise its scientific potential. This in turn makes it necessary to preserve some supporting software, to process the data format and to provide the appropriate data analysis.
This reason, as has already been noted, is also relevant to the preservation of other digital objects. Preservation of document or image formats requires the preservation of format processing and rendering software in order to keep the content accessible to future users.
Thus it is necessary to preserve software to support the preservation of data and documents, to keep them live and reusable. In this case, the prime purpose of the preservation is not to preserve the software per se, so it may be acceptable not to ensure that the software is reproduced in its exact form, but only in a form sufficient to process the target data. Thus we introduce the key notion of adequacy: “good enough” preservation, with key properties of the software preserved and others disregarded.
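One way to make the notion of adequacy concrete is sketched below. This is an illustrative assumption rather than a method described in this report: the reference cases, the tolerance and the names `REFERENCE_CASES`, `replacement_mean` and `is_adequate` are all hypothetical. The idea is that results recorded while the original software was still runnable act as the "key properties", and a replacement or emulated implementation is judged adequate if it reproduces them within a stated tolerance, whatever else about it has changed.

```python
import math

# Hypothetical reference record: inputs and results captured while the
# original software could still be run (values invented for illustration).
REFERENCE_CASES = [
    {"input": [1.0, 2.0, 3.0, 4.0], "expected": 2.5},
    {"input": [10.0, 10.5, 9.5], "expected": 10.0},
]


def replacement_mean(values):
    """Stand-in for a re-implemented or emulated version of the original analysis."""
    return math.fsum(values) / len(values)


def is_adequate(analysis, cases, rel_tol=1e-9):
    """Return True if `analysis` reproduces every recorded result within `rel_tol`.

    Only the recorded outputs are checked; speed, user interface and internal
    structure are deliberately disregarded.
    """
    return all(
        math.isclose(analysis(case["input"]), case["expected"], rel_tol=rel_tol)
        for case in cases
    )


print(is_adequate(replacement_mean, REFERENCE_CASES))
```

The choice of tolerance is itself a preservation decision: checking for fraud may demand exact reproduction, whereas reuse of the data may be well served by a looser bound.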
4.2.4 Handling Legacy
Perhaps the prime motivation to preserve software for most people is to save effort in recoding. Code from the past still needs to be used, due to its specialised function or configuration, and it is frequently seen as more efficient to reuse old code, or to keep old code running in the face of software environment change, than to recode. This is certainly the reason for most existing software repositories, and it accounts for a significant part of the effort undertaken by software developers, both in-house at end-user organisations and within software houses. Handling legacy software is usually seen as a problem, and many strategies are adopted in order to rationalise the process and make it more systematic and more efficient. As a consequence, an important source of information on significant properties for preservation is best practice on software maintenance and reuse, a long recognised part of good software engineering. If you can find an existing package or library routine, why bother rewriting it? Of course, in these circumstances you need assurance that the software will run in your environment and provide the correct functionality.