3.4 Methodology
The study undertook a number of activities to complete its task of providing an outline framework for the significant properties of software.
-
Set bounds of project. A scoping exercise set the boundaries of the study, identifying the application domain of interest and the areas considered out of scope. This identified mathematical and scientific software as being of particular interest, and scoped the project as in section 3.3 above.
-
Surveying literature. An ongoing task was to survey the available literature on the current state of software preservation, and in particular on work identifying the significant properties of software. It soon became apparent that there is relatively little work on software preservation and virtually none on identifying significant properties in this context. However, it was also apparent that a number of related areas were of interest to software preservation, particularly aspects of software engineering, especially version control, testing and reuse. We consider various relevant aspects of software engineering in section 5 below.
-
Consider case studies in software preservation. In order to establish the current best practice in software reuse, we considered in detail a number of specific examples of software developments where packages have been developed, distributed and maintained over a long period, and of repositories which are specifically designed to hold software. A number of visits were undertaken to discuss with software package and repository managers their approach to software maintenance, the ongoing problems of long-term preservation of software, and how to accommodate change in the technological environment. Discussions and visits were undertaken with Starlink, BADC, CCPForge, and NAG.
In preparation for these visits, we prepared a number of questions. Although these were not given to managers as a formal questionnaire, they provided guidelines for discussion and could be used in future as a basis for a more formal analysis of significant properties. We reproduce these questions in Appendix B.
-
Develop framework and test on examples. From the literature and the case studies, and from discussions with other projects on preservation and significant properties (notably InSpect, the study on Vector Graphics, and the CASPAR and SCARP projects), an analysis was undertaken and a conceptual framework for software developed, so that the complex organisation of software artefacts could be captured and significant properties assigned appropriately. A number of smaller illustrative examples were considered.
4 Digital Preservation of Software
During the course of the study, it became clear that software preservation was a term that had not been considered a great deal, and when it was, it meant different things to different people. Although we acknowledge that a single definition will not satisfy everyone, in this section we set out some baseline concepts of software preservation, and consider some of the motivations behind it. This is a necessary prerequisite to considering the significant properties of software; the properties are only significant in the context of the preservation task in hand.
4.1 What is software preservation?
Software preservation has four major aspects.
-
Storage. A copy of a software “package” needs to be stored for long-term preservation. As we discuss below, software is a complex digital object, with potentially a large number of components constituting a package (cf. an information package as in OAIS); what is actually preserved depends on the software preservation approach taken (see Section 8.2). Whatever the exact items stored, there should be a strategy to ensure that the storage is secure and maintains its authenticity (fixity, again using OAIS terminology) over time, with appropriate strategies for storage replication, media refresh, format migration, etc. as necessary.
-
Retrieval. In order for a preserved software package to be retrieved at a date in the future, it needs to be clearly labelled and identified (reference information in OAIS terminology), with a suitable catalogue. This should provide search on its function (e.g. terms from a controlled vocabulary or a functional description) and origin (provenance information).
-
Reconstruction. The preserved package can be reinstalled or rebuilt within an environment sufficiently close to the original that it will execute satisfactorily. For software, this is a particularly complex operation, as there are a large number of contextual dependencies on the software execution environment which must be satisfied before the software will execute at all.
-
Replay. In order to be useful at a later date, software needs to be replayed, or executed, and to perform in a manner which is sufficiently close in its behaviour to the original. As with reconstruction, there may be environmental factors which influence whether the software delivers a satisfactory level of performance.
In the first two aspects, software (once a decision has been taken on what software components to preserve) is much like any other digital object type. Storage media which are secure and maintain integrity, and methods to identify and retrieve suitable objects, are required in all cases. However, the problem of reconstruction and replay is especially acute for software. Digital objects designed for human consumption have requirements for rendering which again raise issues of satisfactory performance; science data objects also typically require information on formats and analysis tools to be “replayed” appropriately. However, software requires an additional notion of a software environment with dependencies on other hardware, software, and build and configuration information.
Note that other digital objects require software to provide the appropriate level of satisfactory replay, and thus for other digital objects there is a need to preserve software (and thus record its significant properties) too; as we shall see, there is also a dependency on the preservation of other object types (e.g. documentation) for the adequate preservation of software.
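As an informal illustration only, the four aspects described in this section could be sketched in Python as below. This is not a schema proposed by the study: the `PreservedPackage` record, its field names, and the helper functions are all hypothetical, and SHA-256 is simply one assumed choice of fixity check. Treating reconstruction as an exact match on dependency versions, and replay as numerical agreement within a tolerance, are simplifying assumptions.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PreservedPackage:
    """Hypothetical catalogue record for one stored software package."""
    identifier: str    # reference information (label used for retrieval)
    description: str   # functional description, searched by the catalogue
    provenance: str    # origin of the package
    sha256: str        # fixity value recorded when the package was stored
    # Assumed form of environment dependencies: name -> required version.
    dependencies: dict[str, str] = field(default_factory=dict)


def compute_fixity(content: bytes) -> str:
    """Storage: compute a digest of the stored bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def verify_fixity(record: PreservedPackage, content: bytes) -> bool:
    """Storage: check that the stored bytes still match the recorded digest."""
    return compute_fixity(content) == record.sha256


def search(catalogue: list[PreservedPackage], term: str) -> list[PreservedPackage]:
    """Retrieval: case-insensitive search on the functional description."""
    term = term.lower()
    return [rec for rec in catalogue if term in rec.description.lower()]


def missing_dependencies(record: PreservedPackage,
                         environment: dict[str, str]) -> list[str]:
    """Reconstruction: list dependencies the target environment does not satisfy."""
    return sorted(
        name
        for name, version in record.dependencies.items()
        if environment.get(name) != version
    )


def replay_matches(expected: list[float], actual: list[float],
                   tolerance: float) -> bool:
    """Replay: compare new outputs with recorded ones, within a tolerance."""
    return len(expected) == len(actual) and all(
        abs(e - a) <= tolerance for e, a in zip(expected, actual)
    )
```

In such a sketch, a package would be considered ready for replay only once `verify_fixity` succeeds and `missing_dependencies` returns an empty list for the target environment; what counts as a "sufficiently close" result in `replay_matches` would depend on the preservation task in hand.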