Appendix A: An OWL Ontology for Significant Properties of Software
Appendix B: Questions to Analyse Software Preservation Practices
Outline Questions
More detailed questions
Appendix C: A possible categorisation of software licensing
Digital preservation has become a pressing concern as more and more of the records of human activity are generated and processed electronically. Software is a class of electronic object that is not only the result of research but is frequently a vital prerequisite for the preservation of other electronic objects. However, consideration of the preservation of software as a digital object in its own right has, to date, been very limited. Software is seen as complex (forbiddingly so for people who want to maintain access to software but were not involved in its development), and its preservation is often seen as a secondary activity with limited ultimate purpose.
Software preservation is thus a relatively new topic of research, and there is little practical experience in the field of software preservation per se. This study has therefore become as much an exploration of software preservation in general as a definition of the significant properties for preservation. Consequently, we discuss some of the motivations for preserving software and the approaches taken to do so.
Software is a very large topic with great range and diversity. In this study we restricted ourselves to mathematical and scientific software used in the academic research community in parts of the UK. We reviewed the literature and discussed case studies either with practitioners who provide software repositories or with developers of software packages that have had a very long lifetime.
Although many groups hold software to support archives or a community, and many others maintain a usable software package over a long period, these groups do not consider themselves to be doing software preservation and have other priorities. Those who do carry out software preservation are often amateurs, or specialists in science museums or special interest groups; their efforts are ad hoc and small scale, and do not tackle the systematic problem of keeping software replayable in a broad context over the long term. Other projects look more systematically at digital preservation, but they tend to rely on the persistence of software, or at least on access to software with functionality similar to the original, rather than concentrating on how to preserve the software itself.
Nevertheless, there are good reasons for preserving the research effort invested in software, often as a vital adjunct to preserving other digital objects. Preserving software essentially means ensuring that it can be reconstructed and replayed so that it behaves sufficiently closely to the original.
Software is inherently complex: it has a large number of components related in a dependency graph, including specification, source and binary components, and it depends highly sensitively on its operating environment. Handling this complexity is a major barrier to the preservation of software. Several preservation approaches can be adopted: executing binaries directly, emulating the software, or migrating the software by recompiling its source code or even recoding it. Each can support good preservation in the appropriate circumstances.
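To make the dependency-graph view concrete, the sketch below models a package as a mapping from each component to the components it depends on, then derives an order in which the components could be reconstructed and the full set a given component transitively needs. It is a minimal illustration only: the component names are hypothetical, and a real package would record far richer relationships than plain "depends on" edges.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical components of a scientific package, each mapped to the
# components it depends on. Names are illustrative, not from the study.
DEPENDENCIES = {
    "solver_binary": {"solver_source", "numerics_library", "fortran_compiler"},
    "solver_source": {"specification"},
    "numerics_library": {"fortran_compiler"},
    "fortran_compiler": {"operating_system"},
    "specification": set(),
    "operating_system": set(),
}


def reconstruction_order(deps):
    """Order in which components can be rebuilt or installed, with every
    component appearing after all of the components it depends on."""
    return list(TopologicalSorter(deps).static_order())


def required_components(deps, target):
    """All components that `target` depends on, directly or transitively."""
    seen = set()
    stack = list(deps.get(target, ()))
    while stack:
        component = stack.pop()
        if component not in seen:
            seen.add(component)
            stack.extend(deps.get(component, ()))
    return seen
```

Subtracting the set of archived components from `required_components(DEPENDENCIES, "solver_binary")` would show what is still missing before that binary could be replayed, which is one way the sensitivity to the operating environment becomes visible.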
Adopting the notion of performance from the NAA and InSPECT, we developed a notion of software performance that is closely related to the adequacy of the software's performance on its target data. Establishing and preserving test cases that record the expected behaviour of the end software on test data is therefore key to assessing whether a preservation approach has adequately retained specific chosen significant properties.
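A minimal sketch of such a check follows, assuming each preserved test case stores numeric input data together with the output the original software produced on it. The class and function names, and the use of a floating-point tolerance as the criterion for "sufficiently close", are assumptions made for illustration; what counts as adequate would in practice depend on the significant properties chosen.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class PreservedCase:
    """Hypothetical preserved test case: input data plus the output the
    original software produced on it, recorded at archiving time."""
    name: str
    inputs: Sequence[float]
    expected: Sequence[float]


def adequate(run: Callable[[Sequence[float]], Sequence[float]],
             cases: Sequence[PreservedCase],
             rel_tol: float = 1e-9,
             abs_tol: float = 0.0) -> dict:
    """Replay each case through the reconstructed software `run` and report,
    per case, whether its output matches the recorded output within tolerance."""
    results = {}
    for case in cases:
        actual = run(case.inputs)
        results[case.name] = len(actual) == len(case.expected) and all(
            math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)
            for a, e in zip(actual, case.expected)
        )
    return results
```

For example, a migrated routine that squares its inputs would pass a case recording `[1.0, 2.0, 3.0] -> [1.0, 4.0, 9.0]`, while a faulty port that doubles them would fail it. The tolerance allows for the small numerical differences that recompilation on a new platform can introduce.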
Good software engineering practice in version control, maintenance, migration and especially testing can also support software preservation. Groups that have successfully maintained software over a long period have adopted rigorous software engineering practices and, in particular, have developed techniques to support software migration.
To capture and control the inherent complexity of software, we have developed a conceptual model for software that is more complex than that of InSPECT; many of the structuring significant properties of software are thus captured in this model. Significant properties of software are then categorised both according to this model and according to their role. As a consequence, we observe that the InSPECT categorisation of significant properties does not fit the significant properties of software comfortably. This is probably because of the indirect performance model of software, which is tested through its performance on end data. Contextual significant properties play a key role, since software can only be reconstructed and replayed satisfactorily if they are satisfied, whilst behavioural significant properties determine the performance of the software on end data.
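The role-based categorisation can be sketched as data: each significant property carries a role, and contextual properties are checked against a target environment before replay is attempted. This is a hypothetical illustration, not the study's model. The three roles are taken from the paragraph above, while the property names, values and the exact-match comparison are simplifying assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    STRUCTURAL = "structural"    # how the software is composed
    CONTEXTUAL = "contextual"    # what the environment must provide
    BEHAVIOURAL = "behavioural"  # how it performs on end data


@dataclass(frozen=True)
class SignificantProperty:
    name: str
    role: Role
    value: str


# Hypothetical properties of one software version (illustrative only).
PROPERTIES = [
    SignificantProperty("operating_system", Role.CONTEXTUAL, "Linux"),
    SignificantProperty("compiler", Role.CONTEXTUAL, "Fortran 77"),
    SignificantProperty("modules", Role.STRUCTURAL, "solver, io"),
    SignificantProperty("output_precision", Role.BEHAVIOURAL, "double"),
]


def unmet_context(props, environment):
    """Names of contextual properties the target environment does not
    provide; any entry here blocks faithful reconstruction and replay.
    Exact string equality is a deliberate simplification."""
    return [p.name for p in props
            if p.role is Role.CONTEXTUAL and environment.get(p.name) != p.value]
```

Only contextual properties are checked here; behavioural properties would instead be assessed afterwards by running the reconstructed software on end data.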
Given the relatively immature state of the art in software preservation, we consider our conceptual model of software, and the associated identification and classification of significant properties, to be a proposal that needs further evaluation in practice to judge its value and effectiveness.
The significant properties identified in this study are still relatively general and do not go into the level of detail of other significant properties studies. For example, we decided to stop at the level of code granularity represented by the common coding concept of a public class, module or subroutine (terminology varies between programming languages), and judged that it would not be worthwhile detailing any further. Other significant properties also stop at a high level and do not, for example, enumerate the possible values they could take. Further testing and evaluation are required to establish whether this level of detail is sufficient, whether the significant properties are always appropriate, and whether they can be extracted and used in practice.
Tool support for the significant properties of software should eventually be forthcoming; however, we feel that the methodology described above needs further investigation before investing in comprehensive tool support.