The Significant Properties of Software: a study




8 Conceptual Framework

In order to express the significant properties of software, we need a conceptual framework that captures the approach taken to software preservation, the structure of the software artefact, and the significant properties of software for preservation.


8.1 Software Preservation Approach


Various approaches to digital preservation have been proposed and implemented, usually as applied to data and documents; however, they typically concern the means of preserving the underlying software used to process or render the data or document. These preservation approaches thus relate directly to the preservation of software.

The Cedars Guide to Digital Preservation Strategies [3] defines three main strategies, which we give here, considering how each is applicable to software.



  • Technical Preservation (techno-centric). Maintaining the original software (binary), and sometimes the hardware, of the original operating environment. This is similar to the use case for software preservation arising from museums and archives, where the original computing hardware is also preserved and as much of the original situation is maintained as possible. This approach is also taken in many legacy situations: otherwise obsolete hardware is maintained to keep vital software in operation.

  • Emulation (data-centric). Re-creating the original operating environment by programming future platforms and operating systems to emulate the original operating environment, so that software can be preserved in binary form and run "as is". This is a common approach, undertaken for example in the PLANETS project, and also by groups such as the Software Preservation Society.

  • Migration (process-centric). Transferring digital information to new platforms before the earlier one becomes obsolete. As applied to software, this means recompiling and reconfiguring the software source code to generate new binaries adapted to a new software environment, with updated operating systems, languages, libraries, etc.

In practice, software migration is a continuum. In the minimal-change scenario, the source code is recompiled and rebuilt unchanged from the original source. In practice, however, the configuration scripts or the code itself may require updating to accommodate differences in build systems, system libraries, or programming language (compiler) versions. An extreme form of migration may involve rewriting the original code from the specification, possibly in a different language. However, there is not necessarily an exact correlation between the extent of the change and the degree to which the significant properties of the original are preserved.
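To make the least-change end of this continuum concrete, the following minimal sketch (ours, not part of the study) attempts to rebuild preserved source unchanged with the current toolchain, and reports the first stage at which manual adaptation becomes necessary. The directory name and the ./configure and make steps are illustrative assumptions.

import subprocess
from pathlib import Path

def attempt_minimal_migration(source_dir: Path) -> str:
    """Return 'unchanged' if the preserved source rebuilds as-is,
    otherwise name the first stage requiring adaptation."""
    stages = [
        ("configure", ["./configure"]),  # original configuration script
        ("build", ["make"]),             # original build instructions
    ]
    for name, cmd in stages:
        result = subprocess.run(cmd, cwd=source_dir, capture_output=True)
        if result.returncode != 0:
            # Failure here pushes the migration further along the
            # continuum: scripts or code must be updated by hand.
            return f"adaptation required at stage: {name}"
    return "unchanged"

print(attempt_minimal_migration(Path("preserved-package-1.0")))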

Software migration (or "porting", or "adaptive maintenance") is in practice how software which is supported over a long period of time is preserved. Long-term projects such as StarLink, and software houses such as NAG, spend much of their effort maintaining (or improving) the functionality of their systems in the face of environmental change.

These three approaches have their advantages and disadvantages, which have been debated in the preservation literature.

Technical (hardware) preservation involves the minimal level of intervention and minimal deviation from the original properties of the software. However, in the long term this approach is difficult to sustain, as expertise and spare components for the hardware become harder to obtain.

The emulation approach for preserving application software is widespread, and is particularly suited to situations where the properties of the original software must be preserved as exactly as possible: for example, in document rendering, where the exact pagination and fonts are required to reproduce the original appearance of the document; or in games software, where the graphics, user controls and performance (the game should not run too quickly for a human player on more up-to-date hardware) must be replicated. Emulation is also an important approach when the source code is not available, either because it has been lost or because licensing or commercial restrictions prevent access to it. However, one problem with emulation is that it transfers the problem to the (hopefully lesser) one of preserving the emulator. As the platform the emulator was designed for becomes obsolete, the emulator has to be rebuilt or emulated on another emulator; thus a potentially growing stack of emulation software is required.

The migration approach does not seek to preserve all the properties of the original, or at least not exactly, but, as observed in the CASPAR project, only those up to the API – which we might generalise to those properties identified as significant for the preservation task in hand. Migration can then take the original source and adapt it to the performance and capabilities of the modern environment, while still preserving the significant functionality required. It is thus perhaps best suited to situations where the exact characteristics of the original are not required in every respect – there may be, for example, differences in user interaction, processing performance, or even software architecture – but core functionality is maintained. For example, for most scientific software the accurate processing of the original data is of key importance, but there is a tolerance to change in other characteristics.
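As one illustration of this view, the sketch below (our own; the property names and their representation are hypothetical, not drawn from CASPAR) records which significant properties a migration is obliged to preserve and which are tolerated to deviate:

MIGRATION_PROFILE = {
    # core functionality: must survive migration
    "numerical-results": {"required": True, "tolerance": 1e-9},
    "input-formats": {"required": True, "tolerance": None},
    # peripheral characteristics: free to deviate on the new platform
    "ui-look-and-feel": {"required": False, "tolerance": None},
    "runtime-performance": {"required": False, "tolerance": None},
}

def must_preserve(prop: str) -> bool:
    """True if a migration is obliged to preserve this property."""
    return MIGRATION_PROFILE.get(prop, {}).get("required", False)

assert must_preserve("numerical-results")
assert not must_preserve("ui-look-and-feel")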

These three preservation strategies thus require different levels of detail in the significant properties of the software artefacts. In this report, we are neutral as to the preservation approach, but consider how the preservation of the key properties can be identified and checked.

8.2 Performance Model and Adequacy

Closely related to the preservation approach is the notion of what level of performance is sufficient to adequately preserve the required characteristics of software. Performance as a model for the preservation of digital objects was defined by the National Archives of Australia in [6] to measure the effectiveness of a digital preservation strategy. Noting that for digital content, technology (e.g. media, hardware, software) has to be applied to data to render it intelligible to a user, they define the model shown in Figure 8. Here a Source has a Process applied to it, in the case of digital data some application of hardware and software, to generate a Performance, from which meaning is extracted by a user. Different processes applied to a source may produce different performances. However, it is the properties of the performance which need to be taken into account when assessing the value of a preservation action. Thus the properties can arise from a combination of the properties of the source with the technology applied in the processing.


Figure 8: NAA Performance Model
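This model can be stated compactly in code. The sketch below is our reading of Figure 8, with class and field names chosen by us rather than fixed by the NAA model: a Process applied to a Source yields a Performance, and it is the Performance that carries the properties to be assessed.

from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class Source:
    data: bytes                  # the preserved digital object

@dataclass
class Performance:
    properties: Dict[str, Any]   # what a user can actually observe

Process = Callable[[Source], Performance]

def perform(source: Source, process: Process) -> Performance:
    # Different processes applied to the same source may yield
    # different performances; preservation is judged on the result.
    return process(source)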

The notion of performance was developed in the context of traditional archival records, and has been adopted in other studies into the significant properties of different media types (see [5], [2]), which consider comparing the performance created by the original rendering process with that created by a later rendering process on new hardware and software. The question which arises is how this model applies to software.
In the case of software, the performance is the execution of software binary files on some hardware platform configured in some architecture to provide the end experience for the user. However, the process stage depends on the nature of the source used. This is illustrated in Figure 9.


  • In the case where the binary is preserved, the process to generate the performance is one of preserving the original operating software environment, and possibly the hardware too, or else emulating that software environment on a new platform. In this case, the emphasis is usually on performing as closely as possible to the original system.




  • When the source code and the configuration and build scripts are preserved, a rebuild process can be undertaken, using later compilers and linkers on a new platform, with new versions of libraries and operating systems. In this case, we would expect the performance not to preserve all the properties of the original (e.g. system performance, or the exact look and feel of the user interface), but to show some deviations from it.




  • In an extreme case, only the specification of the software may be preserved. A performance could then be replicated by recoding the original specification. In this case, we would expect significant deviation from the original, with perhaps only core functionality preserved. This case might seem exceptional; however, it is not unusual in coding practice, as packages are often migrated into a different language: for example, the NAG library originated in FORTRAN, but a C version was later produced. In some circumstances this is a result of reverse engineering, where source code (or even, in extreme cases, binary code) is analysed to determine its function and then recoded.


Figure 9: Software Performance models from different sources


Software performance can thus result in some properties being preserved, and others deviating from the original or even being disregarded altogether. Thus in order to determine the value of a particular performance, in addition to the established notion of Authenticity of preservation (i.e. that the digital object can be identified and assured to be the object as originally archived) we define an additional notion of Adequacy.
A software package (or indeed any digital object) can be said to perform adequately relative to a particular set of significant properties, if in a particular performance (that is after it has been subjected to a particular process) it preserves that set of significant properties to an acceptable tolerance.
By measuring the adequacy of the performance, we can thus determine how well the software has been preserved and replayed.
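A minimal sketch of this definition follows; the flat property dictionaries, the numeric tolerances, and the example property names are illustrative assumptions on our part:

def is_adequate(original, performance, significant):
    """significant maps a property name to an acceptable numeric
    tolerance (0 means the value must match exactly)."""
    for prop, tolerance in significant.items():
        if prop not in performance:
            return False                     # property lost entirely
        before, after = original[prop], performance[prop]
        if isinstance(before, (int, float)):
            if abs(after - before) > tolerance:
                return False                 # deviates beyond tolerance
        elif after != before:                # non-numeric: exact match
            return False
    return True

# e.g. pagination must match exactly; render time may drift by 50 ms
significant = {"page-count": 0, "render-ms": 50}
print(is_adequate({"page-count": 12, "render-ms": 100},
                  {"page-count": 12, "render-ms": 130},
                  significant))              # True: within tolerance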

8.2.1 Performance of software and data


A further refinement of the performance model for software is that the measure of adequacy of the software is closely tied to the performance of the software on its input data. The purpose of software is (usually) to process data, so the performance of a software package is the processing of its input data. This relationship is illustrated in Figure 10.

Figure 10: Performance models of software and its input data



For example, in the case of a word processing package which is preserved in binary format and processed via operating system emulation, the performance of the package is the processing and rendering of word processing file format data into a performance which a (human) user can experience by reading it off a display. Thus for many functional properties of software, the measure of adequacy of the software is the measure of the adequacy of the performance when it is used to process input data, and of how well it preserves the significant properties of that input data.
This can be applied recursively to software which processes other software, for example software used for emulation, or compilers used to build software binaries, which also needs to be preserved. In this case, the performance of the software is the processing of the application binaries or source code, which in turn is measured by its adequacy in processing its intended input data.
Thus different preservation approaches require different significant properties, and the adequacy of the preservation depends upon the performance of the end result when put to its end use on data. This adequacy can be established by performing trial executions against test data: the adequacy of preservation of a particular significant property can be established by testing against pre-arranged suites of test cases with the expected behaviour.
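The sketch below illustrates this testing discipline; the run_package callable and the (input, expected output) test-case format are our assumptions about how such a suite might be represented:

from typing import Callable, Iterable, Tuple

TestCase = Tuple[bytes, bytes]   # (input data, expected output)

def adequacy_over_suite(run_package: Callable[[bytes], bytes],
                        suite: Iterable[TestCase]) -> float:
    """Fraction of test cases whose performance matches the expected
    behaviour recorded at archiving time; 1.0 means the checked
    significant properties are fully preserved."""
    cases = list(suite)
    passed = sum(1 for inp, expected in cases
                 if run_package(inp) == expected)
    return passed / len(cases)

# Usage: compare an emulated or migrated build with archived outputs,
# e.g. score = adequacy_over_suite(migrated_runner, archived_suite)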


