Current software repositories exist to collect and distribute software. For example, there are several web-based archives of open-source software, shareware, or software made available to the community under common licensing agreements, such as Eduserv Chest37. In domain-specific areas, repositories have been established to collect and distribute specialised software, including the US National HPCC Software Exchange for high performance computing38, Starlink39 for astronomy, GAMS for mathematical software40, OMII for Grid software41, and CCPForge for Collaborative Community Projects in computational chemistry (discussed below). The primary purpose of these repositories is to provide a central point where a community interested in a particular topic can share and exchange software; some also include tools for community software development. The best known and most general of these is the SourceForge site for open source software development, discussed below.
Software characterisation in these repositories ranges from simple alphabetic lists, through simple categorisation by broad functional area, to indexing via detailed hierarchical thesauri of terms, such as the GAMS thesaurus used by GAMS, NHSE and NAG42. The significant properties characterised are thus typically restricted to discovery terms, together with programming language and supported operating systems. The level of quality control applied to the software in a repository also varies. However, these properties are not considered in the context of preservation.
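As a minimal sketch of the kind of characterisation described above, a repository entry can be modelled as a record holding a hierarchical classification path, a programming language and a set of supported operating systems. The record type, the `find_by_class` helper, the entry names and the classification terms are all hypothetical and chosen for illustration only; they are not taken from GAMS, NHSE or any other repository.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class SoftwareRecord:
    """Hypothetical catalogue entry holding discovery-oriented properties only."""
    name: str
    classification: Tuple[str, ...]  # path in a hierarchical thesaurus, broad to narrow
    language: str
    operating_systems: FrozenSet[str]


def find_by_class(records: Iterable[SoftwareRecord],
                  prefix: Iterable[str]) -> List[SoftwareRecord]:
    """Return the records whose classification path starts with `prefix`."""
    wanted = tuple(prefix)
    return [r for r in records if r.classification[:len(wanted)] == wanted]


# Illustrative entries; names and terms are invented for this sketch.
CATALOGUE = [
    SoftwareRecord("solver-a", ("linear algebra", "linear systems"),
                   "Fortran", frozenset({"Linux"})),
    SoftwareRecord("eig-b", ("linear algebra", "eigenvalues"),
                   "C", frozenset({"Linux", "Windows"})),
    SoftwareRecord("quad-c", ("integration",),
                   "Fortran", frozenset({"Linux"})),
]

if __name__ == "__main__":
    for record in find_by_class(CATALOGUE, ("linear algebra",)):
        print(record.name, record.language, sorted(record.operating_systems))
```

A record of this shape supports search and discovery, but it holds nothing about build dependencies, versions or documentation, which is consistent with the observation that such properties are not recorded with preservation in mind.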
Classification schemes for software have been proposed and are used to classify the function and, to some extent, the context of software. These schemes provide a functional classification of software within a controlled vocabulary. They are generally designed to be broadly based and do not give much detail about the precise functioning of the software package itself. Thus they are largely useful as a significant property for categorisation, search and discovery rather than for enabling the replay of the software.
Some well-known examples include the following.
The Computing Classification System of the Association for Computing Machinery (ACM)43 is usually used to classify computer science articles within ACM publications. It provides a broad view of computing, mixing different facets of computing, such as application domain (e.g. business applications), computing science area (e.g. Artificial Intelligence), and hardware platform. While it is a good starting point for the general aspects of the significant properties of software, much more work would be required to make it complete.
More domain-specific classifications, such as the Guide to Available Mathematical Software (GAMS) Thesaurus44, give focussed and detailed information on the functionality supported. For the functional significant properties of software packages in this area (down to the level of detail of a library item), this may be sufficient.
Other thesauri also exist, such as INSPEC or the USPTO (United States Patent and Trademark Office) classification scheme. Again, these are too general to be of great use in specifying the significant properties of software in anything other than the most general functional terms.
6.5 SourceForge and CCPForge
Over the last ten years, the notion of a software forge has been developed. A software forge is a web site that hosts software development projects involving developers in highly distributed locations. These are typically open-source efforts (though there is no reason why they always should be), with a large number of volunteer developers operating across a number of countries and organisations. Such community efforts require a common place to lodge and share their codebase, provide documentation and support community communications; the software forge site provides this functionality.
The best known of these sites is SourceForge45, which supports 100,000 projects with 1,000,000 users. SourceForge provides a space for source code management using CVS, documentation, a simple web site, and community forums for discussion and bug tracking. Projects in SourceForge are categorised against a fairly crude topic mapping. Other sites provide similar functionality, typically for a specialised community, where there is the opportunity for more specialised support and interaction with a like-minded community.
CCPForge46 is a software forge which provides a self-service space for software from the Computational Chemistry community. Like SourceForge, CCPForge provides a CVS repository for source code management and a number of tools to support community engagement. Unlike SourceForge, this repository is actively managed: users normally need to create an individual username by self-registration and then have it authorised for access to individual projects. CCPForge is modelled on SourceForge principles but also provides some facilities for building and maintaining binary versions of the code. Projects can provide Bug Tracking, Support and Feature Request facilities, in addition to the more common Documentation Management, Mailing Lists and, of course, Platform Specific pre-built binary packages. CCPForge sees its task very much as providing a meeting place for developers, and a storage and source code management facility for projects in the Computational Chemistry community. It does not see itself as having any role in controlling the projects or in placing specific requirements on them, for example for software preservation. It is up to the projects themselves to take whatever actions they see fit to maintain their documentation and make their software suitable for migration. That is part of the software engineering practice of the project itself.
Nevertheless, providing better facilities to support preservation in software forges could be a suitable future development of such systems. This would build on the existing good practice in source code control and documentation to record the significant properties of software for preservation, and thus convert the forges into archiving repositories.