

Web technologies and metadata standards





I move on now to describe and discuss in detail some of the more recent innovations in technology and standards for the World Wide Web that are affecting, or are likely to affect, the development of library systems. Web technology has moved well beyond the sole use of HTML for markup and URLs for identifying resources. Again, I cannot be exhaustive, but I have chosen to focus on three inter-related technologies, Z39.50, XML (including Web services) and Java, which appear to be of major significance for the development of Web-based library systems.



a) Z39.50

Z39.50 is a communications standard which describes the rules and procedures for communicating between two computer systems for searching and retrieving information from databases (Lunau 2000). It is a “broker architecture” which offers client-based services that interact with external servers through a standard protocol (Pearce 2000). It enables a remote source to be searched using the interface of the local client, obviating the need to master a variety of search interfaces and facilitating the integration of bibliographic resources.


It was originally proposed in 1984; the current version (version 3) was adopted in 1995. It uses the client/server model. It was originally conceived within an OSI framework; however, most implementations now run over TCP/IP (Kunze and Rodgers 1996). It is a stateful, session-oriented protocol. In Z39.50, the system initiating the association is termed the origin, while the system that is searched is termed the target. The basic record syntax used is MARC, but other syntaxes can be used, e.g. SUTRS, OPAC (for OPAC displays), GRS-1 and Summary. Significantly, Z39.50 treats XML as an additional record syntax (Jørgensen 2000), and it is considered likely that XML will replace GRS-1 and SUTRS in future (Gardner 2000). A number of different system configurations are possible (Evans 2001a).
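Since Z39.50 can carry XML as a record syntax, it may help to see roughly what a bibliographic record looks like in that form. The following is a minimal Python sketch that serialises a simplified record using the MARCXML element names (`record`, `controlfield`, `datafield`, `subfield`) and namespace published by the Library of Congress. The helper `to_marcxml` and the record content are hypothetical, the leader is omitted, and a production system would rely on a dedicated MARC library rather than hand-built XML.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"  # MARCXML namespace


def to_marcxml(control: dict, data: list) -> str:
    """Serialise a simplified MARC record (no leader) as a MARCXML string.

    control: {tag: value} for control fields such as 001.
    data: list of (tag, ind1, ind2, [(code, value), ...]) for data fields.
    """
    ET.register_namespace("", MARC_NS)  # emit MARCXML as the default namespace
    record = ET.Element(f"{{{MARC_NS}}}record")
    for tag, value in control.items():
        cf = ET.SubElement(record, f"{{{MARC_NS}}}controlfield", {"tag": tag})
        cf.text = value
    for tag, ind1, ind2, subfields in data:
        df = ET.SubElement(record, f"{{{MARC_NS}}}datafield",
                           {"tag": tag, "ind1": ind1, "ind2": ind2})
        for code, value in subfields:
            sf = ET.SubElement(df, f"{{{MARC_NS}}}subfield", {"code": code})
            sf.text = value
    return ET.tostring(record, encoding="unicode")


# Hypothetical record: main entry (100) and title statement (245).
xml_record = to_marcxml(
    {"001": "demo0001"},
    [("100", "1", " ", [("a", "Smith, Jane.")]),
     ("245", "1", "0", [("a", "Library systems and the Web")])],
)
print(xml_record)
```

The same tag/indicator/subfield structure that a MARC record carries in its binary form survives intact, which is why XML is a natural additional record syntax rather than a new data model.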
A Z39.50 session consists of a number of stages or facilities, each of which in turn incorporates a series of messages (Evans 2001a). Broadly, these divide into core facilities and the so-called extended services. The extended services are included in version 3 and provide the ability, for instance, to order material for interlibrary loan, to save searches for re-use, to download searches, to retrieve catalogue records, and to update databases. This potentially allows many library processes to become “open”. The Explain facility allows the user to query remote databases about the services available, and to configure search and retrieval dynamically. It defines a database structure, a search methodology and a retrieval mechanism that a Z39.50 server can use to tell clients which databases it offers and what those databases contain. Local administrators decide which attributes and attribute values (see below) should be available to the Z39.50 client.
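The stateful, session-oriented character of the protocol can be illustrated with a toy model of a single association: Init must succeed before any Search, a Search leaves a named result set on the target, and Present then retrieves records from that set by position until the association is closed. The class and method names are hypothetical, and the model greatly simplifies the real exchange of encoded protocol messages and omits the many other facilities.

```python
class Z3950Session:
    """Toy model of the state shared by an origin and a target during an association."""

    def __init__(self, target_db: dict):
        self.target_db = target_db   # stand-in for the target's database: {id: record}
        self.state = "closed"
        self.result_sets = {}        # named result sets held on the target side

    def init(self):
        if self.state != "closed":
            raise RuntimeError("association already open")
        self.state = "open"

    def search(self, term: str, result_set_name: str = "default") -> int:
        """Run a search and keep the hits as a named result set; return the hit count."""
        self._require_open()
        hits = [rid for rid, rec in self.target_db.items()
                if term.lower() in rec.lower()]
        self.result_sets[result_set_name] = hits  # persists across later requests
        return len(hits)

    def present(self, result_set_name: str, start: int, count: int) -> list:
        """Return `count` records from the result set, starting at 1-based `start`."""
        self._require_open()
        hits = self.result_sets[result_set_name]
        return [self.target_db[rid] for rid in hits[start - 1:start - 1 + count]]

    def close(self):
        self.result_sets.clear()     # result sets disappear with the association
        self.state = "closed"

    def _require_open(self):
        if self.state != "open":
            raise RuntimeError("Init must precede Search and Present")


db = {"r1": "Digital libraries", "r2": "Library automation", "r3": "Web services"}
s = Z3950Session(db)
s.init()
n = s.search("librar")               # matches r1 and r2
first = s.present("default", 1, 1)   # first record of the result set
s.close()
print(n, first)
```

Because the result set lives on the target between requests, Present needs only a set name and a position; this is precisely the kind of state that a stateless transport has to reconstruct by other means.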
For the interchange of bibliographic data the standard defines the Bib-1 attribute set, which covers the six types of attribute that can be used to form a query: use, relation, position, structure, truncation and completeness. In version 3, queries can combine terms with the Boolean operators AND, OR and NOT. Some implementations also allow the proximity operator PROX to be used, and the search to be restricted to a particular field, e.g. author. Although the standard itself defines only the interaction between one client and one server, many vendors have implemented the ability to broadcast requests simultaneously to several Z39.50 servers (Lynch 1997).
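To make the six Bib-1 attribute types concrete, the sketch below builds a Type-1 query in Prefix Query Format (PQF), the textual notation used by the YAZ toolkit (a tool convention, not part of the standard itself). The numeric values follow the Bib-1 conventions as commonly documented (use 4 = title, 1003 = author, 21 = subject heading, 1016 = any; relation 3 = equal; position 3 = any position in field; structure 2 = word; truncation 1 = right, 100 = do not truncate; completeness 1 = incomplete subfield). The helper names are hypothetical, and embedded quotation marks in terms are not escaped in this sketch.

```python
# Bib-1 use attribute values for a few common access points.
USE = {"title": 4, "author": 1003, "subject": 21, "any": 1016}


def term(field: str, value: str, truncate: bool = False) -> str:
    """One attributes-plus-term node, carrying all six Bib-1 attribute types."""
    attrs = [
        f"1={USE[field]}",               # use: which access point to search
        "2=3",                            # relation: equal
        "3=3",                            # position: any position in field
        "4=2",                            # structure: word
        f"5={1 if truncate else 100}",    # truncation: right, or do not truncate
        "6=1",                            # completeness: incomplete subfield
    ]
    return " ".join(f"@attr {a}" for a in attrs) + f' "{value}"'


def boolean(op: str, left: str, right: str) -> str:
    """Combine two sub-queries with a prefix Boolean operator (@and, @or, @not)."""
    if op not in ("and", "or", "not"):
        raise ValueError(f"unsupported operator: {op}")
    return f"@{op} {left} {right}"


q = boolean("and", term("author", "lynch"), term("title", "interoperab", truncate=True))
print(q)
```

Because every target is free to reject attribute combinations it does not support, a client that always sends a full six-attribute combination is exposed to the mismatches listed below; profiles exist largely to pin such combinations down.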
It can readily be seen that the potential implications for library services and systems of such a standard are profound. Z39.50 tools allow the searching and downloading of bibliographic records in MARC format, which has implications for the sourcing of catalogue records. Z39.50 also permits the development of user-mediated document supply and SDI services (Evans 2001a). Z39.50 OPACs allow the extension of bibliographic access to other Z39.50-enabled systems, hence the growth of interest in the development of virtual union catalogues, which are considerably cheaper and easier to maintain than physical union catalogues. There have been a number of virtual union catalogue projects: the University of California Union Catalog (Coyle 2000), the Canadian Virtual Union Catalog (vCuc) (Lunau and Turner 1997), the Z Texas Project (Moen 1998), and the RIDING, M25links, and CAIRNS “clumps” projects in the UK (Cousins 1999). Typically, however, a variety of problems arise with Z39.50 searching (Pinfield 1998, 2001; Stubley 1999; Ridley 1999; Agnew 2001):


  1. the number of attributes supported by all targets tends to be small; this leads to difficulties constructing effective search statements

  2. institutions adopt varying practices when mapping data to the Bib-1 attribute set

  3. serial holdings are catalogued differently in different vendors’ systems

  4. searches are slow

  5. the results are confusing to the end-user; the search generates varying levels of detail among bibliographic, archival and subject gateway records

  6. databases have implemented different indexes and may search an inappropriate index such as a name index for an author search request

  7. large result sets arise when a server does not allow a precise search and treats all searches as keyword searches

  8. there is currently no agreed method for the provision of location, holdings and circulation information in response to a query

  9. scalability is an issue: searching of more than 5-7 institutions at a time can result in network bottlenecks due to the client/server communications overhead.
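The broadcast searching that underlies virtual union catalogues, together with the speed and scalability problems listed above, can be illustrated by a sketch that queries several targets in parallel, gives each one a deadline, and merges the answers into a deduplicated result. The targets are hypothetical in-process functions standing in for Z39.50 clients, the identifiers are made up, and deduplication on a normalised ISBN is an assumed matching rule; real union catalogue projects use more elaborate record matching.

```python
import concurrent.futures as cf
import time


def broadcast(targets: dict, query: str, timeout: float = 2.0):
    """Send `query` to all targets at once; return (merged records, failed target names).

    targets: {name: callable(query) -> list of record dicts carrying an 'isbn' key}
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(targets))
    futures = {pool.submit(fn, query): name for name, fn in targets.items()}
    done, _ = cf.wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block on targets that missed the deadline

    merged, seen, failed = [], set(), []
    for fut, name in futures.items():  # iterate in target order for stable output
        if fut not in done or fut.exception() is not None:
            failed.append(name)        # timed out or raised an error
            continue
        for rec in fut.result():
            key = rec["isbn"].replace("-", "")  # assumed rule: match on normalised ISBN
            if key not in seen:
                seen.add(key)
                merged.append({**rec, "source": name})
    return merged, failed


# Hypothetical stand-in targets with made-up identifiers.
def target_a(query):
    return [{"isbn": "978-0-000-00001-1", "title": "A"}]


def target_b(query):
    return [{"isbn": "9780000000011", "title": "A (duplicate)"},
            {"isbn": "978-0-000-00002-8", "title": "B"}]


def target_c(query):
    time.sleep(0.5)  # simulates a slow or congested server
    return []


merged, failed = broadcast({"a": target_a, "b": target_b, "c": target_c},
                           "virtual union catalogue", timeout=0.2)
print([(r["title"], r["source"]) for r in merged], failed)
```

Running the queries concurrently means the overall wait is bounded by the deadline rather than by the sum of the targets' response times, but each extra target still adds a connection and its own result traffic, which is where the scalability limit comes from.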

Version 3 of Z39.50 is in fact very general in scope and incorporates a large number of options, arguably too many (Lynch 1997). Its drafters originally intended that particular user communities should define profiles specifying how Z39.50 is to be used in their applications and to what types of data it is to provide access, e.g.:




  1. what Z39.50 functionality must be supported

  2. what minimum search attribute and attribute combinations are required

  3. what record syntaxes need to be supported

  4. how security and access issues are to be handled

  5. minimum and maximum lengths for various data elements (Needleman 2000)

The Bath Profile7 is dominant among the profiles so far devised by the library community. It is designed to solve some of the problems of Z39.50 implementation: it identifies the features of the standard that are required to support effective use of Z39.50 software for a range of library functions. It defines a core set of author, author + title, and subject search and retrieval specifications across a variety of library databases, as well as more complex searches. Its functionality and specifications are intended to be incorporated into more detailed regional specifications. Problem 8 above is the subject of the ZIG Holdings Schema8 (Stubley 1999).
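The role of a profile can be sketched as a conformance check: the profile lists the attribute combinations and record syntaxes a target must support, and a client compares them with what the target reports about itself (for instance through Explain). The profile below is a hypothetical stand-in, not the actual Bath Profile specification, and the target's capabilities are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """A hypothetical profile: required searches and record syntaxes."""
    name: str
    required_searches: dict   # search name -> tuple of (type, value) Bib-1 pairs
    required_syntaxes: set


@dataclass
class TargetCapabilities:
    """What a target reports about itself, e.g. as gathered through Explain."""
    supported_attributes: set  # (type, value) pairs the target accepts
    record_syntaxes: set = field(default_factory=set)


def conformance_gaps(profile: Profile, target: TargetCapabilities) -> list:
    """List the ways a target falls short of a profile (empty list = conformant).

    Simplification: each attribute is checked on its own, whereas a real
    profile requires the whole combination to be accepted together.
    """
    gaps = []
    for search, attrs in profile.required_searches.items():
        missing = [a for a in attrs if a not in target.supported_attributes]
        if missing:
            gaps.append(f"{search}: unsupported attributes {missing}")
    for syntax in sorted(profile.required_syntaxes - target.record_syntaxes):
        gaps.append(f"record syntax {syntax} not offered")
    return gaps


demo_profile = Profile(
    name="hypothetical core profile",
    required_searches={
        "author keyword": ((1, 1003), (2, 3), (3, 3), (4, 2), (5, 100), (6, 1)),
        "title keyword": ((1, 4), (2, 3), (3, 3), (4, 2), (5, 100), (6, 1)),
    },
    required_syntaxes={"MARC21", "XML"},
)
demo_target = TargetCapabilities(
    supported_attributes={(1, 4), (1, 1003), (2, 3), (3, 3), (4, 2), (5, 100)},
    record_syntaxes={"MARC21"},
)
gaps = conformance_gaps(demo_profile, demo_target)
for gap in gaps:
    print(gap)
```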


As Pearce (2000) observes, the library catalogue is not necessarily a single Z39.50 target, since most library systems support a logical data model consisting of at least three separate targets: a bibliographic database, an authority database and a holdings database. There are plans for the Bath Profile to include a functional area for thesauri in a future version.
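Pearce's point that a catalogue may be exposed as several logical targets can be sketched as a client that consults an authority database for the preferred heading, searches the bibliographic database with that heading, and then fetches holdings for each hit. The three dictionaries are hypothetical in-memory stand-ins for separate Z39.50 targets, holding illustrative data only.

```python
# Hypothetical in-memory stand-ins for three separate Z39.50 targets.
AUTHORITY = {  # normalised variant heading -> preferred heading
    "twain, mark": "Twain, Mark, 1835-1910",
    "clemens, samuel": "Twain, Mark, 1835-1910",
}
BIBLIOGRAPHIC = {
    "b1": {"author": "Twain, Mark, 1835-1910", "title": "Life on the Mississippi"},
    "b2": {"author": "Austen, Jane, 1775-1817", "title": "Emma"},
}
HOLDINGS = {"b1": [{"location": "Main Library", "status": "available"}]}


def find_with_holdings(author: str) -> list:
    """Authority lookup, then bibliographic search, then holdings for each hit."""
    # 1. Authority target: map a variant name to the preferred heading.
    heading = AUTHORITY.get(author.strip().lower(), author)
    # 2. Bibliographic target: search on the preferred heading.
    hits = [bid for bid, rec in BIBLIOGRAPHIC.items() if rec["author"] == heading]
    # 3. Holdings target: attach location and status (empty if none recorded).
    return [{**BIBLIOGRAPHIC[bid], "holdings": HOLDINGS.get(bid, [])} for bid in hits]


for rec in find_with_holdings("Clemens, Samuel"):
    print(rec["title"], rec["holdings"])
```

Treating the three databases as one target would hide this sequence from the client; keeping them separate lets each step be requested, and standardised, on its own.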

The Bath Profile is currently being implemented by SIRSI, and one may anticipate that conformity with it will increasingly become an issue for vendors (Lunau 2000; Miller 1999). Explain is currently implemented within Z’mbol, a metadata indexing system developed by Fretwell-Downing.


The development of the Web gave Z39.50 a boost by providing a forms-based interface for Z39.50 searches (Casale 1996). Most library system vendors have implemented Z39.50 and have added features such as the ability to execute multiple simultaneous searches. This is done either via a local Z39.50 client or (more frequently nowadays) via a combined Z39.50 client and Web browser, which offers access to Z39.50 through the browser interface by converting between Z39.50 and HTTP (Turner 1998).
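The conversion such a gateway performs can be sketched as the translation of an HTML form submission (an HTTP query string) into a Type-1 query in YAZ's PQF notation. The form field names (`field`, `term`) and the mapping table are hypothetical; a real gateway would pass the resulting query to a Z39.50 client library and render the returned records as HTML.

```python
from urllib.parse import parse_qs

# Hypothetical mapping from form menu choices to Bib-1 use attributes.
FORM_TO_USE = {"title": 4, "author": 1003, "subject": 21, "keyword": 1016}


def form_to_pqf(query_string: str) -> str:
    """Translate e.g. 'field=author&term=lynch' into a PQF query string."""
    params = parse_qs(query_string)            # also decodes '+' and %-escapes
    field = params.get("field", ["keyword"])[0]
    terms = params.get("term", [""])[0].split()
    if field not in FORM_TO_USE or not terms:
        raise ValueError("unsupported field or empty search")
    use = FORM_TO_USE[field]
    nodes = [f'@attr 1={use} "{t}"' for t in terms]
    # PQF operators are binary and prefix, so AND the words together right to left.
    query = nodes[-1]
    for node in reversed(nodes[:-1]):
        query = f"@and {node} {query}"
    return query


print(form_to_pqf("field=title&term=library+systems"))
```

The browser never sees Z39.50 at all: it submits an ordinary form, and the gateway alone speaks the library protocol on its behalf.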

If a library has a Z39.50 server, its holdings are searchable from external Z39.50 clients. The Z39.50 server software cannot be customised to interact with the library’s own integrated system; it simply waits for search requests from outside users (Nickerson 1998).


Web OPAC software is technologically very complex, as it must incorporate ways of overcoming the inherent statelessness of HTTP (Rhyno 1997b). Web OPACs can usually be configured to search any number of vendors’ systems. The use of Z39.50 Extended Services in Web OPACs has so far been variable: Web OPAC features provided by a vendor may or may not use them, which can lead to interoperability problems. Hinnebusch (1997) suggests that take-up of Z39.50-enabled document supply facilities has been limited owing to their administrative complexity.
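Because each HTTP request arrives independently, a Web OPAC has to carry the Z39.50 context (result-set name, number of hits, current position) from one page request to the next. One common approach is a server-side session table keyed by a token placed in a cookie or URL. The sketch below shows only that bookkeeping, with hypothetical names and no real Z39.50 calls; each `next_page` result stands for the start and count of the Present request the OPAC would issue.

```python
import secrets
import time


class OpacSessions:
    """Map opaque HTTP session tokens to the Z39.50 state they stand for."""

    def __init__(self, ttl_seconds: float = 900):
        self.ttl = ttl_seconds
        self._table = {}  # token -> dict of per-search state

    def start(self, result_set: str, hits: int) -> str:
        """Record a completed search and return a token for later page requests."""
        token = secrets.token_urlsafe(16)
        self._table[token] = {"result_set": result_set, "hits": hits,
                              "next": 1, "touched": time.monotonic()}
        return token

    def next_page(self, token: str, page_size: int = 10):
        """Return (start, count) for the next Present request, or None at the end."""
        state = self._table.get(token)
        if state is None or time.monotonic() - state["touched"] > self.ttl:
            self._table.pop(token, None)
            raise KeyError("session expired; the search must be re-run")
        start = state["next"]
        if start > state["hits"]:
            return None
        count = min(page_size, state["hits"] - start + 1)
        state["next"] = start + count
        state["touched"] = time.monotonic()
        return start, count


sessions = OpacSessions()
tok = sessions.start("default", 23)
pages = [sessions.next_page(tok) for _ in range(4)]
print(pages)
```

The expiry rule matters because the Z39.50 side may drop the association, and with it the result set, while the browser still holds a token that refers to it.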
Z39.50 in its classic form is unlikely to be taken up widely outside the library and information field. It is unpopular with the wider Web community on account of its complexity, its use of connection-based sessions and binary encoding, and its direct transmission over TCP/IP, while other Web standards duplicate aspects of its functionality (LeVan 2002). It has not been implemented by the major browser or relational database management system vendors, but it has shown steady growth and evolution within the sphere of library applications and has a large installed base in existing systems. It is still the only effective means of enabling simultaneous queries upon distributed heterogeneous databases. It is reasonable, therefore, to anticipate that it will remain important in library systems in the near future (Needleman 2000; Moen 2001).
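To illustrate what "binary encoding" means here: Z39.50 messages are ASN.1 structures serialised with the Basic Encoding Rules (BER), in which every value is written as tag, length and value bytes. The sketch encodes only two universal types (INTEGER and OCTET STRING) and nests them in a SEQUENCE; real Z39.50 messages use the context-specific tags defined in the standard's ASN.1 module, which are not reproduced here, and the demonstration values are arbitrary.

```python
def ber_length(n: int) -> bytes:
    """Definite-form length: one byte below 128, otherwise 0x80|count then the count bytes."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def ber_integer(value: int) -> bytes:
    """Universal INTEGER (tag 0x02) with minimal two's-complement content."""
    bits = (value if value >= 0 else ~value).bit_length()
    content = value.to_bytes(bits // 8 + 1, "big", signed=True)
    return b"\x02" + ber_length(len(content)) + content


def ber_octet_string(data: bytes) -> bytes:
    """Universal OCTET STRING (tag 0x04)."""
    return b"\x04" + ber_length(len(data)) + data


def ber_sequence(*elements: bytes) -> bytes:
    """Universal SEQUENCE (tag 0x30, constructed) wrapping already-encoded elements."""
    body = b"".join(elements)
    return b"\x30" + ber_length(len(body)) + body


msg = ber_sequence(ber_integer(3), ber_octet_string(b"Bib-1"))
print(msg.hex(" "))  # 30 0a 02 01 03 04 05 42 69 62 2d 31
```

Compact as this is, none of it is readable without a decoder that knows the ASN.1 definitions, which helps explain why the wider Web community has preferred text-based encodings.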
The question arises as to the future of Z39.50 in an XML-dominated era within the context of the Web (see the discussion of XML and its significance). Z39.50 is comparable to XML in that it provides an abstract framework for talking about data models completely independently of the physical software and underlying architecture. It goes further than XML, however, in that it specifies not only a logical representation of a document but also how it may be searched (Hammer 2000). Jørgensen (2000) points out a number of synergies between Z39.50, XML and RDF: Z39.50 can support XML as a transfer syntax, it can support XML-based query languages, and it can search and retrieve RDF structures, while Extended Service transactions can be handled by the Simple Object Access Protocol (SOAP)9.
In recent years a number of development projects have sought to redevelop Z39.50 as a Web service. The Z39.50 Implementors Group (ZIG) is sponsoring the development of Z39.50 in this way via its Search and Retrieve on the Web (SRW) initiative, formerly known as ZNG. This work uses a version of Z39.50 encoded in XML and sent via HTTP and SOAP. The details of the system architecture are described by Jørgensen (2001). It has the advantage that the Open Archives Initiative community is already committed to the use of services running over HTTP; also SRW is considerably simpler than “classic” Z39.50 (ZING 2002; NISO 2002). Another project is that of Corfield et al. (2002) on the JAFER (Java Access For Electronic Resources) ToolKit, which is developing a simplified XML-based API above the Z39.50 protocol for both Z39.50 clients and servers. The ToolKit has been used to build a number of Web applications based on XSLT, and also some experimental Web services.
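As a sketch of what "Z39.50 encoded in XML and sent via HTTP and SOAP" amounts to, the code builds a SOAP 1.1 envelope carrying an SRW searchRetrieveRequest with a CQL query. The SRW namespace URI, element names and version number are assumptions based on the ZING SRW specification and should be checked against the version actually in use; the query is illustrative, and the code only builds the message, which would then be POSTed to an SRW endpoint.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SRW_NS = "http://www.loc.gov/zing/srw/"  # assumed SRW namespace; verify before use


def srw_request(cql: str, start: int = 1, maximum: int = 10) -> bytes:
    """Build (but do not send) a SOAP-wrapped SRW searchRetrieveRequest."""
    ET.register_namespace("SOAP-ENV", SOAP_NS)
    ET.register_namespace("srw", SRW_NS)
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    req = ET.SubElement(body, f"{{{SRW_NS}}}searchRetrieveRequest")
    # Element names assumed from the SRW request parameters.
    for name, value in [("version", "1.1"),
                        ("query", cql),
                        ("startRecord", str(start)),
                        ("maximumRecords", str(maximum)),
                        ("recordPacking", "xml")]:
        ET.SubElement(req, f"{{{SRW_NS}}}{name}").text = value
    return ET.tostring(env, encoding="utf-8", xml_declaration=True)


payload = srw_request('dc.title = "library systems"', maximum=5)
print(payload.decode("utf-8"))
```

Compared with the BER-encoded, session-based classic protocol, each such request is self-contained text that any HTTP and XML toolkit can produce and inspect, which is the simplification SRW aims for.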


