

Documents, Document Attributes and Document Spaces



Date10.08.2017

2 Documents, Document Attributes and Document Spaces

This section discusses the various meanings of the terms used in this paper related to documents. These various definitions and explorations are undertaken with an eye to how they contribute to our understanding of a document space and, in turn, to the development of tools that might be used to represent that space, toward the goal of navigation. It begins with a definition of a document and document space. The time dimensions of documents are considered. The components and attributes of documents are discussed, relations between documents are examined, and the nature of document spaces is explored. Finally, a few of the major document-related systems are investigated.


2.1 Documents


There are many definitions of a document. Buckland (1991) discusses in detail the efforts to define what a document is and, more generally, what information is. He points out that definitions of a document have ranged from any text object to any informative thing, including living animals in a zoo. To narrow the scope of this study, the definition of a document given by Spring (1991) will be used:
"A document is an identifiable entity, having some durable form, produced by a person or persons toward the goal of communication and may take a number of forms, but must have at least one symbolic manifestation that can be comprehended by humans." (p.8)
Documents are a combination of texts, graphics, and images. The development of multimedia also provides sound and animation components for documents. Documents may be produced on demand, based on what customers need and when they need it. Using computers, the contents of a document can be constructed based on a user’s request. Many news Web pages are “live documents,” i.e., the content of the document is always current. An active document is a document that searches for users instead of waiting for a user to find it.
Under Spring’s definition above, a document has at least one basic presentation, e.g. a text form, that is general enough to support communication. Other forms of presentation of a document will be investigated later.
In this paper, a document will be considered only in relation to its presentation in electronic form. Documents are stored in semi-permanent or permanent forms, supported by a computer system. A document may be referred to by its surrogate together with a pointer to the physical document. A document surrogate is a description of a document, including its title, abstract, and similar details.
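The relation between a surrogate and a pointer can be sketched as a small data structure. This is a minimal illustration only: the class names, the field names, and the use of a URL-like string as the pointer are assumptions made for the example, not part of Spring's definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Surrogate:
    """Description of a document (hypothetical fields: title and abstract)."""
    title: str
    abstract: str = ""


@dataclass(frozen=True)
class DocumentRef:
    """A reference to a document: its surrogate plus a pointer to the stored document."""
    surrogate: Surrogate
    pointer: str  # assumed to be a path or URL; the text does not fix a format


def describe(ref: DocumentRef) -> str:
    """Return a one-line label built from the surrogate and the pointer."""
    return f"{ref.surrogate.title} -> {ref.pointer}"


# Hypothetical example values.
ref = DocumentRef(
    Surrogate("Documents and Document Spaces", "Defines the terms used."),
    "file:///docs/spaces.txt",
)
```

A navigation tool could display only the surrogate and follow the pointer when the user asks for the document itself.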
Document Spaces

Benedikt (1992) investigated physical space to develop guidelines for designing cyberspaces, that is, artificial spaces. He discussed space in terms of its topological properties, including dimensionality, continuity, limits, and density. From these properties, he proposed seven principles for designing a cyberspace, mainly concerning what it would look like and how it could be presented effectively. A space's dimensions may be described as extrinsic or intrinsic. An extrinsic dimension is the location of an object in space-time. An intrinsic dimension is a property of an object. A space may be bounded or unbounded, and discrete or continuous. In part this depends on the nature of the data type mapped to the spatial dimensions. Theoretically, some spaces have unbounded dimensions. For example, the dimension formed by an integer attribute, such as file size, has no upper limit. Practically, however, because there is a finite number of documents in any given scope, a space can be considered bounded at some values while remaining extensible. A bounded space may still have infinite resolution; for example, there are infinitely many rational numbers between any two integers. The density of a space refers to how many objects and sub-spaces can be contained within the space. Density is reflected in the scale of the space and in movement through it.
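The remark about infinite resolution in a bounded space can be illustrated with exact rational arithmetic. The sketch below assumes an interval from 1 to 2 and a hypothetical helper `midpoint`. It repeatedly halves the gap above the lower bound, and every step yields a new rational value strictly inside the interval.

```python
from fractions import Fraction


def midpoint(a: Fraction, b: Fraction) -> Fraction:
    """Return the rational number halfway between a and b."""
    return (a + b) / 2


# A bounded dimension from 1 to 2 (illustrative bounds).
lower, upper = Fraction(1), Fraction(2)

points = []
candidate = upper
for _ in range(4):
    # Each step finds a new point strictly between the lower bound and the last point.
    candidate = midpoint(lower, candidate)
    points.append(candidate)
```

The loop could continue indefinitely without leaving the interval, which is the sense in which a bounded dimension can still have unlimited resolution.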


Document space is used to refer to a collection of documents with some common attributes. It is possible that some attributes are specified only in some documents. In general, orthogonal attributes are used as dimensions. A space is defined by its dimensions. A space implies all possible objects in it with respect to the dimensions. In this view, a document space is not the same as a perceptual physical space. However, it can be projected so as to be presented in a perceived pseudo physical space.
Given a space, documents are objects within the space. (There are also other possibilities for transformation of a document mapped to a non-object, such as vector field or force, but these cases are more difficult to perceive and understand.) For the purposes of this discussion, document-objects are projected into some location in a space, based on their attribute values that conform to the dimensions of the space. The perception of a document object is controlled by space properties.
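As a minimal sketch of this projection, assume each document is a mapping from attribute names to numeric values and a space is an ordered list of dimension names. The function name `project` and the attribute names `size_kb` and `year` are hypothetical. A document that lacks a value for some dimension has no location in that space.

```python
from typing import Dict, Optional, Sequence, Tuple

Attributes = Dict[str, float]


def project(doc: Attributes, dimensions: Sequence[str]) -> Optional[Tuple[float, ...]]:
    """Return the document's location in the space, or None if an attribute is missing."""
    if any(dim not in doc for dim in dimensions):
        return None
    return tuple(doc[dim] for dim in dimensions)


# Hypothetical two-dimensional space and two example documents.
dims = ("size_kb", "year")
memo = {"size_kb": 4.0, "year": 1996.0}
draft = {"size_kb": 12.0}  # no 'year' attribute, so it cannot be placed

memo_location = project(memo, dims)
draft_location = project(draft, dims)
```

Here `memo` is placed at a point determined by its attribute values, while `draft` is left out because one attribute is unspecified.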
As a corollary of the definition of a space, it is useful to define the laws that apply to all objects in the space. In this paper, a space is often defined in terms of the properties of the objects in it. Objects may belong to a space if they have an existence property. For instance, a query results in the creation of a sub-space, and only documents that match the query belong to that sub-space. Other laws include the notion that position and distance are created by the space itself, and that there is a Universe, the space that covers all spaces. General laws may be defined in the design of a space. In physical space, the laws of physics govern. For example, two objects cannot coexist at a given location; i.e., only one object can exist at a single location. However, this and other laws may be relaxed in an artificial space.
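Two of these laws can be sketched in code: a query that creates a sub-space, and the "one object per location" law, which an artificial space may enforce or relax. The function names `subspace` and `place` and the `exclusive` flag are hypothetical names chosen for this illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Doc = Dict[str, object]
Location = Tuple[object, ...]


def subspace(docs: Iterable[Doc], query: Callable[[Doc], bool]) -> List[Doc]:
    """Only documents that match the query belong to the resulting sub-space."""
    return [d for d in docs if query(d)]


def place(docs: Iterable[Doc], dimensions: Sequence[str],
          exclusive: bool = False) -> Dict[Location, List[Doc]]:
    """Place documents at locations; with exclusive=True, enforce one object per location."""
    space: Dict[Location, List[Doc]] = defaultdict(list)
    for d in docs:
        loc = tuple(d[dim] for dim in dimensions)
        if exclusive and space[loc]:
            raise ValueError(f"location {loc} is already occupied")
        space[loc].append(d)
    return dict(space)


# Hypothetical example collection.
docs = [
    {"type": "memo", "year": 1996},
    {"type": "memo", "year": 1996},
    {"type": "book", "year": 1991},
]
memos = subspace(docs, lambda d: d["type"] == "memo")
relaxed = place(memos, ("type", "year"))  # both memos share one location
```

With `exclusive=True` the same call raises an error, mimicking the physical law; the default relaxes it so that several documents can occupy the same point.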
Documents and Time

Many definitions of documents treat them as fixed, meaningful marks in a medium. They can be reproduced, and they provide communication across place and time. However, a document is also developed over a period of time, in which case many versions of it exist. Levy (1994) discusses the notions of fixedness and fluidity of a document. He argues that fixedness is not absolute, in the sense that nothing ever changes, but is relative to the document's function and scope. Words in a language change their meanings over time; so does the meaning of a document, even if all of its characters stay the same. A reprint of a document may be viewed as the same as the original, but for some scholars it is different from the original. The same relativity applies to “the lifetime” of a document, where short and long are relative terms. The lifetime of a document is understood in terms of its longevity and its lifecycle. Further, the longevity of a document is situationally dependent. At the end of a document's lifetime, the information in it may carry no value. However, the value of information is situational; for example, the value of a document may be zero in terms of its message, yet it may have significant value as a historical document. Some types of documents, after their value is reduced, may be revalued by certain processing. For instance, laws and state regulations may be revised due to political changes. In addition to longevity, we may look at any document in terms of where it is in its lifecycle. The lifecycle of a document may be divided into phases of document processing. The tasks and tools for each lifecycle phase are different, and each phase takes a different amount of time, depending on the document type. For example, the creation of a standards document takes a long time and requires many iterations and negotiations, so document review tools and versioning capabilities are important.


In the time dimension, we can classify documents as having a short or a long lifetime, and as fixed or fluid.

  • Short life and fixed: memo

  • Short life and fluid: agenda

  • Long life and fixed: book

  • Long life and fluid: reference manual for a dynamic system
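The two-by-two classification above can be written down as a small lookup. This sketch uses hypothetical enum and function names; the example documents are the ones given in the list.

```python
from enum import Enum


class Lifetime(Enum):
    SHORT = "short"
    LONG = "long"


class Fixity(Enum):
    FIXED = "fixed"
    FLUID = "fluid"


# Example document for each combination, taken from the list above.
EXAMPLES = {
    (Lifetime.SHORT, Fixity.FIXED): "memo",
    (Lifetime.SHORT, Fixity.FLUID): "agenda",
    (Lifetime.LONG, Fixity.FIXED): "book",
    (Lifetime.LONG, Fixity.FLUID): "reference manual for a dynamic system",
}


def example_for(lifetime: Lifetime, fixity: Fixity) -> str:
    """Return the example document for a lifetime and fixity combination."""
    return EXAMPLES[(lifetime, fixity)]
```

Treating lifetime and fixity as two independent attributes means each can serve as its own dimension of a document space.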

Another view of a document over time is its constancy. A document may be constant or variable. A book is a classic example of a constant document. In contrast, a direct mail advertisement that inserts each recipient's name into the text is variable. It is difficult to decide what the document is in this case: is it the many instances with simple substitutions, or is it the single originating document?


In looking at documents over time, it is therefore necessary to consider at least three additional attributes: lifecycle position, longevity (location in the document's lifetime), and the fluidity or constancy of the document.
Document Processing

Different document processes require different tools. This variation in requirements has obvious implications for navigation tools. For instance, navigation tools for the creation of documents will be very different from navigation tools for browsing document sets. A model for classifying document processing systems was proposed by Spring (1991). The model has three dimensions: document process, type of document, and system component. The two major categories of document processes are “creating” and “accessing.” The taxonomy of detailed processes is given in Table 1.


Table 1: Document Processing

Major categories: Creating; Accessing

Creation: Writing; Outlining; Editing; Validating; Designing, illustrating; Proofing; Displaying

Storage: Acquiring; Selecting; Analyzing; Classifying; Indexing; Abstracting; Converting; Storing

Dissemination: Selecting; Reviewing; Editing; Formatting; Reproducing; Distributing; Archiving; Managing

Retrieval: Formulating queries; Searching derivatives; Searching documents; Searching collections; Selecting; Converting; Delivering; Using
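The four process groups listed in Table 1 can be held as a simple mapping, which makes it easy to see that some detailed processes recur in more than one group. This is a sketch only: the dictionary layout and the helper `groups_for` are assumptions made for illustration, and the assignment of groups to the "creating" and "accessing" categories is deliberately not modeled.

```python
from typing import Dict, List

# Detailed processes per group, as listed in Table 1.
TAXONOMY: Dict[str, List[str]] = {
    "Creation": ["Writing", "Outlining", "Editing", "Validating",
                 "Designing, illustrating", "Proofing", "Displaying"],
    "Storage": ["Acquiring", "Selecting", "Analyzing", "Classifying",
                "Indexing", "Abstracting", "Converting", "Storing"],
    "Dissemination": ["Selecting", "Reviewing", "Editing", "Formatting",
                      "Reproducing", "Distributing", "Archiving", "Managing"],
    "Retrieval": ["Formulating queries", "Searching derivatives",
                  "Searching documents", "Searching collections",
                  "Selecting", "Converting", "Delivering", "Using"],
}


def groups_for(process: str) -> List[str]:
    """Return every group in which the named detailed process appears."""
    return [group for group, processes in TAXONOMY.items() if process in processes]


selecting_groups = groups_for("Selecting")
editing_groups = groups_for("Editing")
```

A process such as selecting appears in several groups, which suggests that a navigation tool built for one group may be partly reusable in another.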

Electronic document processing presents new opportunities for the processing of documents (Spring & Campbell, 1996). The creation process, the access process, and even the content of documents change when a computer is used. Many functions are more convenient than in the paper-based process -- e.g. editing, printing, and retrieving a reference. The role of the author changes from only writing content to also formatting and publishing a document. New support processes for document access have been introduced, including repositories of electronic documents, indexing, and distribution. For example, a digital library collects electronic documents for archiving, and World Wide Web (WWW) search engines provide an index of WWW pages.


Document type

Documents can be classified by type. Document types include fiction or non-fiction, book, text, periodical, journal, novel, news, etc. The content of each document type has some expected structure. For instance, a scientific paper is expected to be structured in the following order: an abstract, general discussion, experimental method, experimental results, discussion, and conclusion. Dillon (1994) has shown that users can predict the location of information in a journal with a high level of accuracy. Document types are also differentiated by how they are read. A novel may be read only once, but a textbook may be read repeatedly. The overall structure of a document collection also differs by document type. While a book may be identified by title and author, a newspaper may be identified by its date of printing.


