We go on to look at approaches to structuring data as a necessary precursor to knowledgeable derivation of information.
13.5.1 Standard approaches to data and information management for individuals and small virtual teams
An earlier paper (Gregory and Norbis, 2008) presented the hypotheses that individuals working in groups should be encouraged and educated to make better use of the available tools for information management and that the tools themselves should evolve into (or be replaced by) better ways of representing information and knowledge. See also (Sauermann, Bernardi and Dengel, 2005).
Our earlier paper discussed the representation of personal data, suggesting that the ways in which data is stored on a computer influence how it can subsequently be used. We therefore identified several possible, or candidate, data representation approaches and analysed the consequences of choosing them. We suggested a classification scheme for tools based primarily on their data representation. We here reproduce and enhance the suggested categories.
13.5.2 Spreadsheets
Spreadsheets consist of an array of cells, each of which can store a value or a formula. A formula relates the value of the current cell to other cells which can be considered as exporting their value to be used in the formula.
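The cell model just described can be illustrated with a minimal sketch. It assumes, purely for illustration, that a formula is represented as a Python function which is handed a lookup callback for reading other cells; the names `evaluate`, `sheet` and the cell addresses are hypothetical and do not correspond to any particular spreadsheet product.

```python
from typing import Callable, Dict, Union

# A cell holds either a plain value or a formula. In this sketch a formula is
# a function that receives a lookup callback for reading other cells.
Value = Union[int, float, str]
Formula = Callable[[Callable[[str], Value]], Value]
Cell = Union[Value, Formula]


def evaluate(sheet: Dict[str, Cell], name: str,
             _active: frozenset = frozenset()) -> Value:
    """Return the value of cell `name`, evaluating formulas recursively."""
    if name in _active:
        raise ValueError(f"circular reference involving {name}")
    cell = sheet[name]
    if callable(cell):
        # The formula pulls in the values "exported" by the cells it mentions.
        return cell(lambda other: evaluate(sheet, other, _active | {name}))
    return cell


sheet: Dict[str, Cell] = {
    "A1": 10,
    "A2": 32,
    "A3": lambda get: get("A1") + get("A2"),  # plays the role of =A1+A2
    "A4": lambda get: get("A3") * 2,          # plays the role of =A3*2
}

print(evaluate(sheet, "A4"))  # 84
```

The relationships between cells exist only inside the formulas, which is why the structure of spreadsheet data is difficult to recover from the stored values alone.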
Spreadsheets are a very powerful combination of two things: the nearest approach so far invented to widely available end-user computer programming, and a way of storing (more or less) structured data in which the relationships between items of data are imposed by the use of formulae.
The paper (Gregory and Norbis, 2008) introduced the idea of what it called a functional spreadsheet, which simplifies and restricts the scope of spreadsheets to make them capable of formal representation. The idea was based on an insight documented in (Peyton Jones, Blackwell and Burnett, 2003). In the suggested functional spreadsheet approach, rows are hierarchically organized in an outline that groups and sub-groups the data. Cells are limited to containing only values (such as text labels, dates and numbers). Column and/or row headers may contain the names of functions, which may be applied either to all the values in a column or row, or to all the values in a group or sub-group defined by the hierarchical outline.
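One possible reading of the functional spreadsheet is sketched below. It is an illustration under our own assumptions, not the formal representation given in the earlier paper: rows form an outline, leaf rows hold values only, and each column header names a function that can be applied to the whole column or to any group. The class `Row`, the functions `column_values` and `aggregate`, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Row:
    label: str
    values: Dict[str, float] = field(default_factory=dict)  # leaf rows: values only
    children: List["Row"] = field(default_factory=list)      # sub-groups of the outline


def column_values(row: Row, column: str) -> List[float]:
    """Collect the values of `column` from every leaf row beneath `row`."""
    if not row.children:
        return [row.values[column]] if column in row.values else []
    collected: List[float] = []
    for child in row.children:
        collected.extend(column_values(child, column))
    return collected


def aggregate(row: Row, column: str,
              headers: Dict[str, Callable[[List[float]], float]]) -> float:
    """Apply the function named in the column header to a group or sub-group."""
    return headers[column](column_values(row, column))


# The column headers carry the functions; the cells carry no formulas at all.
headers = {"cost": sum, "hours": max}

budget = Row("Project", children=[
    Row("Design", children=[
        Row("Sketches", {"cost": 200.0, "hours": 5.0}),
        Row("Review", {"cost": 100.0, "hours": 2.0}),
    ]),
    Row("Build", children=[
        Row("Materials", {"cost": 700.0, "hours": 1.0}),
    ]),
])

print(aggregate(budget, "cost", headers))              # whole column: 1000.0
print(aggregate(budget.children[0], "cost", headers))  # "Design" group: 300.0
```

Because no cell contains a formula, the same function applies uniformly at every level of the outline, which is what makes the structure amenable to formal description.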
13.5.3 Databases
Databases generally have a more limited remit than spreadsheets, which they fulfil with greater precision. The most widely accepted, implemented and used type of database is the so-called “relational” database. Date (2003) suggests, as an informal initial definition, that
“A relational system is one in which the data is perceived by the user as tables (and nothing but tables); and the operators at the user’s disposal (e.g. for data retrieval) are operators that generate new tables from old. For example, there will be one operator to extract a subset of the rows of a table, and another to extract a subset of the columns – and of course a row subset and a column subset of a table can both be regarded as tables themselves. The reason such systems are called ‘relational’ is that the term ‘relation’ is essentially just a mathematical term for a table.”
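The two operators mentioned in the quotation, one extracting a subset of rows and one extracting a subset of columns, can be sketched as follows. The names `restrict` and `project` follow common relational terminology; the representation of a table as a list of dictionaries and the sample data are our own assumptions, chosen only to show that each operator takes a table and returns a table.

```python
from typing import Callable, Dict, List, Sequence

Record = Dict[str, object]
Table = List[Record]


def restrict(table: Table, predicate: Callable[[Record], bool]) -> Table:
    """Row subset: keep only the rows that satisfy the predicate."""
    return [dict(row) for row in table if predicate(row)]


def project(table: Table, columns: Sequence[str]) -> Table:
    """Column subset: keep only the named columns, dropping duplicate rows."""
    result: Table = []
    for row in table:
        reduced = {column: row[column] for column in columns}
        if reduced not in result:  # a relation contains no duplicate rows
            result.append(reduced)
    return result


contacts: Table = [
    {"name": "Ada", "city": "London", "role": "engineer"},
    {"name": "Grace", "city": "New York", "role": "admiral"},
    {"name": "Alan", "city": "London", "role": "mathematician"},
]

# Because each result is itself a table, the operators compose freely.
londoners = restrict(contacts, lambda row: row["city"] == "London")
print(project(londoners, ["name"]))  # [{'name': 'Ada'}, {'name': 'Alan'}]
print(project(londoners, ["city"]))  # [{'city': 'London'}]
```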
13.5.4 Outliners
An outline displays related items of text hierarchically so as to depict their relationships graphically. Outlining is a technique which may be implemented in general office programs or in specific computer programs known as “outliners”. An outliner is a program which stores and depicts outlines: a special text editor that allows text to be structured as an outline. Outliners are typically used for computer programming, for collecting or organizing ideas and tasks, or even for project management. Outlining is widely used in programs such as Microsoft Office PowerPoint, in which the main headings of a presentation appear as separate slides and on each slide appear points and sub-points. The same technique is available in a more powerful but perhaps less widely used form in word processing packages such as Microsoft Office Word, which supports a very useful and underused Outline mode.
In an outline, a data item is given meaning by being shown in its owning hierarchy. Thus a person’s surname is a component of a composite Contact object. The relative positioning of an item conveys meaning in that the label of the owner classifies or otherwise gives contextual information concerning the owned item; and the depth in the hierarchy gives some idea of the relative importance or significance of the item.
Some programs allow a data item to participate in more than one hierarchy. Thus for example an appointment for a meeting can appear in an overall agenda or calendar, but also be linked to the name of each participant in the meeting. Effectively, the same datum is classified in more than one way. To the extent that knowledge is a product of the recognition by intelligent agents of connections between information otherwise not explicitly linked, this kind of tool can be used as a mechanism for storing relatively unsophisticated knowledge.
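The idea of one datum participating in several hierarchies can be sketched as below. This is a minimal illustration under our own assumptions: a heading is represented as a path (a tuple of labels), and the class `MultiOutline`, its methods and the sample meeting are hypothetical rather than taken from any particular outliner.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Heading = Tuple[str, ...]  # a path through one hierarchy, e.g. ("People", "Alice")


@dataclass(frozen=True)
class Item:
    title: str


class MultiOutline:
    """Stores each item once and lets it be filed under many headings."""

    def __init__(self) -> None:
        self._items_by_heading: Dict[Heading, List[Item]] = {}

    def file_under(self, heading: Heading, item: Item) -> None:
        items = self._items_by_heading.setdefault(heading, [])
        if item not in items:
            items.append(item)

    def items(self, heading: Heading) -> List[Item]:
        return list(self._items_by_heading.get(heading, []))

    def headings(self, item: Item) -> List[Heading]:
        """Every classification under which the same datum appears."""
        return sorted(h for h, items in self._items_by_heading.items()
                      if item in items)


outline = MultiOutline()
meeting = Item("Project review meeting")
outline.file_under(("Calendar", "Monday"), meeting)
outline.file_under(("People", "Alice"), meeting)
outline.file_under(("People", "Bob"), meeting)

print(outline.headings(meeting))
# [('Calendar', 'Monday'), ('People', 'Alice'), ('People', 'Bob')]
```

The connection between Alice and Bob is never stored explicitly; it can be recovered only by noticing that both headings own the same item, which is the sense in which such a tool stores relatively unsophisticated knowledge.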
13.5.5 Spreadsheets as a basis for databases
An exciting new commercial approach to the construction of database-enabled websites is the STOIC platform proposed by http://sutoiku.com/. Conversation with the founder of this company, Ismael Chang Ghalimi, suggests that reflection on personal information management was part of the motivation for STOIC. STOIC turns spreadsheets into complete applications, with a cloud-based relational database and a mobile user interface. An application is created as a new spreadsheet. An object is then a new worksheet. To add a field to an object, it is necessary to add a new column. To add a record to an object, simply add a new row.
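The correspondence described above (object to worksheet, field to column, record to row) can be imitated with an ordinary relational database. The sketch below uses Python's standard `sqlite3` module and is emphatically not STOIC's implementation or API; the helper names `create_object`, `add_field` and `add_record` and the sample data are hypothetical, and identifiers are assumed to be trusted because they are interpolated directly into the SQL text.

```python
import sqlite3
from typing import Sequence


def create_object(conn: sqlite3.Connection, name: str, fields: Sequence[str]) -> None:
    """A new worksheet becomes a new table; its header row becomes the columns."""
    columns = ", ".join(f'"{f}"' for f in fields)
    conn.execute(f'CREATE TABLE "{name}" ({columns})')


def add_field(conn: sqlite3.Connection, name: str, field_name: str) -> None:
    """Adding a column to the worksheet adds a field to the object."""
    conn.execute(f'ALTER TABLE "{name}" ADD COLUMN "{field_name}"')


def add_record(conn: sqlite3.Connection, name: str, values: Sequence[object]) -> None:
    """Adding a row to the worksheet adds a record to the object."""
    placeholders = ", ".join("?" for _ in values)
    conn.execute(f'INSERT INTO "{name}" VALUES ({placeholders})', tuple(values))


conn = sqlite3.connect(":memory:")
create_object(conn, "Contact", ["name", "email"])
add_record(conn, "Contact", ["Alice", "alice@example.org"])
add_field(conn, "Contact", "phone")
add_record(conn, "Contact", ["Bob", "bob@example.org", "555-0100"])

rows = conn.execute(
    'SELECT "name", "phone" FROM "Contact" ORDER BY "name"'
).fetchall()
print(rows)  # [('Alice', None), ('Bob', '555-0100')]
```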
13.5.6 A possible classification of PIMS conceptual data structures
I suggest the following initial classification; this is partially corroborated by (Völkel and Haller, 2009):
- Natural language and text
- Tables
  Spreadsheets add functional programming capability to data tables.
- Hierarchical outlines
- Relational databases
- Linking and multiple classification (tagging)
- Graphs and graph theory
- Concept maps
  Formally, concept maps are graphs; the objects are nodes and the relationships are edges. The topic is extensively discussed by (Friedman and Smiraglia, 2013). Here there are a number of candidate data approaches, including categorical data analysis and clustering.
- Specific PIM (Personal Information Management) programs
- Semantic web and web science
  Other approaches include object-oriented databases, XML documents, RDF and OWL. See (Davies, Studer and Warren, 2006).
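The observation in the classification above that concept maps are formally graphs can be made concrete with a short sketch. It assumes a directed graph with labelled edges, which is one common formalisation rather than the only one; the class `ConceptMap`, its methods and the sample concepts are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source concept, relationship label, target concept)


class ConceptMap:
    """Concepts are nodes; labelled relationships are directed edges."""

    def __init__(self) -> None:
        self._outgoing: Dict[str, List[Tuple[str, str]]] = {}

    def relate(self, source: str, label: str, target: str) -> None:
        self._outgoing.setdefault(source, []).append((label, target))
        self._outgoing.setdefault(target, [])  # make sure the target is a node too

    def nodes(self) -> List[str]:
        return sorted(self._outgoing)

    def edges(self) -> List[Edge]:
        return [(source, label, target)
                for source in sorted(self._outgoing)
                for label, target in self._outgoing[source]]

    def reachable(self, start: str) -> Set[str]:
        """Concepts connected to `start` by following edges (breadth-first)."""
        seen: Set[str] = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for _label, target in self._outgoing.get(node, []):
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
        return seen


cmap = ConceptMap()
cmap.relate("Spreadsheet", "is a kind of", "Table")
cmap.relate("Relational database", "consists of", "Table")
cmap.relate("Table", "has", "Row")

print(cmap.nodes())  # ['Relational database', 'Row', 'Spreadsheet', 'Table']
print(sorted(cmap.reachable("Spreadsheet")))  # ['Row', 'Table']
```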
Two possibilities exist when applying semantic web approaches to personal information: either specialist PIM software or services which incorporate semantic web techniques; or systems which apply semantic web techniques to pre-existing data stored on a specific computer. The latter approach is referred to as the semantic desktop (Sauermann, 2005; Sauermann, Bernardi and Dengel, 2005).
Enhancing the usability and usefulness of the Web and its interconnected resources might be achieved by:
- Servers which expose existing data systems using the RDF and SPARQL standards. Many converters to RDF exist from different applications. Relational databases are an important source. The semantic web server attaches to the existing system without affecting its operation.
- Documents “marked up” with semantic information (an extension of the HTML tags used in today’s Web pages to supply information for Web search engines using web crawlers). This could be machine-understandable information about the human-understandable content of the document (such as the creator, title, description, etc., of the document) or it could be purely metadata representing a set of facts (such as resources and services elsewhere in the site). (Note that anything that can be identified with a Uniform Resource Identifier (URI) can be described, so the semantic web can reason about animals, people, places, ideas, etc.) Semantic mark-up is often generated automatically, rather than manually.
- Common metadata vocabularies (ontologies) and maps between vocabularies that allow document creators to know how to mark up their documents so that agents can use the information in the supplied metadata (so that Author in the sense of ‘the Author of the page’ won’t be confused with Author in the sense of a book that is the subject of a book review).
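The role of shared vocabularies and of maps between vocabularies can be sketched with plain subject-predicate-object triples. The sketch is written in ordinary Python and does not use an RDF library or SPARQL; the prefixes `site:`, `review:`, `page:` and `book:`, the mapping table and the example.org address are all hypothetical, chosen only to show how the two senses of Author are kept apart.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical map from one site's local terms to a shared vocabulary.
VOCABULARY_MAP: Dict[str, str] = {
    "site:writtenBy": "page:author",      # the author of the page itself
    "review:bookAuthor": "book:author",   # the author of the book under review
}


def normalise(triples: List[Triple], mapping: Dict[str, str]) -> List[Triple]:
    """Rewrite local predicates into the shared vocabulary where a map exists."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]


def match(triples: List[Triple], subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that agree with every field that is not None."""
    return [(s, p, o) for s, p, o in triples
            if subject in (None, s) and predicate in (None, p) and obj in (None, o)]


page = "http://example.org/review/1"
facts: List[Triple] = [
    (page, "site:writtenBy", "Reviewer R"),
    (page, "review:bookAuthor", "Novelist N"),
]

shared = normalise(facts, VOCABULARY_MAP)
print(match(shared, predicate="page:author"))
# [('http://example.org/review/1', 'page:author', 'Reviewer R')]
print(match(shared, predicate="book:author"))
# [('http://example.org/review/1', 'book:author', 'Novelist N')]
```

An agent that queries only the shared terms never needs to know which local vocabulary a given site used, provided that a map exists.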
A very important issue: whose ontology?
If we accept the necessity for imposing some sort of classification mechanism to achieve accuracy and precision in searching for information, the next question which inevitably arises is “whose ontology shall we adopt?” We can identify three broad and overlapping alternatives:
- Standardisation by committee (or by professional body, or by employer): top-down imposition. This is frequently done within communities of experts, such as pharmacists or medical practitioners.
- Emergent ontology, shared between workers in small, often virtual, groups: bottom-up conceptualisation. This situation is common in areas of fast-changing technology or practice. A common vocabulary and classification system “emerges” and almost imposes itself. Evolution, when it occurs, is ad hoc.
- Specialist programs which recognise or implement user-defined ontology.