Design of an Integrated Query and Manipulation Notation for Database Languages

Download 100.41 Kb.

Date	09.01.2017
Size	100.41 Kb.
	#8537

Design of an Integrated Query and Manipulation Notation for

Database Languages

Giorgio Ghelli¹ Renzo Orsini² Alvaro Pereira Paz³ Phil Trinder⁴

¹Università di Pisa ²Università di Salerno ³Laboratorio di Ricerca e Sviluppo ⁴Glasgow University

Abstract
A common task in data intensive applications is to locate a group of items and then manipulate them, e.g. to give a bonus to all of the employees in a department. Conventional relational query languages like SQL provide facilities for such manipulation-queries, but are not smoothly integrated with a programming language and lack computational power. It has been argued that omprehensions, a construct found in some languages, are a good query notation for database programming languages. An extension to comprehensions is proposed here that permits the data to be manipulated by side-effect in addition to being queried. These side-effecting queries are computationally powerful and smoothly integrated into a database programming language, unlike conventional relational query languages. Furthermore side-effecting comprehensions are concise and some may be automatically optimized.
_{1 Introduction}
_{Several Database Programming Languages are emerging that enable programmers to create data-intensive applications easily [Albano 85, Atkinson 82, Morrison 89, Schmidt 88]. These languages overcome the impedence mismatch between traditional DBMS and programming languages. The smooth integration of these languages with the data model abstraction mechanisms makes them suitable for applications of growing complexity. In addition to the programming language features these languages must provide the facilities offered by traditional systems, such as management of secondary memory, concurrency, recovery from failures, and query facilities.}

_{This paper is concerned with the provision of a query facility. Querying data is a common activity in data-intensive applications, as is manipulating the data located by the query. An example of such a manipulation-query is to give a bonus to all of the employees satisfying a given condition. A good query notation should be smoothly integrated in a programming language, and it should be clear, concise, optimizable and powerful. Conventional query languages, such as SQL, permit data manipulation within queries and are optimizable, but they are not smoothly integrated into a programming language, and in isolation lack the power to express recursion and computation.}

_{To the best of our knowledge DBPL [Schmidt 88] is the only database programming language that currently supports an optimizable query notation that permits manipulation queries to be written. However, the DBPL notation is defined on relations, and relatively few database programming languages support the relation as a bulk type. In contrast the notation we present can be defined over many bulk types, including lists, bags, sets and relations.}

_{It has been argued elsewhere that comprehensions, a construct found in some languages, are good query notation [Trinder 91]. Like many other notations, comprehensions can be smoothly integrated into database programming languages and allow queries to be expressed clearly, concisely and efficiently. Two advantages are claimed for comprehensions. The first is that, unlike conventional notations, comprehensions combine computational power with ease of optimization. That is, not only can comprehension queries express both recursion and computation, but equivalent comprehension transformations exist for all of the major conventional optimizations. The second advantage is that comprehensions provide a uniform notation for expressing and improving queries over many different bulk types, e.g. sets, bags and lists. However, the comprehensions described in [Trinder 89a], 89b, 91 are pure , i.e. while they permit the expression of queries, manipulation-queries cannot be written.}

_{We demonstrate here that comprehensions can be extended to incorporate data manipulation facilities. The advantages of this extension are (1) a single notation is used for both queries and manipulation-queries, resulting in a simpler language, (2) some manipulation-queries can be written in a clear and concise way, and some of them can be optimized, and (3) the notation permits manipulation queries to be defined that work uniformly over different bulk types, e.g. lists, bags, sets and some trees.}

_{The extension is achieved by adding one new construct to perform side-effects on the data located by the query. The known optimization techniques for comprehensions are modified in order to render the extended comprehensions optimizable even in the presence of these side-effects. We sketch the techniques needed to define a denotational semantics for the extended, side-effecting comprehensions. For concreteness side-effecting comprehensions are described in an object-oriented database programming language but they can be incorporated into any procedural database programming language. Indeed side-effecting comprehensions may prove useful as a general purpose construct in procedural languages.}
_{The paper is structured as follows. Section 2 contains a brief outline of the language where side-effecting comprehensions are embedded. Section 3 describes pure comprehension, i.e. comprehensions which do not perform side-effects. Section 4 describes side-effecting comprehensions. Section 5 outlines the semantics of side-effecting comprehensions. Section 6 concludes.}
2. Overview of the Language
This section contains a brief overview of the language where our comprehensions are embedded and presents the example that will be used through all the paper. This language is a statically strongly typed object-oriented database programming language with parametric and inclusion polimorphism, under definition at the Pisa University, stemming in the experience gained designing the persistent language Galileo [Albano 85]. Details about this language can be found in [Albano 90] and in [Albano 91].

The language supports the object-relationship data model, as defined in [Albano 91]. In the object-relationship data model real world entities are modeled by objects collected into classes, and associations between objects are not represented by the aggregation mechanism, but by n-ary relations (associations) relating objects in classes. Objects, classes and associations are first class values, and their types are first class types.

Objects are software entities with a local state and a set of operations called methods, to manipulate the state. Objects are values belonging to object types; an object type describes which methods can be applied to an object. Employee below is an example of an object type:
let Employee =Object

name: String

admYear: Int

salary: Int

department: String

yearsOfService:Int

giveBonus(b:Int):null

end
If emp is an object of type Employee, he can be operated as follows:
emp.name returns the name of emp

emp.giveBonus(100) gives a bonus to emp

Sequences are homogeneous ordered multisets, and are applicative data structures, in the sense that they are not modifiable. Classes are subtypes of sequences, with side-effecting operations, used in database applications. Because of the subtype relation between classes and sequences, classes can be seen as sequences and thus they can be queried and manipulated with the operators defined for sequences. Associations are essentially sets of records. The components of these records can be of any type, although generally they are object types, representing the associated entities, with possibly some attributes of the association.

We now present a simple example of these mechanisms, which will be used in the rest of the paper. We define a class employees of objects of type Employee and an association manages between employees and employees. An employee has a name, an admittance year, a salary and a department name where he works. There is a method yearsOfService that returns the number of years an employee has been working for the company. There is also a method for giving a bonus to an employee. For example, the following code creates an empty class of employees and an empty association between employees and managers, where both the employees and the managers are elements of the class employees, and where for each managed employee there exists at most one manager.

let employees= new (classOf Employee )

let manages =

new (assocOf

theManager: Employee in employees

theManaged: Employee in employees

key(theManaged))
To represent the fact that m is the manager of the two employees e1 and e2, two records {theManager=m, theManaged=e1} and {theManager=m, theManaged=e2} are inserted into the association.
3 Pure Comprehensions
3.1 The notation
_{Comprehensions are a language construct inspired by the familiar Zermelo-Fraenkel set comprehension. For example, the following set comprehension describes the set of all the numbers which are the difference of two prime numbers:}
{ x-y | xN, yN, x prime, y prime, x>y}
A comprehension is defined by a set of of generators (xN) and filters (x>y) and by a result expression (x-y). In our syntax, generators are written as “for x in seqExpr”, filters are written as “where boolExpr” and the result expression is written as “seq expr₁,…,expr_n end” and closes the comprehension. Different bulk types can be indicated by a different key word, for example set in place of seq. For simplicity the remainder of the paper only deals with list comprehensions. The names of the elements of the employees class with a salary greater than 5000 can be obtained using the following comprehension:
for e in employees

where e.salary>5000

seq e.Name end
A generator for x in seq, where seq is a sequence of n elements, requires that the remaining part of the comprehension be executed n times, binding x to the n different elements of seq. A filter where cond requires the evaluation of the remaining part of the comprehension only if cond is true. Finally the result builder seq expr₁,…, expr_n end evaluates expr₁,…, expr_n and adds them to the sequence which is built by the comprehension. In the example above, e.name is evaluated once for each e in employees whose salary is greater then 5000.

From_{the definition above, it follows that having more generators is similar to take a cartesian product of the corresponding sequences, while having more filters is similar to take their intersection. For example, if}_x_and_y_{are defined as follows:}

let x:Seq (String)= seq "a" "b" "c" end

let y:Seq (String)= seq "d" "e" end
_{then a comprehension with two generators explore their cartesian product;}
for x in a for y in b seq concatenate(x,y) end
returns
seq "ad" "ae" "bd" "be" "cd" "ce" end.
_{But nested generators are more expressive than cartesian products since the sequence of the second generator, in general, depends on the identifier bound by the first one. For example, if managed(m) is a function which returns the sequence of the employees managed by m, the following comprehension returns all the pairs such that m manages e (i.e. the manages association), which is not a cartesian product}
for m in employee

for e in managed(m)

seq pair(m,e) end.
_{Generators and filters are collectively named qualifiers. The syntax of comprehensions is defined as follows:}
Comprehension = Qualifier {Qualifier}

seq Expression end

Qualifier = for Id in Expression |

where Expression
_{Comprehensions recall the relational calculus of tuples, but are different since the sequence and boolean expressions in the comprehension qualifiers can be any expression with the appropriate type, since generators substitute the unlimited quantifications of the relational calculus, and since comprehensions are not limited to operate on relations but can operate on many different data types, like sets, bags and lists. A complete description of comprehensions can be found in many functional programming texts including [Peyton-Jones 87].}
We give two final examples, illustrating the use of associations and recursive queries. The following query finds the names of the employees that earn more than their manager.
for m in Manages

where m.theManaged.salary>m.theManager.salary

seq m.theManaged.name end
Recursive queries can be expressed by mixing the query notation with recursive functions. As example, we show the following query involving a transitive closure, in which the add_to operator stands for a sequence constructor, that takes an element and a sequence and returns a sequence: “Find all the employees that are managed, directly or indirectly, by a given employee manager”.
let rec manages (manager: Employee): Seq(Employee) =

for empMan in manages

where empMan.theManager=manager

for managed in (add empMan.theManaged to manages(empMan.theManaged))

seq managed end
Note that we use the recursion of the language, the recursion is not part of the query notation. By expressing recursion in this way even recursive queries can be improved, as demonstrated in [Trinder 89a,Trinder 91].
_{3.2 Optimization of pure comprehensions}
The comprehension query notation is suitable for all the major optimization techniques used in relational systems, both those based on algebraic improvements and those applied at execution time [Trinder 89b]. Only a few of the most important improvements are outlined here. Performing selections as soon as possible is the most important query improvement. For example a query like
for a in A for b in B where a.f=b.f where a.g=89 seq a end
can be transformed into
for a in A where a.g=89 for b in B where a.f=b.f seq a end
The filter is now applied first, reducing the size of the intermediate result giving an equivalent , but more efficient , query.
Products can be converted into joins. Two consecutive generators followed by an equality test as in
for a in A for b in B where a.f=b.g seq pair(a, b) end
can be transformed into a join, for example by sorting A and B on the fields f and g respectively and then merging them. This reduces the complexity of the query from O(n²) to O(nlog n).
Indices can be used. For example, if there is an index defined on field f of A, then a query like
for a in A where a.f=val seq a end
can be transformed to take advantage of the index to evaluate the query efficiently.
The most important and most often used comprehension transformation is Qualifier Interchange. Qualifier Interchange states that any two juxtaposed qualifiers q and q' can be swapped if they do not refer to variables bound in each other. For example, the qualifiers ‘for x in X’ and ‘test(x)’ cannot be swapped. Qualifier Interchange allows the order of filtration to be changed as well as the order of generation.

To allow applying this kind of optimizations to the queries, we define the semantics of the iterator as non deterministic. More precisely, we do not define in which order the n values generated by a for qualifier are bound to the relative variable in the n evaluations of the remainder of the comprehension. This non-determinism allows performing, for example, the following optimizations:

a) swapping qualifiers applying the qualifier interchange rule.

b) generating the elements of a stored sequence using an index, even if this changes the order in which elements are retrieved.

So, for instance:
for x in seq 1 2 end seq x end
can return
seq 2 1 end or seq 1 2 end.
The programmer can, however, call for deterministic semantics, when needed, by prefixing the whole expression with ord; the ord prefix forbids any optimizing transformations. For example
ord for x in seq 1 2 end seq x end
always returns seq 1 2 end.
An example where the order is important is the following, where a list of employee names with salary greater than 500 is required, ordered by admittance year; if the employees class is sorted by admittance year, we may write:
ord for e in employees where e.salary>500 seq e.name end.
4 Comprehensions and Side-effects
4.1 Motivation
In database applications there is often the need to update a set of objects with certain properties. If we use the same notation to retrieve the objects and to perform the updates, we obtain several benefits:

– a concise and “declarative-style” notation

– an unified handling of queries and updates

– the possibility of using certain optimizations.

Consider, for instance, the following task. We want to give a bonus of $50 to every employee of the Silverware department.
for e in employees

where e.department="Silverware"

seq e.giveBonus(50) end
This expression is simple and can be improved with the same techniques discussed above. Notice, however, that the result is a useless sequence of null values, one for each updated employee; this deficiency is remedied in the next Subsection.

In the preceding example the update is performed in the final expression, but there are cases in which it is useful to perform an update in the body of the comprehension. For example, if we want to give a bonus to the employees of the Silverware department and collect the names of people whose salary was increased. Apparently, the following single iteration would do this job:

for e in employees

where e.department ="Silverware"

where (e.giveBonus(50); true)

seq e.name end
First, note that we must use a compound filter (e.giveBonus(50);true) to meet the requirement of returning a boolean in a filter, although we are only interested in performing the side effect. Second, this query is incorrect because of the semantics of the iterator, which allows the optimizer to swap the two filters, producing a complete different result, i.e, giving a bonus to all the employees and returning the names of employees of the Silverware department. This could be corrected by prefixing the expression with ord, as this forbids the interchange of qualifiers. This solution is however not satisfactory, because it permits no improvement at all, precluding, for example, the use of an index on department to locate the desired employees efficiently. These considerations motivate our proposal.
4.2 Do qualifiers
In traditional query languages the absence of side effects permits a large number of improving transformations on queries, that only modify the order in which the result is delivered. For example, the already mentioned Qualifier Interchange swapping qualifiers when the second does not refer to the variable bound by the first one, can always be applied in a pure comprehension.

Clearly, this is not true in a side-effecting environment. For example, the swapping of two filters could alter the result of the query if one of them performs a side effect that is observed by the other. Moreover, the swapping of a filter and a side-effecting expression may alter the set of objects on which the side effect is performed, as was shown in the preceding example. Nevertheless some optimizations could be equally performed; for example the qualifiers to the left of a side-effecting qualifier may be interchanged, as can the qualifiers to the right, while the side-effecting qualifier cannot be interchanged with another.

In order to give a definition of the semantic of comprehensions which at the same time allows optimizations but forbids rewritings which would produce unexpected results, as could easily happen with side-effecting comprehension, we propose to distinguish side-effecting qualifiers from the other ones. In our proposal side-effects are included in comprehensions using a new qualifier, do expression. The expression is evaluated solely for its side-effects, i.e. its value is discarded. Using do the previous example can be written:
for e in employees

where e.department = "Silverware"

do e.give_bonus(50)

seq e.name end
This way of reporting on the modified data is natural and scans the class only once.

The semantics of these extended comprehensions specifies that the pure filters and generators which precede a do qualifier produce their bindings in any order, and then the do expression and the remaining part of the comprehension are executed once for each generated binding, as usual. So the do qualifier cannot be swapped with any other qualifier; however, as the order in which sequences are visited is not specified, “for e in employees where e.department = "Silverware"” can be converted into an index lookup. Hence when the order of the result is not significant even side-effecting queries can be improved.

4.3 Restrictions
Side-effects are not permitted in generators or filters, as they do not seem to be very useful, and their use often produces code with an obscure or non-deterministic semantics. This decision keeps the query notation simple, compared with the alternative approach of defining special side-effecting generators and filters, and also permits a clean understanding of the impact of side-effects on optimization, because side effects can appear in just one place. We demonstrate in the following paragraphs that no expressive power is lost by restricting side-effects to do qualifiers.

In a language with static binding (side-)effect declaration and checking [Gifford 86] could be used to restrict side-effects to do qualifiers, at some cost. Unfortunately the late binding used in object-oriented languages makes effect checking impossible, because a different method may be bound at execution-time from the method present at compile-time. In languages with dynamic binding we must rely on the programmer only using side-effects in do qualifiers.

Side-effecting filters could be useful when side-effects produce values which should be observed. For example, suppose that the giveBonus method returns the new salary; if side-effecting filters were allowed we might write the following comprehension to name the employees that earn more than 5000 after receiving a bonus:
for e in employees

where e.giveBonus(500)>5000

seq e.name end
This situation can be solved either using assignments or just retrieving another time the value returned by the side-effect, like done below:
let salary = 0;

for e in employees do salary<-e.giveBonus(500) where salary>5000 seq e.name.
for e in employees do e.giveBonus(500) where e.sal>5000 seq e.name.
So side-effecting filters can be avoided. Side-effecting generators are even more unlikely, and can still be emulated with the do qualifier and assignment. The absence of side effects on filter and generators is not enforced by the compiler, but the semantics of comprehensions with side-effecting generators or filters is not defined.
4.4 Proposal
Finally, our proposal also makes the construction of the result list optional. So the query
_{for e in Employees}

_{where e.Department ="Silverware"}

_{seq e.GiveBonus(50) end}
producing a useless list of null values can be rewritten as
for e in Employees

where e.Department="Silverware"

do e.GiveBonus(50) end
The syntax of side-effecting comprehensions is defined as follows:
_{Comprehension = [ord] Qualifier {Qualifier}}

_{[seq Expression ] end}

_{Qualifier = for Id in Expression |}

_{where Expression |}

do Expression
It appears that many useful operations can be written using side-effecting comprehensions. However it is also possible to write a comprehension that modifies, inside a do, the data it iterates over and hence has extremely obscure semantics.

There are side-effecting comprehensions in which the order in which the side effects are performed is significant. The result of a query with this characteristic may be affected by an improvement. For example, if a programmer knows that the employees class is sorted by admission year, with recently hired employees appearing first, he might write the following comprehension to ascertain whether there are any long-serving employees, and discover their names:

let var longserving:bool=false.

for e in employees

where e.department="Silverware"

do if e.admYear<1970 then longserving=true

where longserving

seq e.name end.
If there is an index on department that does not preserve the admission order then the improved query will locate the first long-serving employee, if any, and then return the names of all the employees in the index. These kind of queries are dependent on the order, and consequently the programmer must use ord to identify them.

However, there is a wide class of side-effecting queries where the iteration order is not important, and so the final state of the database and the result of these queries will be the same regardless of the order in which the side-effects are performed. For example, the expression given above to raise the salary of all the employees falls in this class.

5 Denotational Semantics of Comprehensions
5.1 Introduction
Our comprehension mechanism has a non-deterministic semantics, since sequences are explored in an unspecified order. It has also a partial semantics, since the semantics of comprehensions which execute side-effects inside for or when qualifiers is undefined. This partial non deterministic semantics has been preferred to deterministic semantics since it allows implementing comprehensions exploiting algebraic optimizations and associative access structures.

This non deterministic semantics is satisfied by any implementation which exhibits “no more non-determinism”. More specifically, the non-deterministic semantics specifies a set of possible results for any expression, and an implementation satisfies this semantics if the result of evaluating an expression always belongs to its semantics. Observe that even deterministic implementations may satisfy this non-deterministic semantics.

As far as “undefined” expressions are concerned, an implementation can return any result in this case; we could have required that an exception would be raised in these cases, but we preferred not to constrain implementations.

The informal explanation of comprehension semantics we have given in the paper allows writing queries using this notation, but is not formal enough to decide exactly when an implementation correctly implements our mechanism, and nor to prove whether a given algebraic optimization or a given way of exploiting associative access structures always produces acceptable results. To meet these aims, we need a formal semantics.

A formal semantics should satisfy the following requirements:

• It should be “liberal”. Comprehension semantics should allow algebraic and run-time optimizations, even if they change the exact semantics of comprehensions which perform side effects. The comprehension semantics should impose only some “constraints” on the behavior of programs, weak enough to allow optimizations, but strong enough to write useful programs and to reason about them.

• It should be “simple”. Actually it is not possible, at the current state of the art, to give a semantic interpretation of comprehensions which is easy to read and understand, since the mechanism itself is too complex. But it should be possible, at least, to explain in an understandable way the basic ideas; comprehensions can be used only if the meaning of a comprehension expression can be understood by the programmer. In any case, the semantics should not make explicit reference to specific optimization techniques, but it should be possible to describe it in an abstract way.
Defining such a denotational semantics is difficult since it should exhibit a combination of non-determinism, partiality, side-effects and exception handling, and each of these features adds complexity to the interpretation domain of expressions and to the interpretation of operators. For this reason, we will adopt an incremental approach, like the one advocated in [Moggi 90].
The section is structured as follows. In section 5.2 we introduce the modular approach to denotational semantics which should be adopted to describe the semantics of comprehensions. In section 5.3 we specify the domain used to interpret side-effecting comprehensions. In section 5.4 we give a succint picture of the full semantics of side-effecting comprehensions.
5.2 Modular denotational semantics
An incremental approach to the process of giving a denotational semantics for a typed language can be described as follows. First an interpretation is defined for the language of types, interpreted as structured sets^¹. Then, if we are interpreting a pure functional language, each open term a of type A, with free variables x₁,…,x_nwith types A₁,…,A_nis interpreted as a morphism with the following “type”. In the following [A] is the interpretation of a type A, and [a] is the interpretation of a value a):
[a]: [A₁]X…X[A_n][A]
The untyped case is just a special case of the typed one, with only one type. A tuple in [A₁]X…X[A_n] is called an environment; environments are denoted below with the letter r.

If the language interpreted is not pure, the structure used to interpreted a term must be richer. For example, we can use the following structures to interpret languages with non-determinism, with side effects, with exceptions, or with both exception and side effects:

[a]: [A₁]X…X[A_n]√[A] non determinism

[a]: [A₁]X…X[A_n](Store  ([A] X Store)) side-effects

[a]: [A₁]X…X[A_n](Exc + [A])) exceptions

[a]: [A₁]X…X[A_n](StoreExc + ([A] X Store)) side-effects and exceptions

In general, a term like a above is interpreted in [A₁]X…X[A_n]T[A], where T[A] is a structure richer than [A] where computations yielding values of type [A] can be interpreted; we note below that the T function is a structure. For example, if _ is the interpretation of a type then .Exc+_is the structure used to interpret exceptions. The last example shows that T can be also a kind of composition of more structures, one which enriches the basic structure with what is required to model side-effects, and the other one which adds the structure required to model exceptions. To formalize this point of view we define, for each feature f of a language, a structure constructor F_fwhich transforms the structure T representing a structure to interpret a language without the feature f into a structure F_fT where f can be interpreted^² . In this setting we can define the following structure constructors:
• Side-effects: if T is a structure and _ is the interpretation of a type, a new structure where side-effect can be interpreted is defined as:

Seff_Store(T)_ = (T(_X Store))^Store

• Exceptions: Exc_Exc(T)_ = T(Exc+_)
By isolating the structure constructors which compose the domain where computations are interpreted, it should be possible to describe the semantics of some operations using only a part of this global structure, and also studying the properties of programs which use only, e.g., side-effects, ignoring the other features of the language.

Given an operation defined on a structure T_ there is a natural way to lift it to FT_, under some reasonable conditions on the type of the operation and on the nature of F and T. For example, if F=Div, then operations are extended to strict operations. In the same way, if F=Exc, operations are extended so that it they propagate exceptions. If F=Seff, operations are extended to operations which do not change the state.

We will not follow this approach explicitly in the next sections, since we will give only some hints about the whole semantic definition, but this approach should be followed if the task were to be fulfilled completely.
5.3 Structure for side-effecting comprehensions
For any domain [A], the result of a non-deterministic and non-everywhere-defined computation of type A is, intuitively, either a set of values of [A], i.e. an element of √([A]), or the special value  expressing indeterminacy. So the basic structure to interpret comprehensions is given by the following function:
ND = _.+√_
We now lift this basic structure with structure constructors for side-effects, and a modifiable flag remembering whether an expression performed a side-effect. The resulting structure is computed below:
ND = _.+√_

Seff_Store(Exc(ND)) =_.(+√(_X Store))^Store

Seff_FlagSeff_Store(Exc(ND)) =_.(+√(_XStoreXFlag))^Store^Flag
This means that an expression a having type A in a context x₁:A₁,…,x_n:A_n is interpreted as a function of type:
[a]: [A₁]X…X[A_n]  Flag  (Store  (+√([A] X Store X Flag)))
To embed comprehension into the full language, this structure should be enriched to deal with exceptions, and the semantics given in the following section should be lifted to propagate exceptions.
5.4 Interpretating side-effecting comprehensions
5.4.1 Deterministic pure comprehensions

In this section we give the deterministic semantics of ordered comprehensions in the trivial structure l_._. This semantics is given inductively on the number of qualifiers of the comprehension.

The interpretation of a comprehension without qualifiers is given supposing that for any domain A a domain List A for finite sequences of elements of A is fixed, equipped with two operators cons and emptylist with the following types:
cons_A: A X List A  List A

emptylist_A: List A

To simplify the notation we will ignore the A index of these operators. In these context, a comprehension without qualifiers is interpreted inductively as:
[seq end]r = emptylist

[seq E₁,…,E_n end]r = cons [E₁]r [seq E₂,…,E_n end]r

The when qualifier is not seen as primitive but is translated in terms of for as follows:
[when BoolExpr Compr] = [ for _ in

(if BoolExpr then [unit] else []) Compr]

_ is a fresh variable not appearing free in Compr; unit is the only value of the trivial type Unit; [] is an abbreviated notation for seq end, so that [unit] is a one element list while [] is an empty list. The second comprehension evaluates Compr once, binding _ to unit if BoolExpr is true, and returns an empty list otherwise. This is exactly the behaviour of the first comprehension expressed using when; so we can see when as a degenerate case of for.
Finally, a for x in [a₁,…,a_n] tailCompr comprehension evaluates tailCompr n times, binding x to a₁,…, a_n, and then returns the concatenation of the results of the n runs of tailCompr. This is expressed below, using the auxiliary functions map and flatten::
map_A,B: (AB)  List A  List B

flatten_A: List List A  List A

[for x in E Compr]r = flatten_B map_{A,List B} (x. [Compr]r[x/x]) [E]r
“map (x. [Compr]r[a/x]) [E]r” applies the function x.[Compr]r[x/x] to all the elements of the sequence [E]r, i.e. it evaluates [Compr] in an environment r[x/x] where x is bound to x for any element x of [E]r. Since each [Compr] returns a sequence, this mapping operation returns a sequence of sequences, which is flattened to a sequence by the list concatenation operator flatten. This completes the plan for the semantic definition of pure comprehensions; in this side-effect free context, a do expr qualifier is equivalent to a useless when (expr;true ) qualifier.
5.4.2 Non-deterministic comprehensions with side effects

We will first assume that the expression inside a for or a when qualifier does not perform side-effects, and will relax this assumption only in a second phase. We first lift the map function to a function in the structure Seff ND, where Seff ND A = Store€(√(A˝Store)); intuitively, a computation of type Seff ND A transforms a store into a set of pairs value-new store. The lifted map has the following type:

map: (ASeff ND B)  Seff ND (List A)  Seff ND (List B)
map f l is the natural lifting of map to the structure above: it first chooses non-deterministically one list in l and then applies the non-deterministic computation f to all the elements of the list, updating the store each time f is evaluated; map returns the collection of all the possible results of this non-deterministic execution:

Now we can interpret comprehensions; we observe that any comprehension is structured either as seq … end or as a sequence (maybe empty) of for qualifiers followed by a do qualifier, followed by another comprehension of one of the two preceding classes.

A seq … end comprehension just returns the singleton containing only the semantics of the corresponding deterministic execution.

In the other case, all of the for qualifiers are interpreted at once, building a list of binding for their variables. Then one permutation of this list is used, non-deterministically, to evaluate the remaining part of the comprehension:

perm_A: Seff ND (List A)  Seff ND (List A)
[for x1 in E1 … for xn in En Compr]r =

flatten_B (map_A₁_X…_X_A_n,List B

(lx₁…x_n. [Compr]r[x₁ … x_n/x₁ … x_n])

(perm[for x₁ in E₁ … for x_n in E_n seq 1 … x_n> end]r))

In the expression above, perm is (informally) an operator that receives a set of lists and returns the set of all the permutations of those lists.
A do qualifier simply modifies the state used to interpret the comprehension which follows.
[do E Compr]rs = let _,s' =[E]rs in [Compr]rs'
It could be possible to interpret non-deterministic comprehensions with side effects without grouping the consecutive for operators, like we did in section 5.4.1. But, while in the pure case the defined function is exactly the same, when side effects come into play the distinction is important. We do not enter into greater detail; merely remarking that the distinction between flat and nested interpretation of for exemplifies the need for a formal management of semantics, and exemplifies the kind of hidden details which can be overlooked when a language mixing side effects and non-determinism is described and implemented.

Finally, we hint at the formal treatment of the condition specifying that the semantic of a comprehension is undefined if a side effect is performed by a for (or a when) qualifier. In this case, we define a richer structure constructor Seff_FlagSeff_StoreND defined as

Seff_FlagSeff_StoreND _ = FlagStore(+√((_ X Store X Flag))
The constructor Seff_Flagenriches the semantic domain with a one-location store Flag, containing just one boolean cell flag, which is set true by any side-effecting operation. In the full semantics of comprehension, the generators are evaluated in a store where flag is false, and if it becomes true, the semantics of the whole comprehension becomes undefined. This semantics is completed by strictness rules specifying that the undefined value  is transmitted by all the semantic operators.
6 Conclusions
We have presented a query and manipulation notation based on the comprehension notation. The main features of this notation are:
—It is concise and clear

—It is a uniform notation allowing expressing both queries and data manipulation

—Queries can be optimized even in presence of side-effects
We have suggested a path to define a denotational semantics for this notation. The results of this research will be used to define a query notation for a strongly typed object oriented database programming language which is being defined at the University of Pisa.

This research should be completed by formal, linguistic and implementative studies. In ths first field, we should define a detailed semantics for comprehensions, and prove that the known optimization techniques transform a query into another one with a compatible semantics. From a linguistic point of view, the basic mechanism should be completed with with some sugar to deal in an easy way with associations as they are represented in the object-relationship model. Finally we should study an implementative model for this notation.

Acknowledgements
This reasearch has been partially supported by E.E.C. Esprit Basic Research Action 3070 FIDE, the Italian “Ministero dell’Università e della Ricerca Scientifica e Tecnologica”, by Italian C.N.R., Progetto Finalizzato Sistemi Informatici e Calcolo Parallelo and also by the British SERC Bulk Type Constructors Project.

_References
[Albano 85]

A. Albano, L. Cardelli and R. Orsini, “Galileo: A Strongly Typed, Interactive Conceptual Language”, ACM Trans. on Database Systems 10, 2, pp. 230-260, 1985.

[Albano 90]

A. Albano, G. Ghelli and R. Orsini, “Objects and Classes for a Database Programming Language”, rapporto tecnico Progetto Finalizzato Sisitemi Informatici e Calcolo Parallelo 5/24, oct. 1990.

[Albano 91]

A. Albano A., G. Ghelli and R. Orsini, “A Relationship Mechanism for a Strongly Typed Object-Oriented Database Programming Language”, to appear in Proc of 17th International Conference on Very Large Data Bases, Barcelona, 1991.

[Atkinson 82]]

M.P. Atkinson, K.J. Chisholm, W.P. Cockshott, "PS-Algol: An Algol with a Persistent Heap" Sigplan 7,17, pp24-31, 1982.

[Gifford 86]

D.K. Gifford, J.M. Lucassen, "Integrating Functional and Imperative Programming" Proceedings of the ACM Conference on Lisp and Functional Programming, pp 28-38, 1986.

[Moggi 90]

E. Moggi, “An Abstract View of Programming Languages”, Tech. Rep. ECS-LFCS-90-113, Dept. of Comp. Science, University of Edinburgh, 1990.

[Morrison 89]

R. Morrison, A.L. Brown, R.C.H. Connor, A.Dearle. "Napier 88 reference manual”, Technical Report, Department of Computational Science, University of St. Andrews, 1989.

[Peyton-Jones 87]

S.L. Peyton-Jones, “The Implementation of Functional Programming Languages”, Prentice Hall, 1987.

[Schmidt 88]

J.W. Schmidt, H. Eckhardt, F. Matthes. "DBPL Report”, DBPL-Memo 111–88, FachBereich Informatik, Johann Wolfgang Goethe-Universitat, Frankfurt, West Germany, 1988.

[Trinder 89a]

P.W.Trinder, “A Functional Database”, D.Phil. Thesis, Oxford University, 1989.

[Trinder 89b]

P.W. Trinder, P.L. Wadler, “Improving List Comprehension Database Queries”, Proceedings of TENCON’89, Bombay, India, pp.186-192, 1989.

[Trinder 91]

P.W. Trinder "Comprehensions, a Query Notation for DBPLs" Proceedings of the 3rd International Workshop on Database Programming Languages, Nafplion, Greece, 1991.

1actually, as objects in a category

2 actually, T should be a monad and F a monad constructor (see [Moggi 90])

Directory: ~trinder -> papers
papers -> Pamguard: Semiautomated, Open Source Software for Real-Time Acoustic Detection and Localisation of Cetaceans

Download 100.41 Kb.

Share with your friends: