CS2130 Programming Language Concepts Unit 9 Values, Types and Variables



Download 67.59 Kb.
Date05.08.2017
Size67.59 Kb.
#26687
CS2130 Programming Language Concepts
Unit 9 -- Values, Types and Variables

The term value refers to anything that results when an expression is evaluated, or an entity that may be stored in a memory cell within the computer, or which forms part of a larger composite data structure such as an array or record, or which may be passed as a parameter to a procedure or function, or returned as a function value.


Variables

A program variable is an abstraction of a computer memory cell (or a collection of memory cells). Programmers often think of variables as names for memory cells, but this is an over-simplification. A variable can be characterised by six attributes:


name address or reference value

type lifetime scope


Our discussion of these topics will also lead us to the related concepts of
aliasing binding binding time

type checking strong typing type coercion

scope rules referencing environments
The name of a variable was discussed extensively in unit 2. The address or reference to a variable specifies the location of the storage associated with that variable, that is the place where its value will be stored. The relation of the name, reference and value can be visualised as schematically as

The address or reference associated with a variable is often referred to as its l-value or left-value, and then for clarity its normal value is referred to as its r-value or right-value. The motivation for this terminology is that in an assignment of the form:
A = B; (C, Java etc.) A := B; (Ada, Modula-2, Pascal etc.)
The reference associated with A is used to specify the target of the assignment, that it specifies the location of the memory cell in which the right-value of the variable associated with B is to be stored. Thus it is really only the l-value of A and the r-value of B which are relevant for the assignment. Of course the reference associated with B is used to access its right-value and the right-value (if any) associated with A before the assignment is irrelevant.
Types

In many languages we find different types of values. A type is defined as a set of values. When we say a value n is of type T, we mean simply that n  T. The type of a variable specifies the sort of values that can be associated with the variable and is used among other things to determine range of valid operations that can be performed on these values and the amount of storage that needs to be allocated to store such values. Not every collection of values is suitable to be regarded as a type; we expect that the values of a type exhibit some sort of uniformity and behave in a consistent way under the operations associated with the type.


We may identify several different sorts of type and value including:
primitive types (for example characters, truth values, enumerations, integers and real numbers). A value of a primitive type is one that is atomic, that is cannot be decomposed into simpler values.
composite types (for example arrays, records, strings, sets, lists and files)

A value of a composite type can be thought of as being composed of several values of a simpler type.


pointer or access types

values of such types are references to (that is addresses of) values of some other type


subprogram (procedure or function) types
Reference Semantics

Above we have described value semantics, that is the value of the variable is stored directly in the value cell. However many object-oriented languages use what is known as reference semantics where a reference cell contains a reference to the value rather than the value itself. The situation can be envisaged diagramatically as



Note that the situation is somewhat similar to that in Ada after the following declarations:
TYPE T_Ref IS ACCESS T; -- where T is some Ada type

Ptr : T_Ref := NEW T’(Value);


However in Ada two distinct types are involved; the reference in the value cell is directly accessible as the value of Ptr (of type T_Ref) and this must be de-referenced explicitly (using Ptr.ALL) to access the value of type T. In a language using reference semantics only the type T is involved, the reference in the reference cell is not directly accessible by the programmer; the reference is de-referenced automatically to access the current value of the variable Name (of type T).


Ada, Modula-2 and Pascal use value semantics for all the basic types and for arrays and records. C and C++ essentially use value semantics, but there are some peculiarities with the handling of arrays (whole array assignment is not allowed). Java uses value semantics for the basic types (int, float, double,char, byte, boolean etc.), but reference semantics for all objects (including strings and arrays).
In Java the declaration
int[] vector;
defines vector as a variable that can refer to any one-dimensional array of integers, but does not allocate any storage for the actual array elements.

Storage for the array elements is allocated by a call to new:
vector = new int[100];
Of course the storage can be (and usually is) allocated when vector is defined:
int[] vector = new int[100];
The situation can then be envisaged diagramatically as

Note the important difference between the behaviour of the following Ada and Java fragments:


Ada (value semantics)

TYPE IntArray IS ARRAY(1..100) OF Integer;

-- declare 2 array variables and allocate storage for their elements

A, B : IntArray;

.....

A := B; -- copies all 100 elements of B to A


The situation in Ada before and after assignment can be visualised as:



Java (reference semantics)
int[] a = new int[100];

int[] b = new int[100];

.....

a = b; // copies reference only


Thus diagramatically we have before the array assignment:


and after the array assignment:

Thus in Java after the assignment a and b refer to the same array and an assignment to (say) a[1] will also cause b[1] to change whereas in Ada, A and B are distinct arrays throughout and so assignment to A(1) never changes B(1).
Note that the use of reference semantics is somewhat more flexible; the size of an array associated with the array variable a can be changed. Given the above declarations we could subsequently write
a = new int[200];


whereupon a would refer to a newly allocated array of size 200 and would again be distinct from the array b. What is fixed however is the fact that a must refer to a one-dimensional array of int; it cannot subsequently become a 2-dimensional array or an array of (say) float. In Ada and many other imperative languages (Pascal, C etc.) once an array variable is declared its size and, of course, its element type are fixed permanently.
Type Checking

All operations should be checked for type consistency as this helps prevent programs performing meaningless operations, for example:


values of a type T should only be stored in variables able to accept such values

operators should only be applied to operands of appropriate types

subprograms should only be called with parameters of the appropriate types.
Rigorous type checking ensures the program is logically consistent and helps promote program correctness.
Static Typing

In a statically typed language every variable and parameter has a fixed type specified by the programmer. Thus the type of every expression can be deduced at compile time and so operations can be checked for type consistency during compilation. Most high-level compiled languages are statically typed.


If, in a statically typed language, the type of every entity used in programs in the language is determined sufficiently that all operations can be checked for type correctness at compile time, the language is said to be strongly typed. Ada was designed carefully to be strongly typed, but in some statically typed languages the type system is such that some expressions cannot be fully type checked at compile-time. Such languages are said to be weakly typed.
Notable examples of weakly typed languages are C and C++ and to a lesser extent Pascal. For example parameter type checking in function calls was not present in the original version of C. This was largely rectified in ANSI C (and C++) but calls to functions which take a variable number of parameters (such as the standard library functions for formatted I/O (scanf and printf) are not fully type checked. Also in C and C++ array parameters are not fully type checked. Java is more strongly typed than C or C++, but somewhat less strongly typed than Ada in respect of objects. For example in Java a variable of some class type T can hold references not only to objects of type T, but also references to objects of any type derived from T. The type of a value stored in such a variable cannot be determined fully until run-time.
Dynamic Typing

In a dynamically typed language, only values have fixed types. Variables and parameters have no specified type, and may take on values of different types at different times during program execution. This implies that an operation must be checked for type consistency at run-time immediately before the operation is performed. Prolog, SmallTalk, Lisp (usually) and most scripting languages are dynamically typed. Dynamic typing is more flexible as it allows the same variable to store different values and allows (for example) lists whose elements are of different types. There are however disadvantages:


late detection of type errors (at run-time)
significant overheads in execution time (and storage requirements) involved with run-time checking
Often in dynamically typed languages it is not necessary to declare variables before they are used; instead a variable is implicitly declared the first time it is used.
Each value must carry around a type tag to enable run-time checking. Typically the value cell of a dynamically variable contains a type tag and a pointer to the current value. Thus we can visualise the situation diagramatically as

Storage for the current value must be allocated dynamically as in general different types of value will occupy different amounts of storage. When a value of a different type is assigned to a variable, the value of the tag is changed and (in general) new storage is allocated to hold the current value and the reference to the current value is updated.



In some implementations to save storage, atomic values may be stored directly in the tag/value cell (if they take no more space than would a reference), by inspecting the tag the system can determine whether the value cell holds the value directly or whether it contains a reference to the actual value. The situation is then as indicated:

Example

RLisp is a dialect of Lisp: one can either use classical Lisp bracketed prefix notation or a syntax similar to a standard imperative language such as Pascal or Ada. In RLisp we could write the fragment


A := 1; B := 2; % declare (global) variables A, B implicitly

% and initialise them

C := A + B; % run-time check that operands of + are numbers

% declare (global) variable C implicitly

% A, B and C now all hold integer values
A := {1.2, "Alan", 2}; % A now holds a list of 3 values

B := First(A); % run-time check that A is a list

C := Second(A); % B now holds a real value and C a string

A := B + C; % run-time check detects a type error as we

% can't add a real value 1.2 to a string
Tags in Objected-oriented Languages

In object-oriented languages such as Java there is a (hidden) tag associated with objects of each class. This tag is inspected at run-time to enable dynamic dispatching in method calls:


class Parent { class Child extends Parent {

public int f() { public int f() { // override f

...... ......

} }


...... ......

} }
Parent a;

if(some-condition)

a = new Parent();

else

a = new Child();



int i = a.f();
In the step where the method f is called, the tag associated with the object a is inspected to see whether to call the method f of the Parent class (when a holds a reference to a Parent object) or to call the method f of the Child class (when a holds a reference to a Child object).
Ada also supports the object-oriented paradigm; in this language the equivalent of a Java class is actually called a tagged record type.
Type Equivalence

In some languages (for example C and C++) types are considered equivalent if their construction from primitive types is the same. We say that the language uses structural equivalence of types. For example in C and C++


typedef struct { typedef struct {

char name[20]; char name[20];

int age; int age;

float salary; float salary;

} person; } individual;
person a; individual b;
defines person and individual to be structure (or record) types. These clearly have the same structure and regarded as equivalent. We could pass b (of type individual) as a parameter to a function which expected a parameter of type person. Also the assignment a = b is valid. In fact the names individual and person are regarded as referring to the same underlying type.
Ada and Pascal use name equivalence: types are equivalent if and only if they have the same name. Each use of the type constructor produces a new type. Thus in Ada the types Heights and Weights below are regarded as distinct:
TYPE Heights IS ARRAY(1 .. 10) OF Float;

TYPE Weights IS ARRAY(1 .. 10) OF Float;

A : Weights;

B : Heights;

.........

B := A; -- !? illegal (type error)


In Ada a new anonymous type is created for each variable declared in
C, D : ARRAY(1 .. 10) OF Float;

.......


C := D; -- !? illegal (type error)
In fact there is no way directly to refer to the type of C or to declare another variable of the same type as C. Hence anonymous types1 should be avoided. In Pascal, C and D would have the same type (as they are declared in the same declaration), but this type would be distinct from a type of E declared separately as follows:
E : ARRAY(1 .. 10) OF Float;
Name equivalence offers better support for logical checking than structural equivalence; if types are given different names we would expect them to be used for logically distinct purposes. For example in an application we would normally not want to mix arrays of types Heights and Weights (assuming their names are meaningful) even though they are structurally equivalent.
Java uses name equivalence for classes. If we defined
class Person{ class Individual {

public char[] name; public char[] name;

public int age; public int age;

public float salary; public float salary;

} }
Person a = new Person(); Individual b = new Individual();
then object a of class Person and object b of class Individual though structurally the same are of different types and the assignment a = b; would cause a compilation error.
Java uses structural equivalence for arrays:
int[] a; // 1-D array

int [] b;

int [][] c; // 2-D array

int [][] d;

.......
a = b; // OK -- reference copied

c = d; // OK -- reference copied

a = c; // illegal

c = a; // illegal


Derived Types

In languages using name equivalence new types can be created based on existing types. For example in Ada this is done by using NEW:


TYPE File_Descriptor IS NEW Integer;

I : Integer;

FD : File_Descriptor;

............

FD := I; --!? illegal incompatible types in assignment
Such types are called derived types. The new type File_Descriptor inherits the operations of the parent type (Integer in this case), but has a set of values which is regarded as distinct from the parent type (though the sets are structurally equivalent). Using such derived types enables the compiler to detect some logical errors. In UNIX and other POSIX-compliant operating systems the open files of a process are identified by an integer file descriptor; such integers are logically distinct from normal integer values and type checking protects us from accidental confusion of the two.
Subtypes

Some languages notably Ada and Modula-2 support the notion of subtypes whereas others C, C++ and Java do not. A subtype is a base type together with a constraint; a value of the base type belongs to the subtype if it satisfies the constraint. For example in Ada


SUBTYPE Percent IS Integer RANGE 0 .. 100;

Mark : Percent;

I : Integer;

.........

I := Mark; -- always OK

........


Mark := I; -- Constraint_Error raised if value of I out of range

Get(Mark); -- run-time constraint check necessary


A subtype declaration does not introduce a new type and introduces no extra type checking requirements at compile time. However it often introduces the need for a run-time constraint check2. Note that the type of value determines what operations can be applied to the value and type consistency is checked at compile time, whereas the subtype of a value specifies the range of values of the type that are valid and in general requires a run-time constraint check.
The use of subtypes generally enables some logical errors in a program and/or errors in data files to be detected automatically at run-time (for example by generating constraint errors).
In Ada the number and type of the array indices are fixed as is the type or subtype of the array elements. In array types such as Weights and Heights above the index bounds are also fixed. Such array types are known as constrained. However it is possible to define unconstrained array types in which the index bounds are not fixed and then to introduce subtypes of the unconstrained type in which the index bounds are specified. For example the type String is an unconstrained array type pre-defined as
TYPE String IS ARRAY(Positive RANGE <>) OF Character;

........


SUBTYPE ShortStr IS String(1 .. 8);

SUBTYPE LongStr IS String(1 .. 100);

A : ShortStr;

B : LongStr;

C : String(1 .. 20); -- anonymous string subtype

.......


Ada.Text_IO.Put(A);

Ada.Text_IO.Put(B);


By defining the parameter of a subprogram to be of an unconstrained array type defined such subprograms can accept arrays of any size provided they are a subtype of this unconstrained type. Thus the procedures Put, Put_Line, Get and Get_Line from Ada.Text_IO can be used for I/O operations on strings of any length.
However in a language which uses value semantics we cannot define variables of an unconstrained type (as the size is not known), we can only declare variables of a constrained subtype of the basic unconstrained type.
As we have seen (for example in Java) the use of reference semantics allows array variables of undefined size to be declared and so is more flexible. Actual storage for the array elements is allocated only when an array object is created with new and then a size must be specified.
Type Conversions

Values of certain types can be converted to values of a related type; the most common example is conversion between numerical types where numerical values are preserved as far as possible3. For example in Ada:


N, Total: Integer;

X, Av : Float;

.........

X := Float(N) + X;

N := Integer(X);

Av := Float(Total)/Float(N);


Other languages notably C and Pascal allow implicit type conversions or coercions of numerical types in assignments and in arithmetic expressions involving operands of mixed type. For example in C and C++ we may write the above as
int n, total;

float x, av;

......

x = n + x; /* OK in C or Java */



n = x; /* OK in C, but an error in Java */

av = total/n; /* valid in C or Java, but logically incorrect */

av = (float)total/n; /* OK in C or Java */
Note that in the first command the integer value n is implicitly converted to float before the addition is performed. In arithmetic expressions with operands of mixed type the operand of the less general type is converted to the more general type.
least general ------> most general

byte4 short int long float double


Note also that the explicit conversion of total in the last line is necessary; without it integer division would be performed yielding a whole number value and then the result converted to float for the assignment as in the preceding line. By forcibly converting total to float the other operand of the division n is (implicitly) converted to float and then a floating point division is performed.
The recent trend in language design has been towards fewer implicit conversions, on the grounds of clarity and avoiding traps for the unwary. For example the (older) languages C and C++ allow implicit conversions in assignments from a less general to a more general type and vice-versa. However, the more recent language, Java, only permits conversions from a less general to a more general type (as no loss of information is involved in conversions in this direction). Thus in the above example the command n = x; is illegal in Java and would need to be written:
n = (int)x; /* explicit coercion OK in Java */
Ada is much more restrictive; implicit conversions are not allowed and programmers must always make their intentions clear by using an explicit type conversions.
Note that some type conversions (such as numerical conversions from integer to float) involve a change in machine representation of the value and so may involve considerable computations. However this is not always the case in examples such as:
TYPE File_Descriptor IS NEW Integer;

FD : File_Descriptor;

.......... -- so we can use Put from Ada.Integer_Text_IO;

Put(Integer(FD)); -- we convert FD to type Integer


Here there is no change in internal representation; the conversion is merely to satisfy the type checks in the compiler. Similarly in Ada a conversion of an array value of type Heights to one of type Weights would involve no run-time computations and would merely satisfy the demands of the compiler's strong type checking.
Casts

Type conversions (which convert a value in one type to a logically related value in another type) should be distinguished from casts which merely reinterpret the internal bit pattern of a value of one type as the value in another type. For example in Ada this is done by instantiating a suitable instance of the generic library function Unchecked_Conversion:


FUNCTION Cast_to_Float IS NEW Unchecked_Conversion(Integer, Float);

N : Integer := 1;

..........

Put(Cast_To_Float(N));

-- output bit pattern in N interpreted as a Float
Note in C the same effect can be achieved more simply as the arguments of printf are not type-checked:
int n = 1;

.....


printf("%f", n);

/* output bit pattern of n using float format %f */


whereas to print n in floating point format, we would convert the value of n to float format before passing to printf
printf("%f", (float)n); /* convert n to float and output */
Type Completeness

In virtually all languages values of a primitive type can be evaluated, assigned, passed to subprograms as parameters, returned as function values and compared for equality and inequality. In addition numerical values can be combined in arithmetic expressions and compared with operators such as ≤. Similarly boolean values can be combined with logical operators such as &&, || and !. Such values are referred to as first class values as all logically meaningful operations on values of these types are supported by the language. Records and arrays in Ada (and arrays and class instances in Java) can passed as parameters to procedures (methods in Java) and can be returned as function values and so are first class values.


However in some languages, notably C & C++, arrays cannot be returned as the value of a function, nor can they be assigned as a whole or compared as a whole5 . Structures (records) in C and C++ can be returned as the value of a function and can be assigned as a whole, but cannot be compared as a whole for equality and inequality. Thus in these languages arrays and records are not first class values.
In Pascal the value of a function must be a simple type; composite types such as arrays and records are not permitted. As the full range of meaningful operations on the type is not supported by the language, these values are referred to as second class values.
In many languages (Ada, C, Pascal, Java) procedures, methods and functions are not first class values as they cannot (for example) be assigned to variables or passed as parameters to procedures6 whereas in functional languages such as Miranda, functions are first class values.
Generally such restrictions on the operations allowed on a type are imposed to simplify the implementation of the language rather than for 'logical' reasons.
Type Completeness Principle

The type completeness principle states that no logically meaningful operation should be arbitrarily restricted in the types of the values involved. Thus C, C++ and Pascal can be seen to violate this principle in their treatment of arrays and records, but Ada and Java conform to the principle in this respect. However Ada, C and Java violate the principle in respect of function values whereas Haskell and Miranda do not




1 In fact in Ada anonymous record types (unlike anonymous array types) are illegal.

2 Sometimes it may be possible to determine at compile time that the constraints will always be satisfied, for example if a variable of subtype Percent is assigned to a variable of the subtype Natural.

3 Large integer values may not be exactly representable as float values and when converting from a float to an integer any fractional part is lost. If the conversion is from an integer type to an integer type of smaller precision, the most significant bits are discarded.

4 byte in Java is roughly equivalent to char in C or C++.

5 They can, of course, be assigned element by element.

6 However in C and Ada 95 pointers to functions and procedures are first class values and in effect allow subprograms to be used as procedure parameters etc.

© A Barnes, 2006 CS2130/Unit 9

Download 67.59 Kb.

Share with your friends:




The database is protected by copyright ©ininet.org 2024
send message

    Main page