Subroutine Closures:
-
Deep binding is implemented by creating an explicit representation of a referencing environment (generally the one in which the subroutine would execute if called at the present time) and bundling it together with a reference to the subroutine.
-
The bundle as a whole is referred to as a closure. Usually the subroutine itself can be represented in the closure by a pointer to its code.
-
If a central reference table is used to represent the referencing environment of a program with dynamic scoping, then the creation of a closure is more complicated.
-
Deep binding is often available as an option in languages with dynamic scope.In early dialects of Lisp.
program binding_example(input, output);
procedure A(I : integer; procedure P);
procedure B;
begin
writeln(I);
end;
begin (* A *)
if I > 1 then
P
else
A(2, B);
end;
procedure C; begin end;
begin (* main *)
A(1, C);
end.
Figure : Deep binding in Pascal
First- and Second-Class Subroutines:
-
In general, a value in a programming language is said to have first-class status if it can be passed as a parameter, returned from a subroutine, or assigned into a variable.
-
Simple types such as integers and characters are first-class values in most programming languages. By contrast, a “second-class” value can be passed as a parameter, but not returned from a subroutine or assigned into a variable, and a “third-class” value cannot even be passed as a parameter.
-
Subroutines are second-class values in most imperative languages but third-class values in Ada 83. They are first-class values in all functional programming languages, in C#, Perl, and Python, and, with certain restrictions, in several other imperative languages, including Fortran, Modula-2 and -3, Ada 95, C, and C++.
-
The Meaning of Names Within a Scope:
-
So far in our discussion of naming and scopes we have assumed that every name must refer to a distinct object in every scope. This is not necessarily the case.Two or more names that refer to a single object in a given scope are said to be aliases. A name that can refer to more than one object in a given scope is said to be overloaded.
Aliases:
-
A more subtle way to create aliases in many languages is to pass a variable by reference to a subroutine that also accesses that variable directly .Euclid and Turing use explicit and implicit subroutine import lists to catch and prohibit precisely this case. _
double sum, sum_of_squares;
...
void accumulate(double& x) // x passed by reference
{
sum += x;
sum_of_squares += x * x;
}
...
accumulate(sum);
Figure 3.14 Example of a potentially problematic alias in C++. Procedure accumulate probably
does not do what the programmer intended when sum is passed as a parameter.
int a, b, *p, *q;
...
a = *p; /* read from the variable referred to by p */
*q = 3; /* assign to the variable referred to by q */
b = *p; /* read from the variable referred to by p */
The initial assignment to a will, onmostmachines, require that *p be loaded into a register. Since accessing memory is expensive, the compiler will want to hang onto the loaded value and reuse it in the assignment to b. It will be unable to do so, however, unless it can verify that p and q cannot refer to the same object.While verification of this sort is possible in many common cases, in general it’s
uncomputable. _
Overloading:
-
Most programming languages provide at least a limited form of overloading. In C, for example, the plus sign (+) is used to name two different functions: integer and floating-point addition.
-
Within the symbol table of a compiler, overloading must be handled by arranging for the lookup routine to return a list of possible meanings for the requested name.
-
The semantic analyzer must then choose from among the elements of the list based on context.
Polymorphism and Related Concepts:
In the case of subroutine names, it is worth distinguishing overloading from the closely related concepts of coercion and polymorphism. All three can be used, in certain circumstances, to pass arguments of multiple types to (or return values of multiple types from) a given named routine. The syntactic similarity, however, hides significant differences in semantics and pragmatics.
Suppose, for example, that we wish to be able to compute the minimum of two values either integer or floating point type. In Ada we might obtain this capability using overloaded functions:
function min(a, b : integer) return integer is ...
function min(x, y : real) return real is ...
In Fortran, however, we could get by with a single function:
real function min(x, y)
real x, y
...
If the Fortran function is called in a context that expects an integer (e.g., i = min(j, k)), the compiler will automatically convert the integer arguments (j and k) to floating-point numbers, call min, and then convert the result back to an integer (via truncation). So long as real variables have at least as many significant bits as integers (which they do in the case of 32-bit integers and 64-bit double-precision floating-point), the result will be numerically correct.
_
Coercion is the process by which a compiler automatically converts a value of one type into a value of another type when that second type is required by the surrounding context. Coercion is somewhat controversial.Pascal provides a limited number of coercions.
-
Macro definition and Expansion:
Definition : macro
A macro name is an abbreviation, which stands for some related lines of code. Macros are useful for the following purposes:
· To simplify and reduce the amount of repetitive coding
· To reduce errors caused by repetitive coding
· To make an assembly program more readable.
-
A macro consists of name, set of formal parameters and body of code. The use of macro name with set of actual parameters is replaced by some code generated by its body. This is called macro expansion.
-
Macros allow a programmer to define pseudo operations, typically operations that are generally desirable, are not implemented as part of the processor instruction, and can be implemented as a sequence of instructions. Each use of a macro generates new program instructions, the macro has the effect of automating writing of the program.
Macros can be defined used in many programming languages, like C, C++ etc.. If the macro has parameters, they are substituted into the macro body during expansion; thus, a C macro can mimic a C function. The usual reason for doing this is to avoid the overhead of a function call in simple cases, where the code is lightweight enough that function call overhead has a significant impact on performance.
For instance,
#define max (a, b) a>b? A: b
Defines the macro max, taking two arguments a and b. This macro may be called like any C function, using identical syntax. Therefore, after preprocessing
z = max(x, y);
Becomes z = x>y? X:y;
While this use of macros is very important for C, for instance to define type-safe generic data-types or debugging tools, it is also slow, rather inefficient, and may lead to a number of pitfalls.
C macros are capable of mimicking functions, creating new syntax within some limitations, as well as expanding into arbitrary text (although the C compiler will require that text to be valid C source code, or else comments), but they have some limitations as a programming construct. Macros which mimic functions, for instance, can be called like real functions, but a macro cannot be passed to another function using a function pointer, since the macro itself has no address.
In programming languages, such as C or assembly language, a name that defines a set of commands that are substituted for the macro name wherever the name appears in a program (a process called macro expansion) when the program is compiled or assembled. Macros are similar to functions in that they can take arguments and in that they are calls to lengthier sets of instructions. Unlike functions, macros are replaced by the actual commands they represent when the program is prepared for execution. function instructions are copied into a program only once.
Macro Expansion.
A macro call leads to macro expansion. During macro expansion, the macro statement is replaced by sequence of assembly statements.
In the above program a macro call is shown in the middle of the figure. i.e. INITZ. Which is called during program execution? Every macro begins with MACRO keyword at the beginning and ends with the ENDM (end macro).whenever a macro is called the entire is code is substituted in the program where it is called. So the resultant of the macro code is shown on the right most side of the figure. Macro calling in high level programming languages
(C programming)
#define max(a,b) a>b?a:b
Main () {
int x , y;
x=4; y=6;
z = max(x, y); }
The above program was written using C programming statements. Defines the macromax, taking two arguments a and b. This macro may be called like any C function, using identical syntax. Therefore, after preprocessing
Becomes z = x>y ? x: y;
After macro expansion, the whole code would appear like this.
#define max(a,b) a>b?a:b
main()
{
int x , y;
x=4; y=6;z = x>y?x:y;
}
Since most large programs are constructed and tested incrementally, and since the compilation of a very large program can be a multihour operation, any language designed to support large programs must provide a separate compilation facility.
UNIT-3
SEMANTIC ANALYSIS
Syntax
• Describes form of a valid program
• Can be described by a context-free grammar
Semantics
• Describes meaning of a program
• Can’t be described by a context-free grammar
Context-Free Grammar (CFG):
In formal language theory, a context-free grammar (CFG) is a formal grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals (w can be empty). A formal grammar is considered "context free" when its production rules can be applied regardless of the context of a nonterminal. No matter which symbols surround it, the single nonterminal on the left hand side can always be replaced by the right hand side.
Context-free grammars are important in linguistics for describing the structure of sentences and words in natural language, and in computer science for describing the structure of programming languages and other formal languages.
Example
The grammar , with productions
S → aSa,
S → bSb,
S → ε,
is context-free. It is not proper since it includes an ε-production. A typical derivation in this grammar is
S → aSa → aaSaa → aabSbaa → aabbaa.
This makes it clear that . The language is context-free, however it can be proved that it is not regular.
|
Example:
Here are some C language syntax rules:
-
separate statements with a semi-colon
-
enclose the conditional expression of an IF statement inside parentheses
-
group multiple statements into a single statement by enclosing in curly braces
-
data types and variables must be declared before the first executable statement
Semantics is about the meaning of the sentence. It answers the questions: is this sentence valid? If so, what does the sentence mean? For example:
x++; // increment
foo(xyz, --b, &qrs); // call foo
are syntactically valid C statements. But what do they mean? Is it even valid to attempt to transform these statements into an executable sequence of instructions? These questions are at the heart of semantics.
Some constraints that may appear syntactic are enforced by semantic analysis:
• E.g., use of identifier only after its declaration
• Enforces semantic rules
• Builds intermediate representation (e.g., abstract syntax tree)
• Fills symbol table
• Passes results to intermediate code generator
Formal mechanism: Attributes grammars
Enforcing Semantic Rules
Semantic rules are divided into 2 types:
Static semantic rules
• Enforced by compiler at compile time
• Example: Do not use undeclared variable
Dynamic semantic rules
• Compiler generates code for enforcement at run time
• Examples: division by zero, array index out of bounds
• Some compilers allow these checks to be disabled
-
ROLE OF SEMANTIC ANALYSIS:
-
Following parsing, the next two phases of the "typical" compiler are
-
semantic analysis
-
(intermediate) code generation
-
The principal job of the semantic analyzer is to enforce static semantic rules
-
constructs a syntax tree (usually first)
-
information gathered is needed by the code generator
-
There is considerable variety in the extent to which parsing, semantic analysis, and intermediate code generation are interleaved
-
A common approach interleaves construction of a syntax tree with parsing
(no explicit parse tree), and then follows with separate, sequential phases for semantic analysis and code generation
Parse Tree vs. Syntax Tree
A parse tree is known as a concrete syntax tree
-
It demonstrates completely and concretely how a particular sequence of tokens can be derived under the rules of the context-free grammar
-
However, once we know that a token sequence is valid, much of the information in the parse tree is irrelevant to further phases of compilation.
An (abstract) syntax tree (or simply AST) is obtained by removing most of the “artificial” nodes in the parse tree’s interior
The semantic analysis and intermediate code generation annotate the parse tree with attributes
-
Attribute grammars provide a formal framework for the decoration of a tree
-
The attribute flow constrains the order(s) in which nodes of a tree can be decorated.
-
replaces the parse tree with a syntax tree that reflects the input program in a more straightforward way
Dynamic checks:
-
C requires no dynamic checks at all (it relies on the hardware to find division by zero, or attempted access to memory outside the bounds of the program).
-
Java check as many rules as possible, so that an untrusted program cannot do anything to damage the memory or files of the machine on which it runs.
-
Many compilers that generate code for dynamic checks provide the option of disabling them (enabled during program development and testing, but disables for production use, to increase execution speed)
Hoare: “like wearing a life jacket on land, and taking it off at sea”
Assertions:
Logical formulas written by the programmers regarding the values of program data used to reason about the correctness of their algorithms (the assertion is expected to be true when execution reaches a certain point in the code).
-
Java: assert denominator != 0;
-
An AssertionError exception will be thrown if the semantic check fails at run time.
-
C: assert(denominator != 0);
-
If the assertion fails, the program will terminate abruptly with a message: “a.c:10: failed assertion ‘denominator != 0’”
-
Some languages also provide explicit support for invariants, preconditions, and post-conditions.
Loop Invariants:
used to prove correctness of a loop with respect to pre-and post-conditions
[Pre-condition for the loop]
while (G)
[Statements in the body of the loop]
end while
[Post-condition for the loop]
-
A loop is correct with respect to its pre-and post-conditions if, and only if, whenever the algorithm variables satisfy the pre- condition for the loop and the loop terminates after a finite number of steps, the algorithm variables satisfy the post- condition for the loop
Correctness of a Loop to Compute a Product:
-
A loop to compute the product mx for a nonnegative integer m and a real number x, without using multiplication
[Pre-condition: m is a nonnegative integer, x is a real number, i = 0, and product = 0]
while (i ≠m)
product := product + x
i := i + 1
end while
[Post-condition: product = mx]
Loop invariant I(n): i = n and product = n*x
Guard G: i ≠ m
Static Analysis:
-
In general, compile time algorithms that predict run time behavior are known as Static analysis.
-
Such analysis is said to be precise if it allows the compiler to determine whether a given program will always follow the rules.
-
Type checking for eg., is static and precise in languages like ADA and ML: the compiler ensures that no variable will ever be used at run time in a way that is inappropriate for its type.
-
Static analysis can also be useful when it isn’t precise. Compilers will often check what they can at compile time and then generate code to check the rest dynamically.
-
In Java, for example, type checking is mostly static, but dynamically loaded classes and type casts may require run-time checks.
-
In a similar vein, many compilers perform extensive static analysis in an attempt to eliminate the need for dynamic checks on array subscripts, variant record tags, or potentially dangling pointers.
-
If we think of the omission of unnecessary dynamic checks as a performance optimization, it is natural to look for other ways in which static analysis may enable code improvement.
-
Examples include,
-
Alias analysis: determines when values can be safely cached in registers, computed “out of order,” or accessed by concurrent threads.
-
Escape analysis: determines when all references to a value will be confined to a given context, allowing it to be allocated on the stack instead of the heap, or to be accessed without locks.
-
Subtype analysis: determines when a variable in an object- oriented language is guaranteed to have a certain subtype, so that its methods can be called without dynamic dispatch.
-
An optimization is said to be unsafe if they may lead to incorrect code in certain programs,
-
An optimization is said to be speculative if they usually improve performance, but may degrade it in certain cases
-
Non-binding prefetches bring data into the cache before they are needed,
-
Trace scheduling rearranges code in hopes of improving the performance of the processor pipeline and the instruction cache.
-
A compiler is conservative if it applies optimizations only when it can guarantee that they will be both safe and effective.
-
A compiler is optimistic if it uses speculative optimizations.
-
it may also use unsafe optimizations by generating two versions of the code, with a dynamic check that chooses between them based on information not available at compile time.
-
Both semantic analysis and (intermediate) code generation can be described in terms of annotation, or "decoration" of a parse or syntax tree
-
ATTRIBUTE GRAMMARS provide a formal framework for decorating such a tree
-
An attribute grammar “connects” syntax with semantics
-
Attributes have values that hold information related to the (non)terminal
-
Semantic rules are used by a compiler to enforce static semantics and/or to produce an abstract syntax tree while parsing tokens
-
Multiple occurrences of a nonterminal are indexed in semantic rules
-
An attribute grammar is an augmented context-free grammar:
-
Each symbol in a production has a number of attributes.
-
Each production is augmented with semantic rules that
– Copy attribute values between symbols,
– Evaluate attribute values using semantic functions,
– Enforce constraints on attribute values.
-
Consider LR (bottom-up) grammar for arithmetic expressions made of constants, with precedence and associativity
EE+T
EE–T
ET
TT*F
TT/F
TF
F-F
F(E)
F const
-
This says nothing about what the program MEANS
-
Attributed grammar:
- define the semantics of the input program
- Associates expressions to mathematical concepts!!!
- Attribute rules are definitions, not assignments: they are not necessarily meant to be evaluated at any particular time, or in any particular order
Production Semantic Rule
-
E1→E2 + T E1.val := sum(E2.val,T.val)
-
E1→E2 - T E1.val := difference(E2.val,T.val)
-
E→T E.val := T.val
-
T1→T2 * F T1.val := product(T2.val,F.val)
-
T1→T2 / F T1.val := quotient(T2.val,F.val)
-
T →F T.val := F.val
-
F1→-F2 F1.val := negate(F2.val)
-
F →( E ) F.val := E.val
-
F →const F.val := const.val
Fig: 4.1 A Simple attribute grammar for constants expressions, using the standard arithmetic operations.
-
In our expression grammar, we can associate a val attribute with each E, T, F, const in the grammar.
-
The intent is that for any symbol S, S.val will be the meaning, as an arithmetic value, of the token string derived from S.
-
We assume that the val of a const is provided to us by the scanner.
-
We must then invent a set of rules for each production, to specify how vals of different symbols are related. The resulting attribute grammar is shown above.
-
The rules come in two forms. Those in productions 3,6,8,9 are known as copy rules; they specify that one attribute should be a copy of another.
-
The other rules known as Attribute Evaluation Rules that invoke semantic functions (sum, quotient, negate etc.)
-
When more than one symbol of a production has the same name, subscripts are used to distinguish them. These subscripts are solely for the benefit of the semantic functions; they are not part of the context-free grammar itself.
Attributed grammar to count the elements of a list:
L→id L1.c:=1
L1→L2,id L1.c:=L2.c+1
Semantic functions are not allowed to refer to any variables or attributes outside the current production
|