The software depends not only on the computer’s machine language, but also on a large collection of programs called the operating system that supplies higher-level primitives than those of the machine language.
Sample primitives: system resource management, input and output operations, a file management system, program editors, etc.
The compiler goes through all the stages of translation and converts the entire user source program into machine code before the program is executed.
Linking may be necessary to connect the user code to the system programs.
The user and system code together are sometimes called a load module.
Pure interpretation allows easy implementation of many source-level debugging operations, because all run-time error messages can refer to source-level units.
The speed of the connection between a computer’s memory and its processor usually determines the speed of the computer, because instructions can often be executed faster than they can be moved to the processor for execution. This connection is called the von Neumann bottleneck.
Hybrid implementation systems translate high-level language programs to an intermediate language designed to allow easy interpretation. This method is faster than pure interpretation because the source language statements are decoded only once.
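The idea of decoding each source statement only once can be illustrated with a minimal sketch. The mini-language here (statements of the form `name = operand op operand`) and the function names `translate` and `run` are assumptions made for illustration only; they do not come from any particular system.

```python
import operator

# Hypothetical mini-language: each statement is "name = operand op operand".
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def decode_operand(text):
    # Integer literals become constants; anything else is treated as a variable name.
    return ("const", int(text)) if text.isdigit() else ("var", text)

def translate(source):
    """Decode every source statement once into an intermediate tuple."""
    program = []
    for line in source.strip().splitlines():
        target, equals, left, op, right = line.split()
        if equals != "=":
            raise SyntaxError(f"expected '=' in {line!r}")
        program.append((OPS[op], target, decode_operand(left), decode_operand(right)))
    return program

def fetch(operand, env):
    kind, val = operand
    return val if kind == "const" else env[val]

def run(program, env):
    """Interpret the intermediate form; no source text is parsed here."""
    for func, target, left, right in program:
        env[target] = func(fetch(left, env), fetch(right, env))
    return env

source = """
x = 2 + 3
y = x * 4
"""
program = translate(source)   # decoding happens once
print(run(program, {}))       # {'x': 5, 'y': 20}
```

If the same program is run many times, for example inside a loop, `run` reuses the already decoded tuples, whereas a pure interpreter would split and classify the text of each statement on every execution.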
Figure 4 Pure interpretation
As shown in Fig. 3, compilation has three stages: lexical analysis, syntax analysis and code generation.
Lexical analysis breaks up the source code given to the compiler into chunks that are in a form suitable for analysis by the next stage of the compilation process.
The strings of characters representing the source program are broken up into small chunks, called tokens.
It is usual to remove all redundant parts of the source code (such as spaces and comments) during this tokenisation phase. It is also likely in many systems that keywords such as END or PROCEDURE will be replaced by a more efficient, shorter token.
It is the job of the lexical analyser to check that all the keywords used are valid and to group certain symbols with their neighbours so that they form larger units to be presented to the next stage of the compilation process.
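The tokenisation steps described above can be sketched as follows. This is a minimal illustration, not a complete lexical analyser: the keyword codes (`K1`, `K2`, `K3`), the `{ ... }` comment syntax and the set of symbols are all assumptions chosen for the example.

```python
import re

# Hypothetical short codes that replace the full keyword text.
KEYWORDS = {"PROCEDURE": "K1", "BEGIN": "K2", "END": "K3"}

# Order matters: earlier patterns are tried first.
TOKEN_SPEC = [
    ("COMMENT", r"\{[^}]*\}"),                     # assumed comment style: { ... }
    ("SPACE", r"\s+"),
    ("NUMBER", r"\d+"),
    ("WORD", r"[A-Za-z_]\w*"),
    ("SYMBOL", r":=|<=|>=|<>|[-+*/=<>();:,.]"),    # two-character symbols are grouped first
    ("ERROR", r"."),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))

def tokenise(source):
    tokens = []
    for match in PATTERN.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("COMMENT", "SPACE"):
            continue                               # redundant parts are dropped
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r}")
        if kind == "WORD":
            if text in KEYWORDS:
                tokens.append(("KEYWORD", KEYWORDS[text]))
            else:
                tokens.append(("IDENT", text))
        else:
            tokens.append((kind, text))
    return tokens

print(tokenise("BEGIN total := total + 1 { add one } END"))
```

Note how `:` and `=` are grouped into the single unit `:=`, and how the comment and the spaces never reach the next stage. In this sketch any word that is not a known keyword is simply treated as an identifier.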
A symbol table for programmer-defined identifiers is created during lexical analysis and contains details of attributes such as data types. As part of putting the tokens into a standardized format, the tokens may be replaced by pointers into the symbol table.
Typically, entries in the symbol table will show:
the identifier or keyword;
the kind of item (variable, array, procedure, keyword, etc.);
the type of item (integer, real, char, etc.);
the run-time address of the item, or its value if it is a constant; and
a pointer to accessing information (e.g. for an array, the bounds of the array, or for a procedure, information about each of its parameters).
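The fields listed above could be represented as a simple record. The following is a sketch only: the class name `SymbolEntry`, the field names, and the sample addresses and values are hypothetical, and a plain dictionary stands in for the table itself.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class SymbolEntry:
    name: str                                            # the identifier or keyword
    kind: str                                            # variable, array, procedure, keyword, ...
    data_type: Optional[str] = None                      # integer, real, char, ...
    address_or_value: Union[int, float, str, None] = None  # run-time address, or value of a constant
    access_info: Optional[dict] = None                   # array bounds, parameter details, ...

# Sample entries; all addresses and values are made up for illustration.
entries = [
    SymbolEntry("count", "variable", "integer", address_or_value=1000),
    SymbolEntry("LIMIT", "constant", "integer", address_or_value=50),
    SymbolEntry("scores", "array", "real", address_or_value=1004,
                access_info={"bounds": (1, 10)}),
    SymbolEntry("swap", "procedure", address_or_value=2000,
                access_info={"parameters": [("a", "integer"), ("b", "integer")]}),
]
symbol_table = {entry.name: entry for entry in entries}

print(symbol_table["scores"].access_info["bounds"])  # (1, 10)
```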
Since the lexical analyser spends a great proportion of its time looking up the symbol table, the symbol table must be organised in such a way that entries can be found as quickly as possible. For this reason, a binary search tree may be used.
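A symbol table organised as a binary search tree might look like the sketch below. The class names `Node` and `SymbolTree` and the attribute dictionaries are assumptions for illustration; the tree is keyed on the identifier name and is not self-balancing.

```python
class Node:
    def __init__(self, name, attributes):
        self.name = name
        self.attributes = attributes
        self.left = None      # names that sort before this one
        self.right = None     # names that sort after this one

class SymbolTree:
    def __init__(self):
        self.root = None

    def insert(self, name, attributes):
        if self.root is None:
            self.root = Node(name, attributes)
            return
        node = self.root
        while True:
            if name == node.name:
                node.attributes = attributes       # update an existing entry
                return
            branch = "left" if name < node.name else "right"
            child = getattr(node, branch)
            if child is None:
                setattr(node, branch, Node(name, attributes))
                return
            node = child

    def lookup(self, name):
        node = self.root
        while node is not None:
            if name == node.name:
                return node.attributes
            node = node.left if name < node.name else node.right
        return None                                # identifier not declared

table = SymbolTree()
for ident in ["total", "count", "index", "value"]:
    table.insert(ident, {"kind": "variable", "type": "integer"})

print(table.lookup("index"))    # {'kind': 'variable', 'type': 'integer'}
print(table.lookup("missing"))  # None
```

Each comparison discards one subtree, so a lookup in a reasonably balanced tree needs about log2(n) comparisons for n entries. If identifiers happen to be inserted in sorted order, this simple version degenerates into a linear chain, which is why a balanced variant would be preferred in practice.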