NCS-503 Principles of Programming Languages
I. Introduction
The Role of Programming Languages: Why Study Programming Languages, Toward Higher-Level Languages, Programming Paradigms, Programming Environments
Language Description: Syntactic Structure; Language Translation Issues: Programming Language Syntax, Stages in Translation, Formal Translation Models
II. Language Properties
Modeling Language Properties, Elementary Data Types, Encapsulation, Inheritance, Sequence Control, Subprogram Control
III. Programming Paradigms
Imperative Programming: Statements, Types, Procedure Activations Object-Oriented Programming: Grouping Of Data and Operations, object oriented programming
Functional Programming: Elements, Programming in a typed language, Programming with lists
IV. Other Programming Paradigms
Logic Programming, Concurrent Programming, Network Programming, Language Description: Semantic Methods
V. Lambda Calculus
Introduction to Lambda Calculus, Simple types, Subtyping
Text books:
1. "Programming Languages: Design and Implementation", Terrence W. Pratt, Marvin V. Zelkowitz, T. V. Gopal, Fourth Ed., Prentice Hall
2. "Programming Languages: Concepts and Constructs", Ravi Sethi, Second Ed., Pearson
3. "Types and Programming Languages", Benjamin C. Pierce, The MIT Press, Cambridge, Massachusetts / London, England
References:
1. Concepts of Programming Languages, Robert W. Sebesta, 10th Ed., Pearson
Principle of Programming Languages (NCS-503)
Unit - 1.
Introduction: Programming Language
In modern society, which relies heavily on information technology, programming education is extremely important. The choice of the first programming language, and of the corresponding programming paradigm, is clearly critical for the later development of an IT professional. Over the last fifty years, thousands of programming languages have been introduced, belonging to several programming paradigms.
It is very important to identify the desirable features of a programming language, especially in the context of education, and to consider these issues both in terms of individual programming languages and in terms of programming paradigms. Over the last decades, several programming paradigms have emerged and matured. The most important ones are the imperative, object-oriented, functional, and logic paradigms.
A programming language is a formal constructed language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs to control the behavior of a machine or to express algorithms.
The description of a programming language is usually split into the two components of syntax (form) and semantics (meaning). Some languages are defined by a specification document (for example, the C programming language is specified by an ISO Standard), while other languages (such as Perl) have a dominant implementation that is treated as a reference.
Language designers have a basic vocabulary about language structure, meaning, and pragmatic concerns that helps them understand how languages work. This vocabulary falls into three major categories that together make up programming language design:
- Syntax: describes what constitutes a structurally correct program.
- Names & types: enable the programmer to understand and properly implement operations on the values of various types.
- Semantics: defines the meaning of a program; that is, when a program is executed, the effect of each statement on the values of the variables in the program is given by the semantics of the language.
Definition: Programming Language:
A programming language is a notation for writing programs, which are specifications of a computation or algorithm. Some, but not all, authors restrict the term "programming language" to those languages that can express all possible algorithms. Traits often considered important for what constitutes a programming language include:
- Function and target: A computer programming language is a language used to write computer programs, which involve a computer performing some kind of computation or algorithm and possibly controlling external devices such as printers, disk drives, and robots. For example, PostScript programs are frequently created by another program to control a computer printer or display. It is generally accepted that a complete specification for a programming language includes a description, possibly idealized, of a machine or processor for that language. In most practical contexts, a programming language involves a computer; consequently, programming languages are usually defined and studied this way. Programming languages differ from natural languages in that natural languages are used only for interaction between people, while programming languages also allow humans to communicate instructions to machines.
- Abstractions: Programming languages usually contain abstractions for defining and manipulating data structures or controlling the flow of execution. The practical necessity that a programming language support adequate abstractions is expressed by the abstraction principle; this principle is sometimes formulated as a recommendation to the programmer to make proper use of such abstractions.
- Expressive power: The theory of computation classifies languages by the computations they are capable of expressing. All Turing-complete languages can implement the same set of algorithms. ANSI/ISO SQL-92 and Charity are examples of languages that are not Turing complete, yet are often called programming languages.
Class Objectives:
- Know how to write programs in different paradigms
- Know how to translate between different languages
Concepts in programming languages:
- Know the concepts of typical programming languages
- Understand how to implement programming languages (the structure of compilers and interpreters)
- Understand trade-offs in programming language design
- Appreciate diversity of ideas
- Critical thinking
- Be prepared for new problem-solving paradigms
The Role of Programming Languages:
- Not only in designing and implementing languages, but in teaching and supporting them.
- Natural languages:
  - Interfaces for expressing information
  - Different natural languages: English, Chinese, French, German, …
- Programming languages:
  - Interfaces for expressing data and algorithms
  - Instructing machines what to do
  - Facilitate communication between computers and programmers
  - Different programming languages: FORTRAN, Pascal, C, C++, Java, Lisp, Scheme, ML, …
Why Study Programming Languages?
The following is what we believe to be a compelling list of potential benefits of studying concepts of programming languages:
- Increased capacity to express ideas:
It is widely believed that the depth at which people can think is influenced by the expressive power of the language in which they communicate their thoughts.
For example, a C programmer who had learned the structure and uses of associative arrays in Perl (Wall et al., 2000) might design data structures that simulate associative arrays in C.
- Improved background for choosing appropriate languages:
Many professional programmers have had little formal education in computer science; rather, they have developed their programming skills independently or through in-house training programs. Such training programs often limit instruction to one or two languages that are directly relevant to the current projects of the organization. Many other programmers received their formal training years ago.
- Increased ability to learn new languages:
Computer programming is still a relatively young discipline, and design methodologies, software development tools, and programming languages are still in a state of continuous evolution.
For example, programmers who understand the concepts of object-oriented programming will have a much easier time learning Java (Arnold et al., 2006) than those who have never used those concepts.
The TIOBE Programming Community issues an index (http://www.tiobe.com/tiobe_index/index.htm) that is an indicator of the relative popularity of programming languages. For example, according to the index, Java, C, and C++ were the three most popular languages in use in August 2011.
- Better understanding of the significance of implementation:
In learning the concepts of programming languages, it is both interesting and necessary to touch on the implementation issues that affect those concepts. In some cases, an understanding of implementation issues leads to an understanding of why languages are designed the way they are. In turn, this knowledge leads to the ability to use a language more intelligently, as it was designed to be used.
For example, programmers who know little about the complexity of the implementation of subprogram calls often do not realize that a small subprogram that is frequently called can be a highly inefficient design choice.
- Better use of languages that are already known:
Many contemporary programming languages are large and complex. Accordingly, it is uncommon for a programmer to be familiar with and use all of the features of a language he or she uses. By studying the concepts of programming languages, programmers can learn about previously unknown and unused parts of the languages they already use and begin to use those features.
- Overall advancement of computing:
Finally, there is a global view of computing that can justify the study of programming language concepts. Although it is usually possible to determine why a particular programming language became popular, many believe, at least in retrospect, that the most popular languages are not always the best available.
For example, many people believe it would have been better if ALGOL 60 (Backus et al., 1963) had displaced FORTRAN (Metcalf et al., 2004) in the early 1960s, because it was more elegant and had much better control statements, among other reasons.
Definition: High-Level Programming Language:
Toward high-level languages:
- Machine language is low level and full of details.
- Assembly language is a variant of machine language in which names and symbols take the place of actual codes.
Fig.1: How to move toward high-level language
In computer science, a high-level programming language is a programming language with strong abstraction from the details of the computer. In comparison to low-level programming languages, it may use natural language elements, be easier to use, or automate (or even hide entirely) significant areas of computing systems (e.g., memory management), making the process of developing a program simpler and more understandable relative to a lower-level language. The amount of abstraction provided defines how "high-level" a programming language is.
In the 1960s, high-level programming languages using a compiler were commonly called autocodes. Examples of autocodes are COBOL and Fortran.
The first high-level programming language designed for computers was Plankalkül, created by Konrad Zuse; it influenced Heinz Rutishauser's language "Superplan" (and to some degree also Algol). The first really widespread high-level language was Fortran, a machine-independent development of IBM's earlier Autocode systems. Algol, defined in 1958 and 1960 by committees of European and American computer scientists, introduced recursion as well as nested functions under lexical scope. Algol also introduced several structured programming concepts, such as the while-do and if-then-else constructs, and its syntax was the first to be described by a formal method, Backus-Naur Form (BNF). During roughly the same period, Cobol introduced records (also called structs) and Lisp introduced fully general lambda abstraction in a programming language for the first time.
Features:
- "High-level language" refers to the higher level of abstraction from machine language. Rather than dealing with registers, memory addresses, and call stacks, high-level languages deal with variables, arrays, objects, complex arithmetic or boolean expressions, subroutines and functions, loops, threads, locks, and other abstract computer science concepts, with a focus on usability over optimal program efficiency.
- Unlike low-level assembly languages, high-level languages have few, if any, language elements that translate directly into a machine's native opcodes.
- Other features, such as string handling routines, object-oriented language features, and file input/output, may also be present.
Benefits of higher-level languages:
- Readability: the ease with which a written text can be understood by a reader. Various factors have been used to measure readability, such as "speed of perception," "perceptibility at a distance," "perceptibility in peripheral vision," "visibility," "the reflex blink technique," "rate of work" (e.g., speed of reading), "eye movements," and "fatigue in reading."
- Readability is distinguished from legibility, which is a measure of how easily individual letters or characters can be distinguished from each other.
- Readability also determines the ease with which computer program code can be read by humans, for example through embedded documentation.
- Writability: the ease with which a language can be used to create programs.
- Reliability: conformance to specifications (i.e., the program performs to its specifications).
- Cost: the ultimate total cost.
- Machine independence (portability):
  - Portable object (computing): a distributed-computing term for an object which can be accessed through a normal method call while possibly residing in memory on another computer.
  - Software portability: software that can easily be ported to multiple platforms.
  - Portable applications: applications that do not require any kind of installation onto a computer and can store data in the program's directory.
- Availability of program libraries:
  - In computer science, a library is a collection of non-volatile resources used by computer programs, often to develop software. These may include configuration data, documentation, help data, message templates, pre-written code and subroutines, classes, values, or type specifications. In IBM's OS/360 and its successors they are referred to as partitioned data sets.
  - A library is a collection of implementations of behavior, written in terms of a language, with a well-defined interface by which the behavior is invoked. This means that as long as a higher-level program uses a library to make system calls, it does not need to re-implement those system calls over and over again.
- Error detection and correction:
  - In information theory and coding theory, with applications in computer science and telecommunication, error detection and correction (or error control) are techniques that enable reliable delivery of digital data over unreliable communication channels. Error detection techniques allow detecting such errors, while error correction enables reconstruction of the original data in many cases.
  - Error detection is the detection of errors caused by noise or other impairments during transmission from the transmitter to the receiver.
  - Error correction is the detection of errors and reconstruction of the original, error-free data.
Types of High-Level Languages:
FORTRAN:
- FORmula TRANslation.
- Developed at IBM in the mid-1950s.
- Designed for scientific and mathematical applications by scientists and engineers.
COBOL:
- COmmon Business Oriented Language.
- Developed in 1959.
- Designed to be common to many different computers.
- Typically used for business applications.
BASIC:
- Beginner's All-purpose Symbolic Instruction Code.
- Developed at Dartmouth College in the mid-1960s.
- Developed as a simple language for students to write programs with which they could interact through terminals.
C:
- Developed by Bell Laboratories in the early 1970s.
- Provides the control and efficiency of assembly language while having third-generation language features.
- Often used for system programs.
- UNIX is written in C.
C++:
- C with additional features.
- Widely used for developing system and application software.
- Graphical user interfaces can be developed easily with visual programming tools.
Java:
- An object-oriented language similar to C++ that eliminates many of C++'s problematic features.
- Allows a web page developer to create programs for applications, called applets, that can be used through a browser.
- An objective of Java's developers was that it be machine, platform, and operating system independent.
Markup Languages:
HTML:
- HyperText Markup Language.
- Used on the Internet and the World Wide Web (WWW).
- A web page developer puts brief codes called tags in the page to indicate how the page should be formatted.
Programming Paradigms: A programming paradigm is a pattern of problem-solving thought that underlies a particular genre of programs and languages.
- Imperative/Procedural programming (C)
- Object-oriented programming (C++)
- Functional/Applicative programming (Lisp)
- Logic/Declarative/Rule-based programming (Prolog)
- Concurrent programming
Imperative Programming (sequence of actions):
- Fortran, 1955-1960, for scientific programming: familiar notation, efficiency.
- Toward a common language: Algol, Algol 60, 1960s. Imperative languages are often viewed as the Algol family.
- Pascal was designed as a teaching language.
- C, 1972, created by Dennis Ritchie. C provides a rich set of operators, a terse syntax, and efficient access to the machine. UNIX is implemented in C.
- Fortran, C, Pascal, Algol, …
- Express side effects of statements and subroutines.
- Emphasize machine efficiency: compiler optimizations (Fortran), efficient mapping to the machine (C).
Object-Oriented Programming (1961-1967):
- From Simula's origins in simulation.
- A key concept from Simula is that of a class of objects:
  - A traffic simulation can have many cars and many trucks.
  - The simulation has objects of the class of cars and objects of the class of trucks, all of which share certain properties: cars and trucks are all vehicles.
  - The classification of objects into classes and subclasses is central to object-oriented programming.
- Smalltalk is an interactive system with a graphical user interface.
- Simula, C++, Java, Smalltalk, …
- Emphasize abstraction and modular program organization.
- C++ was designed to bring the benefits of objects to imperative programming in C.
- Java (a network language): platform-independent, using a virtual machine.
Functional Programming: Lisp, 1958, for artificial intelligence (AI):
- Functional programming is a style of programming in which the basic method of computation is the application of functions to arguments.
- A functional language is one that supports and encourages the functional style.
- Example: summing the integers 1 to 10 in Java:
total = 0;
for (i = 1; i <= 10; ++i)
    total = total + i;
The computation method is variable assignment.
- Summing the integers 1 to 10 in Haskell:
sum [1..10]
The computation method is function application.
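The same contrast can be seen inside a single language. The following Python sketch (illustrative, not from the original notes) computes the sum both ways:

```python
# Imperative style: explicit variable assignment drives the computation.
total = 0
for i in range(1, 11):
    total = total + i

# Functional style: a single function application, no mutable state.
functional_total = sum(range(1, 11))

assert total == functional_total == 55
```

The imperative version describes *how* to accumulate the result step by step; the functional version states *what* value is wanted.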
- Lisp is an acronym derived from LISt Processor.
- Examples:
  - (+ 2 3)
  - (Shakespeare wrote (the tempest))
  - (cdr (a b c d e))
- Programs and data are both represented by lists in Lisp.
- Creation of Common Lisp in 1984.
- CLOS is an object-oriented extension; the full name is Common Lisp Object System.
- Lisp, Scheme, ML, Haskell, …
- Express evaluation of expressions and functions.
- Emphasize expressiveness and flexibility.
- Mostly interpreted and used for project prototyping.
Basic Concepts Used in Functional Programming:
- Types and classes
- Defining functions
- List comprehensions
- Recursive functions
- Higher-order functions
Why is it useful?
- The abstract nature of functional programming leads to considerably simpler programs.
- It also supports a number of powerful new ways to structure and reason about programs.
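Two of the concepts listed above, list comprehensions and higher-order functions, can be sketched in Python (an illustrative example, not part of the course text):

```python
from functools import reduce

# List comprehension: squares of the even numbers 1..10.
squares = [n * n for n in range(1, 11) if n % 2 == 0]

# Higher-order functions: map, filter, and reduce each take a
# function as an argument; together they express the same computation.
same = reduce(lambda a, b: a + b,
              map(lambda n: n * n,
                  filter(lambda n: n % 2 == 0, range(1, 11))))

assert sum(squares) == same == 220
```

Both forms avoid explicit loop variables and mutable accumulators, which is the "considerably simpler programs" point above.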
Logic Programming: Prolog, 1972, for natural language processing:
- Every psychiatrist is a person.
- Every person he analyzes is sick.
- Shyan is a psychiatrist in Marseille.
Queries:
- Is Shyan a person?
- Where is Shyan?
- Is Shyan sick?
Prolog uses a specialized form of logical reasoning to answer such queries.
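The flavor of this reasoning can be sketched in Python with a tiny forward-chaining loop. Encoding facts as tuples and applying just one rule ("every psychiatrist is a person") is a hypothetical simplification of what Prolog actually does (Prolog uses resolution and backtracking):

```python
# Facts: (predicate, subject). Rules: if premise-predicate holds of X,
# then conclusion-predicate holds of X.
facts = {("psychiatrist", "Shyan"), ("in_city", "Marseille")}
rules = [("psychiatrist", "person")]   # every psychiatrist is a person

def infer(facts, rules):
    """Repeatedly apply rules until no new facts can be derived."""
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for pred, subj in list(facts):
                if pred == premise and (conclusion, subj) not in facts:
                    facts.add((conclusion, subj))
                    changed = True
    return facts

facts = infer(facts, rules)
assert ("person", "Shyan") in facts    # "Is Shyan a person?" -> yes
```

A query such as "Is Shyan a person?" then becomes a membership test against the closed set of facts.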
Concurrent Programming: for multithreaded and multiprocessor systems.
Levels of Programming Languages:
Two ways to implement a language: compilation vs. interpretation.
Fig.2: Levels of Programming Languages
- Some languages are higher level than others. Why? (Readability, programmability, maintainability.)
Compiler:
- A compiler translates the source program into an equivalent program in the target language.
- Once the translation is complete, the target code is run at a later time, called run time.
- Compilation can be more efficient than interpretation.
Fig.3: Block Diagram of Compiler
Stages of Compiler:
Fig.4: Stages of Compiler
Interpreter:
- An interpreter takes a program and its input at the same time.
- It scans the program, implementing operations as it encounters them, and doing input/output as needed.
- Interpretation can be more flexible than compilation.
Programming Domains:
- Large numbers of floating-point computations; use of arrays: Fortran
- Producing reports; use of decimal numbers and characters: COBOL
- Symbols rather than numbers manipulated; use of linked lists: LISP
- Need for efficiency because of continuous use: C
- Web software: an eclectic collection of languages: markup (e.g., XHTML), scripting (e.g., PHP), general-purpose (e.g., Java)
Programming environments:
A collection of tools used in software development.
- UNIX: an older operating system and tool collection. Nowadays often used through a GUI (e.g., CDE, KDE, or GNOME) that runs on top of UNIX.
- Microsoft Visual Studio .NET: a large, complex visual environment. Used to build Web applications and non-Web applications in any .NET language.
- NetBeans: related to Visual Studio .NET, except for Web applications in Java.
The environment in which programs are created and tested supports:
- Separate compilation
- Separate execution
- Testing
- Debugging
Effects on language design:
- Modular organization
- Local/global variables
- Libraries
Process control languages:
- Able to process programs and data files.
- Specify a sequence of operations on program and data files. Examples: Awk, Perl, Tcl/Tk.
Language design must:
- Allow the program solution to match the problem structure.
- Allow for world-wide use.
- Make it easy to prove the correctness of solutions.
Influences on Language Design:
- Computer Architecture
- Programming Design Methodologies
Computer Architecture:
The basic architecture of computers has had a profound effect on language design. Most of the popular languages of the past 50 years have been designed around the prevalent computer architecture, called the von Neumann architecture, after one of its originators, John von Neumann (pronounced "von Noyman").
- Well-known computer architecture: von Neumann.
- Imperative languages, the most dominant, exist largely because of von Neumann computers:
  - Data and programs are stored in memory.
  - Memory is separate from the CPU.
  - Instructions and data are piped from memory to the CPU.
- Basis for imperative languages:
  - Variables model memory cells.
  - Assignment statements model piping.
  - Iteration is efficient.
Fig.5: The von Neumann computer architecture
- Fetch-execute cycle (on a von Neumann architecture computer):
  - initialize the program counter
  - repeat forever:
    - fetch the instruction pointed to by the counter
    - increment the counter
    - decode the instruction
    - execute the instruction
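The cycle above can be sketched as a toy interpreter. The three-instruction machine (LOAD, ADD, HALT) with a single accumulator is a hypothetical simplification, not any real instruction set:

```python
def run(program):
    """Simulate the fetch-execute cycle on a toy accumulator machine."""
    pc = 0                         # initialize the program counter
    acc = 0                        # accumulator register
    while True:                    # repeat forever
        op, arg = program[pc]      # fetch the instruction pointed to by pc
        pc += 1                    # increment the counter
        # decode and execute the instruction
        if op == "LOAD":
            acc = arg
        elif op == "ADD":
            acc += arg
        elif op == "HALT":
            return acc

result = run([("LOAD", 2), ("ADD", 3), ("HALT", None)])
assert result == 5
```

Fetch, increment, decode, execute: each loop iteration mirrors one pass of the cycle listed above.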
Programming Design Methodologies:
- New software development methodologies (e.g., object-oriented software development) led to new programming paradigms and, by extension, new programming languages.
- 1950s and early 1960s: simple applications; worry about machine efficiency.
- Late 1960s: people efficiency became important; readability, better control structures:
  - structured programming
  - top-down design and step-wise refinement
- Late 1970s: process-oriented to data-oriented.
- Middle 1980s: object-oriented programming: data abstraction + inheritance + polymorphism.
Language Categories:
- Imperative:
  - Central features are variables, assignment statements, and iteration.
  - Includes languages that support object-oriented programming.
  - Includes scripting languages.
  - Includes the visual languages.
  - Examples: C, Java, Perl, JavaScript, Visual Basic .NET, C++
- Functional:
  - The main means of making computations is by applying functions to given parameters.
  - Examples: LISP, Scheme
- Logic:
  - Rule-based (rules are specified in no particular order).
  - Example: Prolog
- Markup/programming hybrid:
  - Markup languages extended to support some programming.
  - Examples: JSTL, XSLT
Layered View of Computer:
The operating system and language implementation are layered over the machine interface of a computer.
Fig.6: Layered View of Computer
Major Components of a Computer:
- Data: various kinds of elementary and structured data.
  - Main memory
  - High-speed registers
  - High-speed cache memory
  - External files
  - Data and programs
- Primitive operations: a set of built-in primitive operations.
  - Arithmetic operations on each built-in numeric data type (+, -, *, /)
  - Testing various properties of data items (test for zero, positive, and negative numbers)
  - Accessing and modifying various parts of a data item
  - Controlling input-output devices
  - Sequence control (jumps)
- Sequence control: there is an interpreter.
  - Fetch the instruction
  - Decode the instruction
  - Fetch the designated operands
  - Branch to the designated operation
  - Execute primitive operations 1 to n
  - An address register is used to control the sequence of primitive-operation execution.
- Data access: access to the operands of an operation, and control of the data supplied to each execution of an operation.
- Storage management: keeping all resources of the computer operating as much as possible.
  - Memory
  - Central processor
  - External data devices
  - Multiprogramming and cache memory
  - Controlling the allocation of storage for programs and data.
- Operating environment: the outside world of the computer; a set of peripherals and input-output devices. Provides mechanisms for communication with an external environment containing programs and data.
Language Description:
Languages are designed to:
- Run efficiently: early languages.
- Be easy to write correctly: newer languages.
  - Data typing features in ML
  - Classes in C++
  - Packages in Ada
Language Translation Issues & Formal Translation Models:
Language translation issues:
- Programming language syntax
- Key criteria concerning syntax
- Basic syntactic concepts
- Overall program-subprogram structure
- Analysis of the source program
- Synthesis of the object program
- Bootstrapping
Programming Language Syntax:
The syntax of a programming language describes the structure of programs without any consideration of their meaning.
Key criteria concerning syntax:
- Readability: a program is considered readable if the algorithm and data are apparent by inspection.
- Writability: ease of writing the program.
- Verifiability: ability to prove program correctness (a very difficult issue).
- Translatability: ease of translating the program into executable form.
- Lack of ambiguity: the syntax should make it easy to avoid ambiguous structures.
Basic Syntactic Concepts:
- Character set: the alphabet of the language. Several different character sets are used: ASCII, EBCDIC, and Unicode.
- Identifiers: strings of letters or digits, usually beginning with a letter.
- Operator symbols: +, -, *, /
- Keywords or reserved words: used as a fixed part of the syntax of a statement.
- Noise words: optional words inserted into statements to improve readability.
- Comments: used to improve readability and for documentation purposes. Comments are usually enclosed by special markers.
- Blanks: rules vary from language to language. Usually only significant in literal strings.
- Delimiters: used to denote the beginning and the end of syntactic constructs.
- Expressions: functions that access data objects in a program and return a value.
- Statements: the sentences of the language; they describe a task to be performed.
Overall Program-Subprogram Structure:
- Separate subprogram definitions: separate compilation, linked at load time. E.g., C/C++.
- Separate data definitions: the general approach in OOP.
- Nested subprogram definitions: subprogram definitions appear as declarations within the main program or other subprograms. E.g., Pascal.
- Separate interface definitions: C/C++ header files.
- Data descriptions separated from executable statements: a centralized data division contains all data declarations. E.g., COBOL.
- Unseparated subprogram definitions: no syntactic distinction between main program statements and subprogram statements. E.g., BASIC.
Stages in Translation:
Analysis of the source program:
Lexical analysis (scanning): identifying the tokens of the programming language: keywords, identifiers, constants, and other symbols.
In the program
void main()
{ printf("Hello World\n"); }
the tokens are
void, main, (, ), {, printf, (, "Hello World\n", ), ;, }
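A scanner for this fragment can be sketched with regular expressions. The token classes below (STRING, IDENT, PUNCT) are illustrative assumptions, not the actual lexical grammar of C:

```python
import re

# One named alternative per token class; order matters (longest-match
# classes such as strings come first).
TOKEN_RE = re.compile(r'''
    (?P<STRING>"(?:\\.|[^"\\])*")      # string literal, with escapes
  | (?P<IDENT>[A-Za-z_]\w*)            # keyword or identifier
  | (?P<PUNCT>[(){};])                 # punctuation
''', re.VERBOSE)

def tokenize(source):
    """Return the list of token lexemes, skipping whitespace."""
    return [m.group(0) for m in TOKEN_RE.finditer(source)]

src = r'void main() { printf("Hello World\n"); }'
assert tokenize(src) == ['void', 'main', '(', ')', '{',
                         'printf', '(', r'"Hello World\n"', ')', ';', '}']
```

Note that the scanner produces exactly the token sequence listed above; classifying `void` as a keyword rather than an identifier would be a later refinement.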
- Syntactic analysis (parsing): determining the structure of the program, as defined by the language grammar.
- Semantic analysis: assigning meaning to the syntactic structures.
Example: int variable1;
Meaning: 4 bytes for variable1, and a specific set of operations to be used with variable1.
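The symbol-table side of this semantic task can be sketched as follows; the `declare` helper and the 4-byte size for `int` are illustrative assumptions:

```python
# Minimal symbol-table sketch: a declaration records the name's type
# and size, and redeclaration is reported as a semantic error.
symbol_table = {}

def declare(name, type_name, size_bytes):
    if name in symbol_table:
        raise NameError(f"redeclaration of {name}")   # error detection
    symbol_table[name] = {"type": type_name, "size": size_bytes}

declare("variable1", "int", 4)   # int variable1; reserves 4 bytes
assert symbol_table["variable1"] == {"type": "int", "size": 4}
```

Later phases consult this table to check that operations applied to `variable1` are among those defined for `int`.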
Basic Semantic Tasks:
- Semantic analysis builds the bridge between analysis and synthesis.
- Basic semantic tasks:
  - Symbol-table maintenance
  - Insertion of implicit information
  - Error detection
  - Macro processing
- Result: an internal representation, suitable for code optimization and code generation.
Synthesis of the object program:
Three main steps:
- Optimization: removing redundant statements.
- Code generation: generating assembler commands with relative memory addresses for the separate program modules, obtaining the object code of the program.
- Linking and loading: resolving the addresses, obtaining the executable code of the program.
Optimization example: redundant statements are detected and removed.
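One kind of redundancy removal can be sketched as dead-store elimination: an assignment whose target is overwritten before it is ever used can be deleted. The statement representation below (target, expression, names used) is a hypothetical simplification:

```python
def remove_dead_stores(stmts, live_at_end):
    """Walk backwards; keep a store only if its target is still live."""
    live = set(live_at_end)
    kept = []
    for target, expr, uses in reversed(stmts):
        if target in live:
            kept.append((target, expr, uses))
            live.discard(target)      # this store satisfies the demand
            live |= set(uses)         # its operands become live
    return list(reversed(kept))

stmts = [("x", "1", []),             # redundant: x is overwritten below
         ("x", "2", []),
         ("y", "x + 3", ["x"])]
assert remove_dead_stores(stmts, ["y"]) == [("x", "2", []),
                                            ("y", "x + 3", ["x"])]
```

The first assignment to `x` is never observed, so the optimizer drops it without changing the program's result.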
Bootstrapping:
- The compiler for a given language can be written in the same language.
- Start with a program that translates some internal representation into assembler code.
- The programmer manually rewrites the compiler into the internal representation, using the algorithm that is encoded in the compiler.
- From there on, the internal representation is translated into assembler and then into machine language.
Syntax and Semantics:
Syntax: what the program looks like.
Semantics: the meaning given to the various syntactic constructs.
Example: the same concept, an array of ten integers, is written differently in Pascal and C:
V: array [0..9] of integer;
int V[10];
Formal Translation Models:
- Syntax is concerned with the structure of programs.
- The formal description of the syntax of a language is called a grammar.
- Grammars consist of rewriting rules and may be used for both recognition and generation of sentences.
- Grammars are independent of the method of syntactic analysis.
More about grammars:
- Word categories, constituents:
  - A language consists of sentences. Each sentence consists of words.
  - The rules that tell us how to combine words to form correct sentences are called grammar rules.
  - Example from English:
    - "The boy reads" is a valid sentence.
    - "Boy the reads" is an invalid sentence.
  The corresponding rule here says that the article must precede the noun. Here you see the words "article" and "noun". These words correspond to certain categories of words in the language. For example, the words boy, book, room, and class are all nouns, while read, speak, and write are verbs.
- Why do we need word categories?
There are infinitely many sentences and we cannot write a rule for each individual sentence. We need these categories in order to describe the structural patterns of the sentences. For example, the basic sentence pattern in English is a noun phrase followed by a verb phrase.
- A sequence of words that constitutes a given category is called a constituent. For example, the boldface parts in each of the sentences below correspond to a constituent called a verb phrase:
  - The boy is reading a book.
  - The boy is reading an interesting book.
  - The boy is reading a book by Mark Twain.
- Terminal and non-terminal symbols, grammar rules
- How do we represent constituents and how do we represent words?
- Grammars use two types of symbols:
  - Terminal symbols - to represent the words.
  - Non-terminal symbols - to represent categories of words and constituents.
  These symbols are used in grammar rules.
- There is one special non-terminal symbol S representing "sentence". It is also called the starting symbol of the grammar.
- Grammars for programming languages are built in the same way - there are no major differences.
BNF notation:
Grammars for programming languages use a special notation called BNF (Backus-Naur form):
- The non-terminal symbols are enclosed in < >.
- Instead of an arrow, the symbol ::= is used.
- The vertical bar | is used in the same way - meaning choice.
- [ ] are used to represent optional constituents.
- BNF notation is equivalent to the plain grammar-rule notation used above.
Example: The rule
<assignment> ::= <variable> = <arithmetic_expression>
says that an assignment statement has a variable name on its left-hand side,
followed by the symbol "=", followed by an arithmetic expression.
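As an illustrative sketch (not from the textbook), the assignment rule can be checked by a small hand-written recognizer. The token set and the expression form (operands joined by + and *) are simplifying assumptions for the example:

```python
import re

# Sketch: recognize <assignment> ::= <variable> = <arithmetic_expression>
# under the simplifying assumption that an expression is operands
# (identifiers or numbers) joined by + and *.
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|[=+*])")

def tokenize(s):
    tokens, pos = [], 0
    while pos < len(s):
        m = TOKEN.match(s, pos)
        if not m:
            raise SyntaxError(f"bad character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def is_assignment(s):
    """True iff s matches: <variable> = operand (op operand)*"""
    toks = tokenize(s)
    if len(toks) < 3 or not toks[0][0].isalpha() or toks[1] != "=":
        return False
    expr = toks[2:]
    for i, t in enumerate(expr):
        if i % 2 == 0:              # operand position: no operators allowed
            if t in "+*=":
                return False
        elif t not in "+*":         # operator position: must be + or *
            return False
    return len(expr) % 2 == 1       # the expression must end on an operand

print(is_assignment("x = a + 2 * b"))  # True
print(is_assignment("= a + 2"))        # False
```

The check mirrors the BNF rule directly: left-hand side, the "=" symbol, then the expression.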
Example: Expression syntax
- The only non-terminal symbol that remains undefined is I, meaning 'integer'. It is defined below using BNF notation:
  <integer> ::= [-]<unsigned_integer>
  <unsigned_integer> ::= <digit>
  <unsigned_integer> ::= <digit><unsigned_integer>
  <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
- If we want to exclude 0 as a starting digit, we need to introduce two types of digits, non_zero_digit and any_digit:
  <unsigned_integer> ::= <non_zero_digit>
  <unsigned_integer> ::= <unsigned_integer><any_digit>
  <any_digit> ::= 0 | <non_zero_digit>
  <non_zero_digit> ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
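A minimal sketch of what this grammar accepts, assuming the no-leading-zero variant (so the string "0" itself is also rejected). The function names are my own, not from the text:

```python
# Sketch: check strings against the integer grammar with the
# leading-zero restriction:
#   <integer>          ::= [-]<unsigned_integer>
#   <unsigned_integer> ::= <non_zero_digit> | <unsigned_integer><any_digit>

def is_unsigned_integer(s):
    """True iff s is a non-empty digit string whose first digit is not 0."""
    return bool(s) and s.isdigit() and s[0] != "0"

def is_integer(s):
    """<integer> ::= [-]<unsigned_integer> - the minus sign is optional."""
    if s.startswith("-"):
        s = s[1:]
    return is_unsigned_integer(s)

print(is_integer("-120"))  # True
print(is_integer("0123"))  # False - leading zero excluded by the grammar
```

Note that under this variant a lone "0" is not derivable either; a real language definition would add an extra rule for it.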
Derivations:
- Using a grammar, we can generate sentences. The process is called derivation.
- Example: The simple grammar on p. 91, S → SS | (S) | ( ), generates all sequences of paired parentheses.
- The rules of the grammar can be written separately:
  Rule1: S → SS
  Rule2: S → (S)
  Rule3: S → ( )
- One possible derivation is:
  S => ( S )        by Rule2
    => ( S S )      by Rule1
    => ( ( ) S )    by Rule3
    => ( ( ) ( ) )  by Rule3
- Sentential forms - the strings obtained at each derivation step. They may contain both terminal and non-terminal symbols.
- Sentence - the last string obtained in the derivation. It contains only terminal symbols and is called a sentence in the language.
- This derivation is performed in a leftmost manner.
- That is, at each step the leftmost non-terminal in the sentential form is replaced.
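The leftmost derivation above can be performed mechanically. The sketch below (my own illustration, with the rule bodies written without spaces for compactness) rewrites the leftmost S at every step:

```python
# Sketch: leftmost derivation for the grammar S -> SS | (S) | ( ).
RULES = {1: "SS", 2: "(S)", 3: "()"}   # rule bodies, spaces dropped

def apply_leftmost(form, rule):
    """Rewrite the leftmost 'S' in the sentential form using the given rule."""
    i = form.index("S")                # position of the leftmost non-terminal
    return form[:i] + RULES[rule] + form[i + 1:]

form = "S"
for rule in (2, 1, 3, 3):              # Rule2, Rule1, Rule3, Rule3
    form = apply_leftmost(form, rule)
    print(form)
# prints the sentential forms: (S)  (SS)  (()S)  (()())
```

The last printed string contains only terminal symbols, so it is a sentence of the language; the earlier ones are sentential forms.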
Parse trees:
- Parsing is the process of syntactic analysis of a string to determine whether the string is a correct sentence or not. It can be displayed in the form of a parse tree:
- Each non-terminal symbol is expanded by applying a grammar rule that contains the symbol on its left-hand side. Its children are the symbols in the right-hand side of the rule.
- Note: The order of applying the rules depends on the symbols to be expanded. At each tree level we may apply several different rules, corresponding to the nodes to be expanded.
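To make the parse-tree idea concrete, here is a small sketch (an illustration of mine, not from the text) that parses a string of balanced parentheses and returns its tree as nested lists, where each list is a node and its elements are the children:

```python
# Sketch: build a parse tree (nested lists) for balanced parentheses.
def parse_parens(s):
    """Return a nested-list tree for a string of balanced parentheses."""
    def parse(i):
        children = []
        while i < len(s) and s[i] == "(":
            subtree, i = parse(i + 1)          # expand the inner group
            if i >= len(s) or s[i] != ")":
                raise SyntaxError("unbalanced parentheses")
            children.append(subtree)
            i += 1                             # consume the closing ')'
        return children, i
    tree, i = parse(0)
    if i != len(s):                            # trailing unmatched ')'
        raise SyntaxError("unbalanced parentheses")
    return tree

print(parse_parens("(()())"))  # [[[], []]]
```

A failed parse (a SyntaxError here) corresponds to the string not being a sentence of the grammar.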
Ambiguity: The case when the grammar rules can generate two possible parse trees for one and the same sequence of terminal symbols.
Example:
- Let S denote a statement and Exp an expression.
- Consider the following rules for if statements (the words if, then, else are terminal symbols):
  Rule1: If_statement → if Exp then S else S
  Rule2: If_statement → if Exp then S
- Consider now the statement: if a < b then if c < y then write(yes) else write(no);
- Following the grammar rules above, there are two possible interpretations:
  - if a < b then (if c < y then write(yes) else write(no));
    - the else part belongs to the inner if statement
  - if a < b then (if c < y then write(yes)) else write(no);
    - the else part belongs to the outer if statement
- (The parse trees for the two interpretations, shown in the accompanying figures, make the two groupings explicit.)
- A grammar that contains such rules is called ambiguous.
- It is very important that grammars for programming languages are not ambiguous.
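The usual way languages resolve this "dangling else" ambiguity is to attach each else to the nearest unmatched if. The sketch below (my own illustration; Exp and S are simplified to single tokens) shows a recursive-descent parser that implements exactly that convention:

```python
# Sketch: dangling-else resolution - each "else" binds to the nearest
# unmatched "if". Tokens are simplified: Exp and atomic statements are
# single symbols like "E1" or "S1".

def parse_stmt(toks, i=0):
    """Parse S ::= 'if' Exp 'then' S ['else' S] | atom; return (tree, next)."""
    if toks[i] == "if":
        cond = toks[i + 1]                   # Exp is one token here
        assert toks[i + 2] == "then"
        then_part, i = parse_stmt(toks, i + 3)
        if i < len(toks) and toks[i] == "else":   # greedy: nearest if wins
            else_part, i = parse_stmt(toks, i + 1)
            return ("if", cond, then_part, else_part), i
        return ("if", cond, then_part), i
    return toks[i], i + 1                    # an atomic statement

toks = ["if", "E1", "then", "if", "E2", "then", "S1", "else", "S2"]
tree, _ = parse_stmt(toks)
print(tree)  # ('if', 'E1', ('if', 'E2', 'S1', 'S2'))
```

The printed tree shows the else attached to the inner if, i.e. the first of the two interpretations above.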
Grammars for programming languages:
- Types of grammars: There are 4 types of grammars, depending on the rule format.
- Regular grammars (Type 3):
  A → aB or A → a, where A and B are non-terminals and a is a terminal symbol.
- Context-free grammars (Type 2):
  A → any string consisting of terminals and non-terminals.
- Context-sensitive grammars (Type 1):
  String1 → String2, where String1 and String2 are any strings consisting of terminals and non-terminals, provided that the length of String1 is not greater than the length of String2.
- General grammars (Type 0):
  String1 → String2, no restrictions.
- Regular grammars and regular expressions:
  Strings of symbols may be composed of other strings by means of
  - concatenation - appending two strings, and
  - the Kleene star operation - any number of repetitions of the string, including zero. E.g. a* can be the empty string, a, aa, aaaaaaa, etc.
- Given an alphabet ∑, regular expressions consist of string concatenations combined with the symbols U and *, possibly using '(' and ')'. There is one special symbol used to denote an empty expression: Ø.
  - Ø and each member of ∑ is a regular expression.
  - If α and β are regular expressions, then (αβ) is a regular expression.
  - If α and β are regular expressions, then α U β is a regular expression.
  - If α is a regular expression, then α* is a regular expression.
  - Nothing else is a regular expression.
- Let ∑ = {0,1}. Examples of regular expressions are 0, 1, 010101 - any combination of 0s and 1s. Some regular expressions and the strings they generate:

  regular expression | generated strings
  0 U 1              | 0, 1
  (0 U 1)1*          | 0, 01, 011, 0111, ..., 1, 11, 111, ...
  (0 U 1)*01         | 01, 001, 0001, ..., 1101, 1001, ... (any strings that end in 01)
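These expressions translate directly into the regex syntax used by practical tools, where U becomes | and concatenation is implicit. A quick sketch using Python's re module to test the last expression from the table:

```python
import re

# Sketch: (0 U 1)*01 in Python regex syntax - strings over {0,1}
# that end in 01.
ends_in_01 = re.compile(r"(0|1)*01")

for s in ["01", "1101", "10", "0110"]:
    print(s, bool(ends_in_01.fullmatch(s)))
# prints: 01 True / 1101 True / 10 False / 0110 False
```

fullmatch is used (rather than search) so that the whole string, not just a substring, must belong to the language of the expression.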
- Regular languages are languages that can be described by regular expressions.
- Regular grammars are used to describe the tokens of programming languages, such as identifiers and numeric constants. (Arithmetic expressions with nested parentheses, however, are beyond the power of regular grammars.)
- Context-free grammars generate context-free languages. They are used to describe the syntax of programming languages.