Kenneth D. Shupe
An evaluation of COBOL, or COmmon Business Oriented Language, according to the criteria set forth in Concepts of Programming Languages, 10th ed., by Robert W. Sebesta (Pearson Education, Inc.). The purpose of this document is to provide general knowledge of COBOL’s capabilities as a programming language.
Table of Contents
History
Overview of Language Design, Syntax, and Semantics
Names, Bindings, and Scope
Data Types
Expression and Assignment Statements
Statement-Level Control Structures
Subprograms
Abstract Data Types
Support for Object-Oriented Programming
Concurrency
Exception Handling
Other Issues
Evaluation
Readability
Writability
Reliability
Cost
Conclusion
Bibliography
COBOL Design and Evaluation
History
During the first decade that computers came into use, they were mostly programmed in machine language or assembler language. These languages were specific to the particular machine for which the programs were written. This was troublesome since a programmer would have to relearn how to write programs every time they tried to program on a different machine.
The first implemented high-level language (HLL), FORTRAN, alleviated these problems but, due to its limited input/output capabilities, it was not suitable for large data processing problems. COBOL was developed from the need for applications to accomplish this type of processing. “In response to the shared need of the business community and government agencies for a high-level, machine-independent computer language that was tailored to solve data processing problems, the CODASYL (Conference on Data Systems Languages) committee was formed.” (Molluzzo)
At the time, FLOW-MATIC was a compiled language that belonged to UNIVAC and ran only on that company’s computers. AIMACO was also being developed by the U.S. Air Force, and other languages for business applications were in development as well. “One of the overriding concerns at the meeting was that steps to create this universal language be created quickly, as a lot of work was already being done to create other business languages…the longer it took to produce a universal language, the more difficult it would be for the language to become widely used.” (Sebesta)
COBOL was the result of their efforts. The design committee’s criteria were simple. The language should use English as much as possible. The language should be easy to use in order to facilitate ease of training programmers. It should be readable to the point that non-programmers could understand what the program was doing. Finally, the design should not be overly restricted by the implementation of the language. (Sebesta)
Overview of Language Design, Syntax, and Semantics
Names, Bindings, and Scope
The structure of a COBOL program has four divisions. The first of these is the identification division, which contains the PROGRAM-ID, or the name of the program. The programmer may also include their name as the author and comments about the purpose of the program itself.
The second division is the environment division, which contains the configuration section, identifying the computer on which the program is compiled and executed, and the input-output section, which specifies the external files with which the program’s files are identified.
The data division, depending on the size of the records and operations being performed, is usually the largest part of a COBOL program. Its purpose is to specify the data being used in the program. The file section defines the program’s files while the working-storage section defines all other data. If a program is a subprogram executed by a larger program, then a linkage section specifies the arguments passed to it by the calling program.
The procedure division contains the actual execution statements of the program. This division consists of paragraphs, which can be named by the programmer, and the paragraphs are made up of the statements to be executed.
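The four divisions described above can be sketched as a minimal fixed-format program. The program name FOURDIVS, the data name WS-GREETING, and the paragraph name MAIN-PARAGRAPH are hypothetical names chosen only for illustration, and the environment division is left empty because this sketch uses no files.

```cobol
       IDENTIFICATION DIVISION.
      * PROGRAM-ID below is a hypothetical name for this sketch.
       PROGRAM-ID. FOURDIVS.
       ENVIRONMENT DIVISION.
      * No files are used, so no sections are needed here.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-GREETING        PIC X(20) VALUE "HELLO FROM COBOL".
       PROCEDURE DIVISION.
       MAIN-PARAGRAPH.
           DISPLAY WS-GREETING.
           STOP RUN.
```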
Names in COBOL are traditionally made to sound as much like English as possible. This is possible since COBOL data names can be up to thirty characters long and include the letters A-Z, the digits 0-9, and the hyphen character. “The name given to data should be fully spelled out in the language of the programmer…For example, TOTAL-INPUT-TRANSACTION-COUNT is clearly more comprehensible than T1…If the data name clearly defines the usage, without reference to other sources or other documentation, then the person has eliminated a significant comprehension problem and can immediately analyze program logic.” (Rogers)
For scope terminators, COBOL traditionally relies on the period, which marks the end of a sentence within a paragraph. However, since all data that will be read, manipulated, or written must be specified in the data division of a program, data names rarely go out of scope. The exception to this is subprograms. Data names exist until the end of execution of that subprogram. In the main program, all data is global and does not go out of scope until the STOP RUN or CANCEL command is encountered.
Data Types
The data types supported by COBOL are simple. There are alphanumeric types, which can represent any character on the keyboard. For example, PIC X(7) or PIC XXXXXXX represents an alphanumeric field of 7 characters. Second are alphabetic types, declared with A in place of X, which hold the letters A-Z and the space. The third type, numeric, requires some discussion, as there are many ways to declare numeric types in COBOL. Some of these descriptions can also be applied to alphanumeric types, but alphanumeric types cannot be used for calculations.
Numeric types are described in the same way using a 9 instead of X. Numeric fields can be further described with an assumed decimal point using a V, as in PIC 99V99. Here, four digits are actually stored in memory, 9999, and the character V tells COBOL where the decimal point should be. A P marks a scaling position: each P stands for an assumed zero to the left or right of the stored digits, making it easier to represent very large or very small numbers without storing the zeros. An S makes the field signed. Consider the description PIC S9(2); this allows the integer range -99 to 99 to be assigned to a numeric field. COBOL compilers treat all numeric fields as unsigned unless an S is specified.
“A category of data closely related to numeric fields, called numeric edited, uses different PICTURE characters. Whereas a numeric field must not contain decimal point, comma, or any character other than 0 through 9 and a sign, a numeric edited may contain all those characters and more.” (Popkin)
Numeric edited fields provide ways for COBOL programmers to edit outputs with relative ease. Since there are only a handful of editing characters that can be included in an editing PICTURE clause, formatting output is relatively easy. “The act of moving the numeric item to an editing item edits the number.” (Molluzzo)
Consider PIC $ZZ,ZZZ.99: the Z’s represent zero suppression, while the comma, decimal point, and dollar sign are inserted directly into the field when it prints. If we use MOVE to place a PIC 9(5)V99 field holding 12345.67 (stored as the digits 1234567) in the edited field, the output will be $12,345.67, while 00123.45 (stored as 0012345) will print as $   123.45, where the suppressed zeros and comma are replaced by blanks. Alternatively, PIC $$$,$$$.99 floats the dollar sign up against the first digit and would print the same value as $123.45, preceded by blanks.
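The editing behavior can be tried with a short sketch like the one below. It assumes the source value comes from a PIC 9(5)V99 field, so that the assumed decimal point lines up with the edited picture. The names EDITDEMO, WS-AMOUNT, WS-FIXED-DOLLAR, and WS-FLOAT-DOLLAR are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. EDITDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Seven digits are stored; V marks the assumed decimal point.
       01  WS-AMOUNT          PIC 9(5)V99.
      * Z suppresses leading zeros; the $ stays in the first column.
       01  WS-FIXED-DOLLAR    PIC $ZZ,ZZZ.99.
      * A run of $ floats the sign up against the first digit.
       01  WS-FLOAT-DOLLAR    PIC $$$,$$$.99.
       PROCEDURE DIVISION.
       MAIN-PARAGRAPH.
           MOVE 12345.67 TO WS-AMOUNT
           MOVE WS-AMOUNT TO WS-FIXED-DOLLAR
           DISPLAY WS-FIXED-DOLLAR
           MOVE 123.45 TO WS-AMOUNT
           MOVE WS-AMOUNT TO WS-FIXED-DOLLAR
           MOVE WS-AMOUNT TO WS-FLOAT-DOLLAR
           DISPLAY WS-FIXED-DOLLAR
           DISPLAY WS-FLOAT-DOLLAR
           STOP RUN.
```

No arithmetic or formatting verb is involved: the MOVE into the edited field does all of the work, which is the point of the Molluzzo quotation.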
It is important to note that COBOL does not support logical/Boolean types. However, COBOL allows condition names to be declared at level 88 in any COBOL program to simulate Boolean types. For example, observe the following description of a file input:
       FD  FILENAME
           LABEL RECORDS ARE OMITTED.
      * The widths of NAME and MAJOR are illustrative.
       01  STUDENT.
           05  NAME            PIC X(20).
           05  YEAR            PIC 9.
               88  FRESHMAN    VALUE 1.
               88  SOPHOMORE   VALUE 2.
               88  JUNIOR      VALUE 3.
               88  SENIOR      VALUE 4.
           05  MAJOR           PIC X(15).
Now, a programmer can code the statement IF SENIOR…imperative statements, which is true exactly when YEAR has the value 4. While some may consider this restricting, it actually allows for faster coding of large programs, since the VALUE clause can also name a range of values.
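A self-contained sketch of condition names in use follows. It keeps the data in working storage rather than in a file so that it can run on its own. The names COND88, STUDENT-YEAR, and UPPERCLASSMAN are hypothetical, and END-IF and SET … TO TRUE are COBOL-85 features.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COND88.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  STUDENT-YEAR       PIC 9 VALUE 4.
           88  FRESHMAN       VALUE 1.
           88  SOPHOMORE      VALUE 2.
           88  JUNIOR         VALUE 3.
           88  SENIOR         VALUE 4.
      * One condition name may cover a range of values.
           88  UPPERCLASSMAN  VALUE 3 THRU 4.
       PROCEDURE DIVISION.
       MAIN-PARAGRAPH.
      * Same test as IF STUDENT-YEAR = 4.
           IF SENIOR
               DISPLAY "SENIOR"
           END-IF
           IF UPPERCLASSMAN
               DISPLAY "JUNIOR OR SENIOR"
           END-IF
      * SET stores the condition's value (1) in STUDENT-YEAR.
           SET FRESHMAN TO TRUE
           IF NOT SENIOR
               DISPLAY "NO LONGER A SENIOR"
           END-IF
           STOP RUN.
```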
Expression and Assignment Statements
In COBOL, performing arithmetic and assignment is restrictive. There are rules that govern the MULTIPLY, ADD, SUBTRACT, and DIVIDE verbs. A simple mathematical expression can be given by “ADD X TO Y” or “MULTIPLY X BY Y GIVING Z”, as long as the operands and result fields are of the correct data types under those rules. The COMPUTE statement allows an expression to be written in a more mathematical notation and with less code. Without it, if four multiplication operations were needed to compute, say, a monthly salary, four separate statements would have to be written, one per operation.
The MOVE statement is used to assign values to data names, as in MOVE X TO Y. Whenever moving data around in the program, you must be vigilant about the types and sizes of the data being moved. Most compilers will warn if you’re going to lose data when moving it, but you could still move alphanumeric data into a numeric field, which can result in a run-time error when you attempt to use that field in arithmetic.
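The difference between the arithmetic verbs and COMPUTE can be seen in a small sketch. The salary formula, the data names, and the program name ARITH are all assumptions made for illustration only.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ARITH.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  HOURLY-RATE        PIC 9(3)V99 VALUE 20.00.
       01  HOURS-PER-WEEK     PIC 9(3)    VALUE 40.
       01  WEEKS-PER-YEAR     PIC 9(3)    VALUE 52.
       01  WEEKLY-PAY         PIC 9(7)V99.
       01  YEARLY-PAY         PIC 9(7)V99.
       01  MONTHLY-PAY        PIC 9(7)V99.
       01  MONTHLY-PAY-2      PIC 9(7)V99.
       PROCEDURE DIVISION.
       MAIN-PARAGRAPH.
      * Verb style: one statement per operation.
           MULTIPLY HOURLY-RATE BY HOURS-PER-WEEK GIVING WEEKLY-PAY
           MULTIPLY WEEKLY-PAY BY WEEKS-PER-YEAR GIVING YEARLY-PAY
           DIVIDE YEARLY-PAY BY 12 GIVING MONTHLY-PAY ROUNDED
      * COMPUTE style: the same formula as a single expression.
           COMPUTE MONTHLY-PAY-2 ROUNDED =
               HOURLY-RATE * HOURS-PER-WEEK * WEEKS-PER-YEAR / 12
           DISPLAY MONTHLY-PAY
           DISPLAY MONTHLY-PAY-2
           STOP RUN.
```

The verb style also needs the intermediate fields WEEKLY-PAY and YEARLY-PAY, which COMPUTE makes unnecessary.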
Statement-Level Control Structures
COBOL’s control structures include the familiar but poorly implemented IF statement and nested IF statements, PERFORM statements, EVALUATE, and the SELECT…ASSIGN statements. The SELECT…ASSIGN statement is mostly used for buffering purposes when reading file inputs.
Since reading ahead in a file is good practice, the SELECT…ASSIGN statement allows for the programmer to supply the blocking factor for a file being written to or specify the size of the buffers when reading in files. Though not a control structure for the program itself, it can reduce runtime when reading from large files or writing large records. If creating a file that contains thousands of records, choosing the right size blocking factor will shorten the execution time of writing the file. (Pugh)
IF statements in COBOL function as any programmer would expect. One major difference, in versions before COBOL-85, is the lack of an END-IF or other delimiter that lets the reader know the IF statement has ended. Instead, the keyword ELSE ends the true branch and a period ends the whole statement, including every IF nested inside it. COBOL-85 added the explicit scope terminator END-IF, but period-terminated IFs remain common in older code. A programmer can easily introduce logic errors with a misplaced period if they’re not cautious, especially when using nested IFs.
Most programmers are familiar with the relation test, in which two fields are compared based on their values. COBOL also allows for other common tests, such as sign testing and class testing. A useful test in COBOL is the condition name test, which is a workaround for the language’s lack of support for Boolean values. However, making use of this test can make a program harder to read. “Many programmers use [condition names] with fields in the Working-Storage Section, but their use there must often be carefully examined to see whether they make the program easier or more difficult to read.” (Popkin)
In the previous level-88 example, assume that, instead of YEAR taking the values 1-4, it is widened to PIC 99 and each class covers a range of values, say 1-10, 11-20, 21-30, and 31-40. Instead of VALUE 4 in SENIOR, we could write VALUES ARE 31 THROUGH 40 and still perform the same condition test.
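A sketch of a nested IF written with END-IF, the explicit scope terminator available since COBOL-85, shows why the scoping matters. The program and data names are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. IFSCOPE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-BALANCE         PIC S9(5) VALUE -25.
       01  WS-OVERDRAFT-OK    PIC X     VALUE "N".
       PROCEDURE DIVISION.
       MAIN-PARAGRAPH.
           IF WS-BALANCE < 0
               IF WS-OVERDRAFT-OK = "Y"
                   DISPLAY "OVERDRAFT ALLOWED"
               ELSE
                   DISPLAY "OVERDRAFT REFUSED"
               END-IF
      * The inner END-IF lets this MOVE run for both inner branches.
               MOVE 0 TO WS-BALANCE
           ELSE
               DISPLAY "BALANCE OK"
           END-IF
           DISPLAY "DONE"
           STOP RUN.
```

In the period-only style there is no way to close the inner IF without also closing the outer one, so the MOVE would become part of the inner ELSE branch.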
PERFORM statements are the primary means of moving through a program. Usually, programs will contain many PERFORM statements, each of which names a paragraph to be executed. Some PERFORM statements execute repeatedly until a condition is met (PERFORM procedure UNTIL condition or PERFORM procedure VARYING counter UNTIL condition).
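The three forms of PERFORM mentioned above can be sketched as follows. The paragraph and data names are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LOOPS.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-COUNTER         PIC 9(2) VALUE 0.
       01  WS-TOTAL           PIC 9(4) VALUE 0.
       PROCEDURE DIVISION.
       MAIN-PARAGRAPH.
      * Plain PERFORM: run a paragraph once and come back.
           PERFORM SHOW-BANNER
      * PERFORM UNTIL: the condition is tested before each pass.
           PERFORM ADD-ONE UNTIL WS-COUNTER = 3
      * PERFORM VARYING: a counted loop that sums 1 through 5.
           PERFORM ADD-TO-TOTAL
               VARYING WS-COUNTER FROM 1 BY 1
               UNTIL WS-COUNTER > 5
           DISPLAY WS-TOTAL
           STOP RUN.
       SHOW-BANNER.
           DISPLAY "STARTING".
       ADD-ONE.
           ADD 1 TO WS-COUNTER.
       ADD-TO-TOTAL.
           ADD WS-COUNTER TO WS-TOTAL.
```

The VARYING loop leaves 15 in WS-TOTAL, because FROM 1 resets the counter that the earlier UNTIL loop had advanced to 3.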
The EVALUATE statement is similar to the CASE statement with which most programmers are familiar. Even the syntax is very similar:
           EVALUATE ROUTINE-CODE
               WHEN "A" PERFORM A-ROUTINE
               WHEN "B" PERFORM B-ROUTINE
               WHEN OTHER PERFORM BAD-DOG
           END-EVALUATE
The EVALUATE statement was added in the COBOL-85 standard. Prior to this, case statements were performed using nested IFs or condition name tests.
The CALL statement is used to begin execution of a subprogram. The calling program can pass arguments to the subprogram and have arguments returned to it from the subprogram. These subprograms are discussed next. It is important to note that passing parameters was not always allowed. “Perhaps the most important weakness of the original procedure division was in its lack of functions. Versions of COBOL prior to the 1974 standard also did not allow subprograms with parameters.” (Sebesta)
Subprograms
A subprogram must have some way to refer to the arguments it receives from the program which calls it. “The data-names used for arguments by the subprogram cannot be defined in the subprogram’s FILE or WORKING-STORAGE sections because these sections define main-memory locations within the subprogram.” (Molluzzo)
This is where the LINKAGE SECTION comes in. The LINKAGE SECTION is a special section of the subprogram’s DATA DIVISION. In this section, the programmer defines the data-names that the subprogram associates with the arguments passed by the calling program. In a subprogram, the header for the PROCEDURE DIVISION must include the USING clause naming the data defined in the LINKAGE SECTION (PROCEDURE DIVISION USING somename anothername thisname).
When the CALL is encountered in the main program, the main program stops executing and the subprogram runs. If the subprogram is dynamically linked, it is loaded and its memory allocated at the time it is called. The main drawback appears when a subprogram is called more than once in the main program. When the end of the subprogram is reached and control returns to the calling program, the subprogram stays in memory in its last-used state, so the next call finds its working-storage fields as the previous call left them rather than at their initial values. In order to “re-prime” the subprogram, the CANCEL statement must be used in the calling program. This releases the memory allocated to the subprogram, so that the next CALL starts from a fresh copy.
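The relationship between CALL … USING and the LINKAGE SECTION can be sketched with two program units in one source file. This assumes a compiler that accepts several programs in one file separated by END PROGRAM headers (a COBOL-85 feature). The names MAINPROG and TAXCALC, the data names, and the 8% rate are all hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MAINPROG.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-PRICE           PIC 9(5)V99 VALUE 100.00.
       01  WS-TAX             PIC 9(5)V99 VALUE 0.
       PROCEDURE DIVISION.
       MAIN-PARAGRAPH.
      * The arguments are matched to the USING list by position.
           CALL "TAXCALC" USING WS-PRICE WS-TAX
           DISPLAY WS-TAX
           STOP RUN.
       END PROGRAM MAINPROG.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. TAXCALC.
       DATA DIVISION.
      * These items describe the caller's storage; none is allocated.
       LINKAGE SECTION.
       01  LS-PRICE           PIC 9(5)V99.
       01  LS-TAX             PIC 9(5)V99.
       PROCEDURE DIVISION USING LS-PRICE LS-TAX.
       CALC-PARAGRAPH.
           COMPUTE LS-TAX ROUNDED = LS-PRICE * 0.08
           EXIT PROGRAM.
       END PROGRAM TAXCALC.
```

Because LS-TAX refers to the caller's WS-TAX, the result is visible in MAINPROG as soon as control returns; the pictures on both sides must agree for this to work.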
Encapsulation was not supported in early versions of COBOL, but the use of subprograms allows for encapsulation to be accomplished. If a program has to receive commands from a user and then access or modify data based on those commands, subprograms can hide the part that performs command analysis from the actual accessing or modifying of data. “All data in a COBOL subprogram is inaccessible to any other subprogram (unless explicitly communicated when passing arguments)…Use a parameter of the call to identify the particular operation required. Code a switch at the start of the subprogram to test the value of the parameter and accordingly invoke the appropriate piece of code…Every operation on these data items [will be] localized within the subprogram.” (Pugh)
Abstract Data Types
All types that are declared to be used in a program must have a specified size. In COBOL, there are no user-defined types. Programmers are restricted to the alphanumeric, numeric, and alphabetic types. So, there are no abstract data types in COBOL. However, this seems to be by design. As stated earlier, the primary concern was to create a language that was verbose, easy to read, and easy to learn. Allowing programmers to define their own types would have hurt the readability of COBOL and made it nearly impossible for inexperienced programmers to read other programmers’ code.
Support for Object-Oriented Programming
Today, there are object-oriented COBOL programming environments available. The 2002 standard included support for object-oriented COBOL. However, object-oriented COBOL could almost be considered a different language by anyone who learned COBOL before 2002. This revision included many other features like Boolean support and compile-time parameter checking. (Micro Focus)
Concurrency
Concurrency is not supported by the design of the language. The development of new run-time environments allows for some limited concurrency.
Exception Handling
Some statements in COBOL have built-in exception checking. READ has AT END and INVALID KEY; arithmetic statements have ON SIZE ERROR, which is raised when a result is too large to be stored in the receiving data name; and STRING and UNSTRING have ON OVERFLOW. These phrases denote code to be executed when the exception is detected. For other input/output errors, however, the programmer must declare a FILE STATUS field for the file, which holds a status code that can be checked after each input/output statement to determine what went wrong.
Other Issues
Another issue with COBOL code is that universities are not graduating COBOL programmers anymore. Although COBOL may be the equivalent of a cave-painting language compared to modern languages, that doesn’t change the fact that there is a great deal of COBOL code out there written by people who will soon be retiring. “Jim Gwinn, CIO for the USDA Farm Service Agency, faced that type of situation…[Systems] run COBOL programs that process $25 billion in farm loans and programs. ‘We have millions of lines of COBOL…It has become increasingly difficult to change the code because of the complexity and the attrition of the knowledge base that wrote it.’” (Mitchell)
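A sketch of these mechanisms together is given below, assuming ON SIZE ERROR for the arithmetic case and a FILE STATUS field for the file. The file name students.dat, the 40-character record, and all program and data names are hypothetical; if the file does not exist, the sketch simply reports the status code returned by OPEN and stops.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ERRDEMO.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT STUDENT-FILE ASSIGN TO "students.dat"
               FILE STATUS IS WS-FILE-STATUS.
       DATA DIVISION.
       FILE SECTION.
       FD  STUDENT-FILE.
       01  STUDENT-RECORD     PIC X(40).
       WORKING-STORAGE SECTION.
       01  WS-FILE-STATUS     PIC XX.
       01  WS-SMALL           PIC 99 VALUE 60.
       PROCEDURE DIVISION.
       MAIN-PARAGRAPH.
      * 60 + 50 does not fit in PIC 99, so the handler runs.
           ADD 50 TO WS-SMALL
               ON SIZE ERROR
                   DISPLAY "RESULT TOO LARGE FOR WS-SMALL"
           END-ADD
           OPEN INPUT STUDENT-FILE
           IF WS-FILE-STATUS NOT = "00"
               DISPLAY "OPEN FAILED, STATUS " WS-FILE-STATUS
               STOP RUN
           END-IF
      * Any status other than "00" (end of file or error) stops.
           PERFORM READ-ONE-RECORD UNTIL WS-FILE-STATUS NOT = "00"
           CLOSE STUDENT-FILE
           STOP RUN.
       READ-ONE-RECORD.
           READ STUDENT-FILE
               AT END
                   DISPLAY "END OF FILE"
               NOT AT END
                   DISPLAY STUDENT-RECORD
           END-READ.
```

Driving the loop from the status field rather than from AT END alone means an unexpected read error also ends the loop instead of repeating forever.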
“In a recent Computerworld survey of 357 IT professionals, 46% of the respondents said they are already noticing a COBOL programmer shortage, while 50% said the average age of the COBOL staff is 45 or older, and 22% said the age is 55 or older.” (Mitchell)
This issue will one day greatly affect the cost of maintaining COBOL code. As COBOL programmers retire and leave the work force, they leave several hundred million lines of code behind them, forcing companies either to pay the high cost of relying on outsourcers to maintain their servers and code or to pay the even higher cost of moving off the mainframes.
Evaluation
Readability
There are a limited number of primitive data types in COBOL, and the lack of user-defined types and functions makes the language very simple. However, because of the nature of the applications that COBOL is designed for, program code can become confusing as many programs grow to a few hundred thousand lines of code. Having to search through a million lines of code to find the part you want to fix is time consuming and frustrating. However, due to the semantics of COBOL, actually recognizing the code that needs to be fixed is much easier than it would be in an object-oriented language.
As a programming language, though, a simple COBOL program isn’t difficult to read at all. The lack of data types can be confusing to non-COBOL programmers, especially the use of condition names to substitute for Boolean types.
COBOL’s lack of orthogonality can be alleviated using modular or structured programming techniques. However, orthogonality was not a principal concern in the design of the language. This is easy to see when you think of the number of keywords in COBOL. There were 300+ keywords prior to the 2002 standard of COBOL. Now, COBOL with support for object-oriented programming contains over 500 keywords. Since COBOL reads like English, a programmer doesn’t have to rack his or her brain to determine what a keyword actually does. Since the appearance of the keywords directly represents their functionality, the readability of COBOL code is greatly enhanced by this feature.
Writability
Here, the small number of constructs and ways to use them actually serves to enhance COBOL’s writability. It was harder to write COBOL when it had to be entered on punch cards, but with modern IDEs and compilers, writing COBOL code is much easier. One point at which writing COBOL code can become difficult is control structures. Forgetting to place a period or an ELSE after an IF statement can have disastrous results. The ability to write IF A = C OR A = B as IF A = C OR = B helps writability, especially if multiple relations are being used in a control structure.
The use of condition names can also enhance the writability of code. Consider the example used earlier with YEAR: IF SENIOR OR JUNIOR OR SOPHOMORE PERFORM routine-A ELSE PERFORM routine-B.
However, many of the things that enhance writability are offset by COBOL’s lack of support for data abstraction. COBOL only supports process abstraction through the use of subprograms and parameter passing. Some might also argue that the naming conventions in COBOL hurt writability due to the use of long identifiers. However, with IntelliSense and modern IDEs, the readability gained from long names greatly offsets the cost in writability.
Reliability
COBOL compilers perform type checking and size checking at compile time. However, COBOL will not fail to compile if it has a type checking or size error. If a programmer wants to store the number 100,000 in a data name that can only hold two digits, then he or she can. Or if they want to store a numeric type into an alphanumeric field, then they can. However, COBOL will not compile if a type checking error occurs during a calculation, such as COMPUTE or MULTIPLY. There are restrictions as to which types can be on which side of the operators.
COBOL’s keywords have built-in exception handling which, for example, can keep a program from reading past the end of a file or performing computations that result in a piece of data too large to be stored in the specified data name. This type of built-in exception handling, along with the natural way that a COBOL program reads and executes, enhances COBOL’s reliability.
Cost
When it comes to cost, COBOL is hard to evaluate. For any programmer, to learn COBOL is not a difficult task. “The trick is to develop a curriculum that teaches not just COBOL,… but the business rules behind the code.” (Mitchell)
The cost of writing programs in COBOL is not currently high. But as COBOL programmers retire, that cost will skyrocket. Migrating entire programs off of mainframes is much more expensive than adding a few new lines of code to ensure continued functionality. However, as COBOL knowledge disappears, the reverse could become true. Mitigating this cost is the responsibility of the company implementing the language. “…still needs COBOL programmers to replace those expected to retire, and the learning curve can last for a year or more. That means adding staff and having a period of overlap as COBOL’s secrets get passed on to the next generation.” (Mitchell)
Since COBOL programs usually only need to compile once, and be executed a few hundred times a day for ten to twenty years, a programmer would be correct in assuming that companies who use COBOL spend most of their money optimizing the code before its implementation.
However, the factor that probably most influences the cost of COBOL is maintainability. “The importance of software maintainability cannot be overstated. It has been estimated that for large software systems with relatively long lifetimes, maintenance costs can be as high as two to four times as much as development costs.” (Sebesta)
Conclusion
COBOL’s high readability definitely makes it a preferable programming language for business area applications. The writability of COBOL code is influenced by the rules of the business implementing the code, which can hurt writability. If a programmer does not understand the business rules behind the logic, then writing COBOL code to perform to specifications can be difficult. Since COBOL’s statements generally reflect their purpose, a programmer could determine business rules simply by reading COBOL code. The lack of data abstraction in COBOL and the cumbersome computations also hurt writability, so COBOL would receive a score of moderate in the area of writability. As just discussed, the cost of COBOL programming is relatively cheap right now, but in the next few years, the cost of maintaining COBOL programs could take off, making COBOL less than preferable for companies. “…migrations by small and midsize mainframe shops that move off what they see as a legacy language when they retire the hardware, says analyst Dave Vecchio…Compounding the loss of skills and business knowledge is the fact that, for some organizations, decades of changes have created a convoluted mess of spaghetti code that even the most experienced programmers can’t figure out. ‘Some systems are snarled so badly that programmers aren’t allowed to change the code at all,’ [David Garza, president and CEO of Trinity Millennium Group] says. ‘It’s simply too risky to change it.’” (Mitchell)
With billions of lines of code out there, some COBOL will never go away, and neither will the need for programmers to update or modify that code. As COBOL programmers become scarce, this maintenance cost could quickly grow out of control.
COBOL may be considered a legacy language, but its effectiveness in business computing cannot be refuted. In terms of technology time, if modern high-level languages are considered today’s English, the language of COBOL may be equivalent to early cave paintings. Imagine if cave paintings were still considered the primary means of communication for certain interactions. This is the reality of COBOL. In this case, the cave paintings just happen to be a REALLY effective communication tool.
Bibliography
Coughan, Michael. www.csis.ul.ie/cobol/. May 2007. 18 September 2012.
Menendez, Raul & Lowe, Doug. Murach's Structured COBOL for the CICS Programmer. Fresno, CA: Mike Murach & Associates Inc, 2001.
Mitchell, Robert L. www.computerworld.com. 21 May 2012. 12 September 2012.
Molluzzo, John C. Complete Course in Structured COBOL Programming. Belmont, CA: Wadsworth Publishing Company, 1989.
Popkin, Gary S. Introductory Structured COBOL Programming, Second Edition. Boston, MA: Kent Publishing Company, 1985.
Pugh, John & Bell, Doug. Modern Methods for COBOL Programmers. Englewood Cliffs, NJ: Prentice Hall International, Inc., 1983.
Rogers, Gary Robert. The COBOL Programmer's Design Book. Toronto, ON: John Wiley & Sons, Inc, 1986.
Sebesta, Robert W. Concepts of Programming Languages, Tenth Edition. Upper Saddle River, NJ: Pearson Education, 2012.