A very common pitfall is to assign to a dereferenced pointer variable without first allocating space for the pointer to point to.
According to the ISO C++ Standard, if there is insufficient memory to allocate the requested storage,6 the behavior of new is to throw an exception of type bad_alloc. The result is the program crashes if the exception is not handled. I mention this to my classes, but do nothing about handling exceptions until Chapter 18.
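#include <iostream>
#include <new>      // nothrow and bad_alloc
using namespace std;

int main()
{
    int* p1 = new int;          // throws bad_alloc if new fails
    int* p2 = new(nothrow) int; // returns 0 if new fails
    *p1 = 1;
    if (p2 != 0)                // the nothrow form must be checked before use
        *p2 = 2;
    cout << *p1 << endl;
    if (p2 != 0)
        cout << *p2 << endl;
    delete p1;
    delete p2;                  // harmless even if p2 is 0
    return 0;
}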
Pitfall: Applying delete to a pointer that has already been deleted
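An error you will make repeatedly in C++ is to apply the delete operator to a pointer that points to memory that has already had the delete operator applied to it. The behavior is undefined: the program may die with a segmentation violation, may silently corrupt the free store, or may appear to work for a while.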
In short: dynamic variables destroyed with delete must have been created with new, and (only) dynamic variables created with new should be destroyed (with delete) after you are through with them.
Pitfall: Deleting a pointer not obtained from the new operator
Deleting memory not allocated with new will corrupt the heap (free store) organization. At best, the program will give a runtime error message. Under Linux, using g++, I get a 'segmentation fault' -- sometimes. Corrupting the free store is usually a run time disaster.
One positive word: The ARM (Annotated C++ Reference Manual, by Ellis and Stroustrup) points out, "Deleting a pointer that has the null pointer value is guaranteed to be harmless."
An example:
int main()
{
    int i;
    int* p = &i;
    delete p;        // Error: i was not allocated with new. This is a runtime
                     // error which hangs the machine with some compilers.
                     // Linux/g++ gives a segmentation fault.
    p = new int[10];
    p++;             // p now points to the second int of the array
    delete p;        // error, but some compilers do not detect this
                     // Linux/g++ doesn't detect this either
    p = 0;
    delete p;        // OK: deleting a null pointer is harmless
    return 0;
}
To determine the effect of delete on pointers and on the contents of memory being deallocated, I added initialization to this code, saved the value of p in an int pointer variable, q, and added some output statements. The result is that both g++ and Borland C++ change the values originally pointed to by p. 7 Neither compiler detects the error, but the Borland C++ binary will give a "null pointer assignment" runtime error, and occasionally will hang the machine. (This machine runs Windows 98 2nd edition.)
Ellis and Stroustrup in the ARM point out that deleting a pointer variable that can be assigned may change the pointer's value and it is likely to change the memory at which the undeleted pointer pointed. Neither of these is guaranteed. Compilers are allowed to change these, not required. Using deleted or dangling pointers is a recipe for disaster. Go to extremes to avoid these errors.
The ARM continues: "In general, catching bad deletions at compile time is impossible; catching them at run time implies time and space overhead. Therefore, the results of such deletions are subtle and usually disastrous. Bad deletions are not detected immediately, and programs containing them are therefore among the nastiest to debug. Almost any effort to avoid such bad deletions is worthwhile." --Emphasis mine.
Use of a dangling pointer is unlikely to be detected by any compiler. The program may work fine until someone (a maintenance programmer) writes additional code and the program mysteriously breaks, because you used memory that belongs to the free store manager, but the added code allocated that memory.
This is the reason the text recommends in the Dangling Pointer Pitfall paragraph that you hunt down all pointers that point to the same dynamic variable as the pointer you want to delete. Once found, you should delete one of them, and assign 0 to the other pointers. (The text recommends that you use NULL.) Further deletion of any of these is rendered harmless, and attempting to use them is more likely to result in an error message such as "NULL Pointer Dereferenced."
We pointed out above that deleting a pointer that has the null pointer value is harmless, but deleting a non-null pointer the second time can produce a runtime disaster.
Aside: The NULL preprocessor symbol
The ISO C++ Standard requires compliant implementations to define NULL in standard headers such as <cstddef>.
My curiosity was piqued about how (and where) NULL is defined on my system, so I used the UNIX grep facility (the name comes from the ed editor command g/re/p, globally search for a regular expression and print) to look through some of the header files. Under Linux and GNU C++, NULL is defined in many header files. I found NULL defined with my compiler when I included the header files referred to in these header files: iostream, cstdio, cstdlib, and cstddef. In the following, I have quoted the definition of NULL from several header files. The lines, #ifndef NULL ... #endif, prevent multiple definitions of NULL, just as in complete header files.
#ifndef NULL
#ifdef __cplusplus
#define NULL 0
#else
#define NULL (void*)0
#endif
#endif
The specifics of the definitions depend on decisions made by the compiler and library writers, but the effect is exactly the same. The ANSI C++ Standard, Chapter 4 subsection 10 and Chapter 18 subsection 1, specifies that NULL is an implementation-defined null pointer constant, for which 0 and 0L are possible definitions. In C, the preprocessor symbol NULL may be defined to be (void*)0. C++ rules that definition out because of its stronger typing: C++ does not implicitly convert a void pointer to other pointer types, so (void*)0 could not be assigned to, say, an int pointer. (The book does not deal with void pointers, so I won't deal with them further in this document.)
Borland C++ has declarations that are essentially the same as this. There is added detail to account for the various 80x86 memory models that MS Windows uses.
The reader interested in more details on why things were done this way in C++ should refer to Stroustrup, The Design and Evolution of C++, for specific details. It would take us too far afield to go further into that corner of C++ design and evolution.
Static, Dynamic, and Automatic Variables
The words static and auto are keywords in C and C++ that refer to storage classes (not class in the data structuring sense.) The classes are auto, static, register and extern.
Variables declared auto (automatic) are, as the text indicates, 'created' as the declaration is encountered and 'destroyed' as the scope of the declaration is exited - automatically. 'Created' and 'destroyed' really mean allocation and initialization (if any), and deallocation of variables. Automatic variables are the ordinary variables we declare and use in main, in functions, in classes, and in function members of classes. You will only occasionally see the keyword auto in code. It isn't much used, since this is the default for variables.
Declaring a variable with the register keyword is a hint to the compiler that the programmer thinks the variable might profitably be stored in a CPU register. (A register is memory in the CPU that is much faster than system memory.) Unless the compiler accepts the suggestion, it treats a register variable as an ordinary auto variable. Compilers are much better than programmers at deciding which variables should go into registers, so the ANSI C++ Standard leaves compilers free to ignore the keyword register, and the keyword is little used. The register keyword is not treated in the text.
The keyword extern is used to declare an identifier in one file, and indicate to the compiler that the variable this identifier refers to will be defined in another file. These files are to be separately compiled. I cannot find extern or register in the text. I mention them because extern and register are reserved keywords. It is possible that a student may ask why it is an error if either of these is used as an identifier.
Local variables declared with the static keyword have their storage set aside before the main function of the program starts, and are 'destroyed' after the main function exits. A local static variable is initialized only once, no later than the first time control reaches its declaration. These variables exist for (or, have lifetime of) the entire run of the program. The visibility or scope of these variables is determined by the C++ scope rules. In spite of the fact that static variables exist for the entire time the program runs, the variables can be used only within their scope. See page 145 of the text, the Local Variables side bar, for a brief discussion of the scope rules of C++. Briefly, the scope of a variable starts at the declaration of the variable and runs to the end of the block in which the variable is declared.
A declaration introduces a name into a program and specifies how the name is to be interpreted. A definition is a declaration that causes allocation of an appropriate amount of storage, and any appropriate initialization to be done.
A function declaration (or prototype) provides the return type, the name of the function and the list of types of formal parameters that must be supplied in a call to the function. Stroustrup (The C++ Programming Language) says "A function definition is a function declaration in which a function body is presented."
Global variables that are declared with the static keyword are inaccessible from files outside the one that declared them. We have pointed out already that the ISO C++ Standard states that this use of static is deprecated. See the section on namespaces in the previous chapter of this IRM for a (brief) discussion of the use of anonymous namespaces to conceal names within a file.
A global variable is one declared outside any function, and it is accessible in any file of the program that declares it. Across the whole program, only one of those declarations may be a definition; all the rest must be declarations only.
The text doesn't use global variables. The reason is that use of global variables ties program components together in a way that makes analysis of an individual component independently of other components impossible. The result is rapid growth of complexity of the program with the growth of number of components that use global variables.
Operating systems are the only programs that I know about that use global variables, and operating systems use global variables sparingly.
A Cautionary Note on typedef:
In Pascal, the TYPE statement introduces a new type. In C++, a typedef statement only renames an existing type. I was bitten by this one in my early days in C, and typedef behaves in C++ exactly as it does in C.
An example of the use of the TYPE statement in Pascal is:
TYPE FEET = integer; (* declares a new, strongly enforced type *)
VAR X: FEET; (* X has type FEET, incompatible with integer *)
Y: integer;
BEGIN (* equivalent to { in C++ *)
Y := 17;
X := Y; (* illegal: incompatible types *)
Here, you cannot assign X to Y or conversely.
A corresponding typedef statement in C/C++ is
typedef int feet;
int X;
feet Y; // declare Y to be of type feet
X = Y; // OK, feet is only a renaming of int
Here the identifier 'feet' is only another name for int. It is not, as in Pascal, a new, enforced, separate type. You gain readability, but not type safety.
The gain in readability of a program is usually well worth the effort to use a typedef. However, you cannot expect the types to be distinct as they are in Pascal.
Basic Memory Management
The area of memory where dynamic variables are stored is called the free store. The free store is typically managed as a linked list of chunks of memory waiting for allocation, together with another collection of memory chunks that have already been allocated to the program. There are library and support functions to carry out the management.
The synonym heap seems to be a usage inherited from the C language. The terms free store and heap mean exactly the same thing. The student should be aware that both terms are used. We will follow the text’s use and mostly use free store.
There is one caution if you use the term 'heap'. I find that there is some confusion between the 'heap' data structure, meaning a tree-like data structure with certain order relationships imposed on the values in the nodes, and a memory management 'heap', used for allocation of dynamic variables. There is a programming language designer's maxim that the C++ developers have not violated. Perhaps they should have. The maxim reads, “Every designer of a (new) programming language must invent new terminology for every (old) language concept.”
If you allocate memory (with new), you should deallocate it (by applying the delete operator to a pointer pointing to the memory) when you are through with it. The delete operator takes a pointer argument that is required to point to memory that was allocated by the new operator. The action is to release the memory to which the pointer pointed to the free store manager for reallocation. Applying delete to a pointer that points to memory not allocated with the new operator corrupts the free store and often causes a segmentation violation. Applying delete to a pointer that points to already deleted memory also corrupts the free store. This error may show up only as a very hard-to-find run-time error, sometimes not until the program terminates.
Note that it is harmless to apply the delete operator to a pointer that has the null pointer value. Consequently, when you apply the delete operator to a pointer, it is wise to hunt down all the pointers that point to the memory about to be deleted. Once the delete operator has been applied, the remaining pointers are dangling. Prevent trouble by assigning the integer constant 0 to every pointer that pointed to that memory. You can use the preprocessor macro NULL, which evaluates to the integer constant 0. After all, there is no legal use for pointers pointing to deleted memory. 8
Pitfall: Assigning a pointer for which memory has not been allocated.
When you declare a pointer variable, memory is allocated for exactly what you have declared, a pointer variable. Space is not allocated for the pointer to point to.9 The programmer is responsible for allocating that space. The text points out that the operator new, with a type name as its argument, will allocate space appropriate for an object of that type and return a pointer that has type pointer-to-type-name.
int *p; //Only space for the pointer is allocated here
p = new int; //Space is allocated here for the pointer to point to,
//and an assignment to p of the pointer returned by the
//new operator.
Dynamic Variables and Automatic Variables
The programmer uses the new operator with the name of a type for its argument to direct the free store manager to allocate an amount of memory the size of the type. The new operator returns a pointer that points to the dynamically allocated memory. Dynamically allocated memory remains allocated until either it is released by the program or the program terminates. The programmer releases dynamically allocated memory by applying the delete operator to a pointer to that memory. Allocated memory is unavailable for any other use while the program is running, so it is important that allocated memory be released as soon as the program is through with it. We will see that a properly written class automatically allocates memory in its constructors and automatically releases the allocated memory in its destructor.
The pointers used in dynamic arrays and linked structures point to memory in the free store that has been dynamically allocated.
Uses for Pointers
The main use of pointers is to construct and use dynamically allocated arrays and linked structures. Linked structures are typically linked lists and trees. We will study linked structures in Chapter 17.
Memory is among the scarcest resources in a computer. There are tens of thousands of megabytes (MB) of disk space, but only 64 MB (minimal) to perhaps 128 MB (optimal?) of memory available. Some systems will have 256 megabytes of memory and more.
Most computer systems use virtual memory. Virtual memory enables the system to keep only part of the executing code body in memory and to keep the rest of the executing code (and data) on secondary storage. This has the effect of making the memory appear as large as the disk but as fast as memory.
At least that’s the theory. The down side of virtual memory comes when memory is “over committed”, i.e., when more memory is needed than the amount of RAM available. As the program continues to execute, the data and code must frequently be retrieved and stored again, requiring a significant number of disk accesses during code execution. As memory use increases, the number of disk accesses increases dramatically. 10 The system continues to run but runs more and more slowly. If your system has insufficient memory left to allow your program to allocate memory, the system will already be running so slowly as to be frozen.