Programming, Problem Solving

Download 0.51 Mb.

Page	4/6
Date	05.08.2017
Size	0.51 Mb.
	#26659

1 2 3 4 5 6

Data Abstraction
In C, Arrays provide examples of structured data types. (As you'll see later, records, files, and sets are also structured data types.) In a structured data type, the relationship between the elements of the data type is as important as the individual components themselves. For instance, in a one–dimensional Array of integers, we completely define the structure by giving its values and by specifying the positions of these values relative to other elements in the list of integers. For example, we say that 19 is the first value in the list, 89 is the second, 12 is the third, and so on.

There are two implicit conclusions to be drawn from what we have just said. First, in a data structure such as a list, we specify the structure by defining the relationships between items as well as the values of individual items. Second, we can then implement that conceptually defined structure in our programs using a tool from C; for example, a C Array is used to implement the concept of a list. It is important to note the distinction between the conceptual data structure that we define and its implementation in C. In the case of a list, this distinction may seem relatively minor because C provides a one–dimensional Array as an obvious means of implementing a conceptual list. However, there is no guarantee that there will always be such a straightforward link between a conceptual data structure and its implementation in C.

Consider, for instance, the structure charts that we have used to specify the relationships between modules in a C program. Such a structure chart could itself be considered a data structure. As such, it is clearly a more complex structure than a list. In this data structure, the data items are the module names; their relationship is that Module 2 is directly subordinate to Module 1 if Module 1 calls Module 2 in performing its logic. The relationship between data items in this structure is hierarchical. This hierarchical relationship constitutes the basis for a conceptual data structure known as a tree. We will not study trees in detail here. For one reason, C provides no direct implementation for trees. The syntax of C does not include a structured type that completely reflects the hierarchical relationship between items in a tree.

Does this mean that one cannot write a C program that manipulates a tree data structure? No, the representation of trees is typically covered in a second course in computer science. However, the separation between the conceptual definition of a data structure and its eventual implementation in a particular computer language is an important one to keep in mind. This separation, often called data abstraction, is a topic we shall study more rigorously in chapter 14. The value of data abstraction is that it allows us to think in terms of data structures without being shackled by the syntactical constraints of a programming language--similar to what pseudocode allows us to do in describing an algorithm's logic. Once the data structure and the algorithms that manipulate it are understood at this conceptual level, we can then turn our attention to ways of implementing the structure in C or other languages. Some newer languages, such as Ada or C++ or Modula–2, emphasize this data abstraction theme to a greater degree than C. They allow the programmer to write packages of routines with implementations for such abstract data types; these may then be called from any program that works with data of that abstract type. In effect, these languages allow the programmer to extend the data typing capacity of the language to emphasize the separation of the data structure's definition from its implementation. This approach to using and studying data structures seems certain to have a profound effect on computer science in the coming years.

7.5

Searching

Algorithms

The need to search an Array for a value is a common problem. For example, you might wish to replace a test score for a student, delete a name from a directory or mailing list, or upgrade the pay scale for certain employees. These and other problems require you to be able to examine elements in some list until the desired value is located. When it is found, some action is taken.

Sequential Search
The first searching algorithm we will examine is the most common method, a sequential search. This process is accomplished by examining the first element in some list and then proceeding to examine the elements in the order they appear until a match is found. Variations of this basic process include searching a sorted list for the first occurrence of a value, searching a sorted list for all occurrences of a value, and searching an unsorted list for the first occurrence of a value.

Example 7.15

Unordered List ---> 20 15 12 8 5 9 3 2 5 18

SoughtNum --------> 2
To illustrate a sequential search, suppose you have an array of integers, called List, and you want to find the first occurrence of some particular value (2), called SoughtNum. If the desired value is located, you want to print its position. If the value is not in the Array, let us print an appropriate message. The code for such a search is
Index = 1;

while ( (SoughtNum != List[Index]) && (Index < ActNo) )

Index++;

if (SoughtNum == List[Index])

printf ("%d is in position %d of List\n",SoughtNum,Index);

else

printf ("%d is not in List\n",SoughtNum);

Example 7.16

Sorted List -------> 2 3 5 5 8 9 12 15 18 20

SoughtNum ------> 4
Let's now consider some variations of this problem. Although the code in Example 7.15 works for both a sorted and an unsorted list, it is certainly not efficient for searching a sorted (ordered) list. Why search the sorted list, above, beyond the third element (5) of the array? If it is sorted, all of the remaining values will also exceed the SoughtNo.

The code fragment below offers a much better solution for printing the location of the first occurrence of a SoughtNo within a sorted List.

Index = 1;

while ( (SoughtNum < List[Index]) && (Index < ActNo) )

Index++;
if (SoughtNum == List[Index])

printf ("%d is in position %d of List\n",SoughtNum,Index);

else

printf ("%d is not in List\n",SoughtNum);

Example 7.17

Unordered List ---> 20 15 12 8 5 9 3 2 5 18

SoughtNum --------> 2
A relatively easy modification of the sequential search is to examine an unsorted (unordered) List for all occurrences of SoughtNo.

Count = 0;

for (Index = 1; Index <= ActNo; Index ++ )

{

if (SoughtNum == List[Index])

{

Count ++;

printf ("%d => is in position %d\n", SoughtNum, Index);

}

This code works for an unsorted list. A modification of the code for working with a sorted list is included as an exercise.

Binary Search
Searching relatively small lists sequentially does not require much computer time. However, sequentially searching longer sorted lists ( telephone directories and lists of credit card customers) is inefficient. In a sense, they correspond to looking up a word in the dictionary by starting at the first word and proceeding word–by–word until the desired word is found. Since extra computer time means extra expense for most companies where large amounts of data must be frequently searched, a more efficient way of searching is needed.

If the list to be searched has been sorted, it can be searched for a particular value by a method referred to as a binary search. Essentially, a binary search consists of examining a middle value of an Array to see which half contains the desired value. The middle value of this half is then examined to see which half of the half contains the value in question. This halving process is continued until the value is located or it is determined that the value is not in the list. (Remember, however, in order to use a binary search, the list must be sorted and the sorting process has its own costs which should be evaluated.)

The code for this process is relatively short. We shall search the sorted List for some SoughtNum. First and Last are integer variables.

First = 1;

Last = ActNo;

Found = False;

while ((!Found) && (First <= Last))

{

Mid = (First + Last) / 2;

if (SoughtNum == List[Mid])

Found = True;

else if (SoughtNum > List[Mid])

First = Mid + 1;

else

Last = Mid – 1;

}

This loop checks to see if List[Mid] is the SoughtNum; if so set Found to True to exit loop. Since 10 is not the SoughtNum(13), the code determines if the SoughtNum is in the top half of the ordered list (subscripts 5-7) or the bottom half of the ordered list (subscripts 1-3) Since 13 is larger than 10, the new First is set to Mid + 1 (4) and the loop continues. Notice that half of the array will be eliminated in one look; elements 1-3 of the sorted array need not be examined any further.

This loop checks to see if List[Mid] is the SoughtNum; if so set Found to True to exit loop. Since 19 is not the SoughtNum(13), the code determines if the SoughtNum is in the top half of the ordered list (subscripts 7-7) or the bottom half of the ordered list (subscripts 5-5) Since 13 is smaller than 19, the new Last is set to Mid - 1 (5) and the loop continues. Notice that almost half of the this sub-array will be eliminated on the second look; elements 7-7 of the sorted array need not be examined any further.

This loop checks to see if List[Mid] is the SoughtNum; if so set Found to True to exit loop. Since 13 is the SoughtNum(13), Found is set to True and the loop terminates. The loop terminates either because the SoughtNum has been found in the sorted List or because it is not in the array. If the First is greater than the Last, the SoughtNum is not in the sorted List; otherwise Found has been set to True and Mid contains the subscript telling where the SoughtNum may be found.

Before continuing, let's walk through this search a second time to get a better understand how it works. Assume A is the array
4 7 19 25 36 37 50 100 101 205 220 271 306 321

A[1] A[14]

with values as indicated. Furthermore, assume SoughtNum contains the value 25. Then initially, First, Last, and Num have the values
1 14 25

First Last SoughtNum

A listing of values by each pass through the loop produces
First Last Mid A[Mid] Found

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Before loop 1 14 - - False

After first pass 1 6 7 50 False

After second pass 4 6 3 19 False

After third pass 4 4 5 36 False

After fourth pass 4 4 4 25 True

To illustrate what happens when the value being looked for is not in the Array, suppose SoughtNum contains 210. The listing of values then produces

First Last Mid A[Mid] Found

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Before loop 1 14 - - False

After first pass 8 14 7 50 False

After second pass 8 10 11 220 False

After third pass 10 10 9 101 False

After fourth pass 11 10 10 205 False

At this stage, First > Last and the loop is exited.

Relative Efficiency of Searches
Let's now examine briefly the efficiency of a binary search compared to a sequential search. For purposes of this discussion, assume a sequential search on a list of 15 items requires at most 15 microseconds. The nature of a sequential search is such that every time you double the list length, the maximum searching time is also doubled; Figure 7.1 illustrates this increase.

Figure 7.1

Sequential Search

The logic of the binary search is more complex than that of the sequential search. Let us assume a list of 15 items requires a maximum of 60 microseconds when searched by a binary search. Since this process consists of successively halving the list, at most four passes will be required to locate the value. This means each pass uses 15 microseconds. When the list length is doubled, it requires only one more pass. Thus, a list of 30 items requires 75 microseconds and a list of 60 items requires 90 microseconds. This is shown graphically in Figure 7.2.

Figure 7.2

Binary Search

Differences in computer hardware and compilers will cause the actual times for each of these will vary. For very small sorted lists, the sequential search is more efficient than the binary search. At some point (to be determined by hardware, compiler, and code) the binary search is more efficient than the sequential search. This is best illustrated by showing the comparison of these two searches on the same graph; see Figure 7.3.

Figure 7.3

Sequential Search

vs Binary Search

Subscript Errors
If Array has three elements, a reference to Score[3] or Score[-1] is generally considered to be a subscript error. Although subscript errors can cause an execution error, they will normally destroy other adjacent variables in your program or crash the system. We have stressed the importance of valid subscripts for four sections; it is time to illustrate the potential disastors awaiting subscript errors. Let us assume that the memory manager assigns the declarations

int

A = 6,

B = 5;

int

Array [MAX] = {1,2,3};

char

C = 'A',

D = 'B';

the following contiguous memory:

|-------|

A => &3200706 | 6 |

|-------|

B => &3200704 | 5 |

|-------|

Array [ 2] => &3200702 | 3 |

|-------|

Array [ 1] => &3200700 | 2 |

|-------|

Array [ 0] => &3200698 | 1 |

|-------|

C => &3200697 | A |

|-------|

D => &3200696 | B |

|-------|

Let us assume that we wish to execute the following line of code:

Array[3] = 44;

There will be times when the advanced C programmer knows enough about a specific chunk of memory to effectively exceed the boundaries of an array, but we should make every effort to stay within acceptable bounds.

The base address of Array is &3200698. A reference to Array[3] is a reference to &3200698 + 3 * sizeof(int) = &3200704; when two bytes are written to this address, the assignment destroys the contents of integer variable B.
Our contiguous memory now looks like

|-------|

A => &3200706 | 6 |

|-------|

B => &3200704 | 44 |

|-------|

Array [ 2] => &3200702 | 3 |

|-------|

Array [ 1] => &3200700 | 2 |

|-------|

Array [ 0] => &3200698 | 1 |

|-------|

C => &3200697 | A |

|-------|

D => &3200696 | B |

|-------|

Let us assume that we wish to execute the following line of code:

Array[-1] = 97;

The base address of Array is &3200698. A reference to Array [–1] is a reference to &3200698 + –1 * sizeof(int) = &3200696; when two bytes are written to this address, the assignment destroys both the contents of variable C and the contents of variable D.

Our contiguous memory now looks like

|-------|

A => &3200706 | 6 |

|-------|

B => &3200704 | 44 |

|-------|

Array [ 2] => &3200702 | 3 |

|-------|

Array [ 1] => &3200700 | 2 |

|-------|

Array [ 0] => &3200698 | 1 |

|-------|

C => &3200697 | a |

|-------|

D => &3200696 | |

|-------|

Part of the power of C is to quickly and effectively control all of the memory available to the user, but with the increased power is potential danger. Subscript errors on single-user microcomputer C environments can be very frustrating. Any of the following can happen when there is an attempt to change the wrong memory:

1. If the address calculated by the subscript error is part of the memory assigned to other local variables, the validity of these variables may destroy other logic, computations, or variables; this error is often hard to debug.

2. If the address calculated by the subscript error is part of the memory assigned to the stack, the returns to functions could be destroyed and cause very unpredictable errors.

3. If the address calculated by the subscript error is part of the memory assigned to the compiler, certain portions of the compiler and/or editor may be inoperative. The compiler application will often need to be terminated and restarted. This error will lock-up the entire system if the compiler-exit has been destroyed; the computer would have to be rebooted.

4. If the address calculated by the subscript error is part of the memory assigned to another application, portions or all of that application may be inoperative.

5. If the address calculated by the subscript error is part of the memory assigned to the operating system, certain portions of the operating system may be inoperative. You might not be able to copy files, format disks, etc. The computer would have to be rebooted. This error will sometimes lock-up the entire system.

6. If the address calculated by the subscript error is not within the limits of memory available to the system, the computer will sometimes lock-up and have to be rebooted.

7. If the address calculated by the subscript error is in a portion of memory that is not being used, there may be no disastrous results.
For an array declared

int

Array [MAX];

make every effort to make sure that your subscripts are always within limits (between 0 and MAX-1).

Exercises 7.5
1. Modify the sequential search in the beginning of this section to locate and print all occurrences of the same value.
2. Complete the C program:

FillListWithRandomIntegers (List1,&ActNo1,MAX1,20,1,100);

Function FillListWithRandomScores shall place 20 random integers [range 1- 100] into array List1. Do not place more than MAX1 integers into array. Do not use element 0.

DisplayList ( List1, ActNo1);

Function DisplayList shall display the memory associated with the

appropriate array in the same form as that illustrated in the DiceDistribution of Example 7.3. Do not display element 0. Display and label ActNo.

SequentialSearchList1 ( List1, ActNo1, SoughtNum);

Function SequentialSearchList1 shall search List1 for the first occurrence of the SoughtNum. If found, the function will explicitly return the array subscript reflecting the location of SoughtNum within the array. The function will explicitly return 0 if not found.

# define MAX1 100

# define MAX2 8

main(int argc, char *argv[])

{

int

List1[MAX1+1], List2[MAX2+1],

ActNo1=0, ActNo2=0, SoughtNum, Location;

SoughtNum = List1[29];

FillListWithRandomIntegers (List1,&ActNo1,MAX1,50,1000,30000);

SoughtNum = List1[29];

DisplayList ( List1, ActNo1);

Location = SequentialSearchList1 ( List1, ActNo1, SoughtNum);

printf ("SoughtNum = %d Location = %d\n\n",SoughtNum,

Location);

SoughtNum = 100;

if (SequentialSearchList1 ( List1, ActNo1, SoughtNum))

printf ("SoughtNum = %d Location = %d\n\n",

SoughtNum,Location);

else

printf ("SoughtNum = %d Not Found! Location = %d\n\n",

SoughtNum,Location);
SoughtNum = 31000;

if (SequentialSearchList1 ( List1, ActNo1, SoughtNum))

printf ("SoughtNum = %d Location = %d\n\n",

SoughtNum,Location);

else

printf ("SoughtNum = %d Not Found! Location = %d\n\n",

SoughtNum,Location);
3. Complete the C program:

FillListWithRandomIntegers (List1,&ActNo1,MAX1,20,1,100);

Function FillListWithRandomScores shall place 20 random integers [range 1- 100] into array List1. Do not place more than MAX1 integers into array. Do not use element 0.

DisplayList ( List1, ActNo1);

Function DisplayList shall display the memory associated with the

appropriate array in the same form as that illustrated in the DiceDistribution of Example 7.3. Do not display element 0. Display and label ActNo.

SequentialSearchList2 ( List1, ActNo1, SoughtNum, &NoSearches);

# define MAX1 100

# define MAX2 8

main(int argc, char *argv[])

{

int

List1[MAX1+1], List2[MAX2+1],

ActNo1=0, ActNo2=0, SoughtNum, Location, NoSearches;

SoughtNum = List1[29];

FillListWithRandomIntegers (List1,&ActNo1,MAX1,50,1000,30000);

SoughtNum = List1[29];

DisplayList ( List1, ActNo1);

Location = SequentialSearchList1 (List1, ctNo1, SoughtNum, &NoSearches);