
Date 18.10.2016

AP Statistics Unit 3: Examining Relationships 3.2B Residual Plots

Name_________________________ Date____________ Hour____________

1. In what way is a regression line a mathematical model?

We are using an equation to model a real-world situation. The equation of the regression line describes the overall linear pattern in the data and lets us predict the response variable from the explanatory variable.

2. What is extrapolation and why is it dangerous?

Extrapolation is using the regression line to make predictions outside the range of explanatory-variable values we actually observed. It is dangerous because we only know how the data behave within the range we gathered them for; the trend we see may not continue below or above that range.

3. What is a least-squares regression line?

A least-squares regression line is the line that makes the sum of the squared vertical distances from the data points to the line (the squared residuals) as small as possible.

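4. What is the formula for the equation of the least-squares regression line?

$\hat{y} = a + bx$, where the slope is $b = r\dfrac{s_y}{s_x}$ and the intercept is $a = \bar{y} - b\bar{x}$. Because of this choice of intercept, the line goes through the point $(\bar{x}, \bar{y})$.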

5. The least-squares regression line always passes through the point …?

The point $(\bar{x}, \bar{y})$: the mean of x paired with the mean of y.
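The slope and intercept formulas can be checked numerically. This Python sketch uses the same kind of hypothetical data as before; `pearson_r` and `line_from_summary` are illustrative helper names, not calculator functions. It computes r, the slope $b = r s_y/s_x$, and the intercept $a = \bar{y} - b\bar{x}$, then confirms that the line passes through $(\bar{x}, \bar{y})$.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Correlation r: sum of products of standardized values, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / sx) * ((y - y_bar) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


def line_from_summary(x_bar, y_bar, sx, sy, r):
    """Slope b = r * sy / sx and intercept a = y_bar - b * x_bar."""
    b = r * sy / sx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

x_bar, y_bar = mean(xs), mean(ys)
r = pearson_r(xs, ys)
a, b = line_from_summary(x_bar, y_bar, stdev(xs), stdev(ys), r)

# The least-squares line always passes through (x_bar, y_bar)
print(f"r = {r:.4f}, y-hat at x_bar = {a + b * x_bar:.4f}, y_bar = {y_bar:.4f}")
```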

6. What is a residual?

A residual is the difference between an observed value of the response variable and the value predicted by the regression line: residual = observed y − predicted y = $y - \hat{y}$.

7. How can you calculate residuals on your calculator and use this to produce a residual plot?

Do a linear regression. The calculator automatically stores the residuals in a list named RESID in the LIST menu. To copy them, highlight the name of one of your lists in the list editor, go to the LIST menu, select RESID, and press ENTER; the residuals are placed into that list. Then go to STAT PLOT and create a scatterplot using your explanatory-variable list for x and your residuals list for y.

8. If a least-squares regression line fits the data well, what characteristics should the residual plot exhibit?

The residuals should be scattered randomly above and below zero with no apparent pattern: no curve, and no spread that grows or shrinks as x changes.

9. The table below gives data on the number of powerboats registered in Florida and the number of manatees killed by boats in the years from 1977 to 1990.

Year   Powerboat registrations (thousands)   Manatees killed
1977   447                                   13
1978   460                                   21
1979   481                                   24
1980   498                                   16
1981   513                                   24
1982   512                                   20
1983   526                                   15
1984   559                                   34
1985   585                                   33
1986   614                                   33
1987   645                                   39
1988   675                                   43
1989   711                                   50
1990   719                                   47

a) Use your calculator to make a scatterplot of these data. Sketch the graph below:

b) Find the equation of the least-squares regression line and overlay that line on your scatterplot.

Let y = number of manatees killed and x = number of powerboat registrations (in thousands). The least-squares regression equation is $\hat{y} = -41.43 + 0.1249x$.

c) Predict the number of manatees that will be killed in a year when 716,000 powerboats are registered.

Because x is measured in thousands, use x = 716. The predicted number of manatees killed is −41.43 + 0.1249 × 716 = 47.99, or about 48 manatees.

d) Here are four more years of manatee data.

Year   Powerboat registrations (thousands)   Manatees killed
1991   716                                   53
1992   716                                   38
1993   716                                   35
1994   735                                   49

Add these points to your scatterplot. Florida took stronger measures to protect manatees during these years. Do you see evidence that these measures succeeded?

Yes, the measures seem to be succeeding: three of the four new points fall below the regression line, indicating that fewer manatees were killed than predicted. Further evidence comes from the points for 1992 and 1993, which fall well below the overall pattern.

e) In part (c) you predicted the number of manatee deaths in a year with 716,000 powerboat registrations. In fact, powerboat registrations were 716,000 for three years. Compare the mean manatee deaths in these three years with your prediction from part (c). How accurate was your prediction?

The mean number of manatee deaths for the three years with 716,000 powerboat registrations is (53 + 38 + 35)/3 = 42. The prediction of 48 was too high by about 6 deaths.
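The comparison can be written out in Python as a quick check. This sketch uses the rounded regression equation and the deaths for 1991, 1992, and 1993, the three years with 716 thousand registrations; `predict` is an illustrative name.

```python
# Rounded least-squares line for 1977 to 1990: y-hat = -41.43 + 0.1249x
def predict(x_thousands):
    return -41.43 + 0.1249 * x_thousands


# Manatee deaths in 1991, 1992, 1993, when registrations were 716 thousand
deaths_at_716 = [53, 38, 35]

observed_mean = sum(deaths_at_716) / len(deaths_at_716)
predicted = predict(716)
print(f"mean observed = {observed_mean:.1f}, predicted = {predicted:.2f}")
print(f"observed minus predicted = {observed_mean - predicted:.2f}")
```

A negative difference means the line over-predicted deaths in those years.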

10. The table below gives data on the number of new birds y and percent of returning birds x for 13 sparrowhawk colonies. Enter the data into your calculator.

Percent returning (x)   New adults (y)
74                      5
66                      6
81                      8
52                      11
73                      12
62                      15
52                      16
45                      17
62                      18
46                      18
60                      19
46                      20
38                      20

a) Use your calculator’s regression function to find the equation of the least-squares regression line.

The least-squares regression line is $\hat{y} = 31.9 - 0.304x$. The calculator output (and Minitab output) is shown below:

b) Use your calculator to find the mean and standard deviation of both x and y and their correlation r.

The means, standard deviations, and correlation are: $\bar{x} = 58.23\%$, $s_x = 13.03\%$, $\bar{y} = 14.23$ new birds, $s_y = 5.29$ new birds, and $r = -0.748$.

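c) Calculate the slope b and y intercept a of the regression line following the method of Example 3.13 on page 208 of your book. Verify that your equation is the same as the one you obtained in part (a) except for slight rounding errors.

The slope is $b = r\dfrac{s_y}{s_x} = -0.748 \times \dfrac{5.29}{13.03} \approx -0.304$ and the intercept is $a = \bar{y} - b\bar{x} = 14.23 - (-0.304)(58.23) \approx 31.9$. These agree with the calculator's equation $\hat{y} = 31.9 - 0.304x$ up to rounding.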

d) Explain in words what the slope and y-intercept of the regression line tell us.

The slope tells us that for each one-percentage-point increase in returning birds, the number of new birds decreases by 0.304 on average. The y intercept predicts 31.9 new adults in a colony where the percent of returning birds is zero. This value is far outside the range of x-values studied for the 13 colonies of sparrowhawks (38% to 81%) and has no practical meaning in this situation.

e) An ecologist uses the line to predict how many birds will join another colony of sparrowhawks, to which 60% of the adults return from the previous year. What is the prediction?

The predicted number of new adults is 31.9 − 0.304 × 60 = 13.66, or about 14. (Using the unrounded coefficients gives about 13.69.)

11. A study recorded data on the number of beers consumed and blood alcohol content (BAC) for 16 students. Problem 3.35 on page 212 of your book gives partial computer output from Minitab relating to these data:

a) Use the computer output to write the equation of the least-squares line.

Let y = blood alcohol content (BAC) and x = number of beers. The least-squares regression line is $\hat{y} = -0.0127 + 0.017964x$.

b) Interpret the slope and y-intercept of the equation in this setting.

The slope indicates that, on average, predicted BAC increases by 0.017964 for each additional beer consumed. The intercept says the predicted BAC is −0.0127 for a student who drinks no beer; a negative BAC is impossible, so the intercept has no practical meaning here.

c) What blood alcohol level would your equation predict for a student who consumed 6 beers?

The predicted BAC for a student who consumed 6 beers is −0.0127 + 0.017964×6 = 0.0951.

d) The one student in the study who consumed 6 beers had a BAC of 0.10. What is your prediction error in part (c)?

The prediction error is 0.10 − 0.0951 = 0.0049.
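The prediction and the prediction error (the residual, observed minus predicted) can be checked in a few lines of Python; `predicted_bac` is an illustrative name wrapping the Minitab equation.

```python
# Least-squares line from the Minitab output: BAC-hat = -0.0127 + 0.017964 * beers
def predicted_bac(beers):
    return -0.0127 + 0.017964 * beers


observed = 0.10                  # the one student who drank 6 beers
predicted = predicted_bac(6)
residual = observed - predicted  # observed minus predicted
print(f"predicted BAC = {predicted:.4f}, residual = {residual:.4f}")
```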

e) Problem 3.37 on page 213 of your book gives a screenshot from a TI-84 produced using the data. Use the method described in Example 3.13 on page 208 of your book to calculate the slope and y-intercept of the least-squares regression line. Compare with your results from part (a).

12. The Trans-Alaska Oil Pipeline is a tube formed from ½-inch-thick steel that carries oil across 800 miles of sensitive arctic and sub-arctic terrain. The pipe and the welds that join the segments were carefully examined before installation. How accurate are field measurements of the depth of small defects? The scatterplot below compares the results of measurements on 100 defects made in the field with measurements of the same defects made in the laboratory. The line y = x is drawn on the scatterplot. The second plot is a residual plot for these data.

a) Describe the overall pattern you see in the scatterplot, as well as any deviations from that pattern.

There is a positive, linear association between the two variables. There is more variation in the field measurements for larger laboratory measurements. The values are scattered above and below the line y = x for small and moderate depths, indicating strong agreement, but the field measurements tend to be smaller than the laboratory measurements for large depths.

b) If field and laboratory measurements all agree, then the points should fall on the y = x line drawn on the scatterplot, except for small variations in the measurements. But this is not the case. Explain.

The points for the larger depths fall systematically below the line y = x, showing that the field measurements are too small compared with the laboratory measurements.

c) The line drawn on the scatterplot (y = x) is not the least-squares regression line. How would the slope and y-intercept of the least-squares line compare? Justify your answer.

To minimize the sum of the squared vertical distances from the points to the line, the upper-right end of the y = x line in the scatterplot would have to be pulled down so that the line passes through the middle of the points that currently lie below it. Thus the least-squares line would have a smaller slope than y = x (less than 1) and a larger intercept (greater than 0).

d) Discuss what the residual plot tells you about how well the least-squares regression line fits the data.

The residual plot shows that the spread of the residuals grows for larger laboratory measurements; in other words, the variability in the field measurements increases as the laboratory measurements increase. The least-squares line does not provide a good fit, especially for larger depths.

13. Data on the fuel consumption y of a car at various speeds x is given on page 182 in your book. Fuel consumption is measured in liters of gasoline per 100 kilometers driven and speed is measured in kilometers per hour. A statistical software package gives the least-squares regression line and also the residuals. The regression line is:

The residuals in the same order as the observations are:

 10.09 2.24 -0.62 -2.47 -3.33 -4.28 -3.73 -2.94 -2.17 -1.32 -0.42 0.57 1.64 2.76 3.97

a) Make a scatterplot of the observations and draw the least-squares regression line on your plot.

A scatterplot with the least squares regression line is shown below.

b) Would you use the regression line to predict y from x? Justify your answer.

No. The scatterplot shows a clearly curved (nonlinear) relationship, so a straight line is not an appropriate model for predicting fuel consumption from speed.

c) Check that the residuals have sum zero (up to round-off error).

The residuals sum to −0.01, which is zero up to round-off error.
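The check in part (c) is a one-line sum in Python, using the residuals exactly as listed above:

```python
# Residuals from the fuel-consumption regression, in observation order
resids = [10.09, 2.24, -0.62, -2.47, -3.33, -4.28, -3.73, -2.94,
          -2.17, -1.32, -0.42, 0.57, 1.64, 2.76, 3.97]

total = sum(resids)
print(f"sum of residuals = {total:.2f}")  # -0.01, i.e. zero up to round-off
```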

d) Make a plot of residuals against the values of x. Draw a horizontal line at height zero on your plot. Notice that the residuals show the same pattern about this line as the data points show about the regression line in the scatterplot in (a). What do you conclude about the residual plot?

The residual plot indicates that the regression line underestimates fuel consumption for slow and fast speeds and overestimates fuel consumption for moderate speeds. The quadratic pattern in the residual plot indicates that the regression model is not appropriate for these data.
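A quick numerical companion to the residual plot is to look at the signs of the residuals in order. This sketch assumes the residuals are listed in order of increasing speed (the book's table is not reproduced here). Two positives, then a long run of negatives, then positives again is the curved pattern described above; randomly scattered residuals would usually switch sign much more often.

```python
# Residuals listed in observation order (assumed: increasing speed)
resids = [10.09, 2.24, -0.62, -2.47, -3.33, -4.28, -3.73, -2.94,
          -2.17, -1.32, -0.42, 0.57, 1.64, 2.76, 3.97]

signs = "".join("+" if r > 0 else "-" for r in resids)

# A "run" is a maximal block of consecutive residuals with the same sign
runs = 1 + sum(1 for s, t in zip(signs, signs[1:]) if s != t)
print(signs, runs)
```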

15. Below are four sets of data prepared by the statistician Frank Anscombe to illustrate the dangers of calculating without first plotting the data.

a) Without making scatterplots, find the correlation and the least-squares regression line for all four data sets. What do you notice? Use the regression line to predict y for x = 10 for all four data sets.
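For part (a), the following Python sketch uses the values of Anscombe's quartet as they are usually published. This assumes they match the four data sets in your book. It reports r, the least-squares line, and the prediction at x = 10 for each set. All four come out essentially identical (r ≈ 0.816, $\hat{y} \approx 3.00 + 0.500x$, prediction ≈ 8.0), which is exactly why the plots in the later parts matter.

```python
from statistics import mean

# Anscombe's quartet as commonly published (assumed to match the book's table)
x_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit(xs, ys):
    """Return (r, a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return sxy / (sxx * syy) ** 0.5, y_bar - b * x_bar, b


results = {}
for name, (xs, ys) in quartet.items():
    r, a, b = fit(xs, ys)
    results[name] = (r, a, b, a + b * 10)
    print(f"{name}: r = {r:.3f}, y-hat = {a:.2f} + {b:.3f}x, y-hat(10) = {a + b * 10:.2f}")
```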

b) Make a scatterplot for each of the data sets and add the regression line to each plot.

c) Now make a sketch of the residual plot for each of the four data sets.

d) In which of the four cases would you be willing to use the regression line to describe the dependence of y on x? Explain your answer in each case.