Statistics and the Common Core



Date18.10.2016

Teacher Notes



Trick or Treat Activity

Big Ideas: “Statistics is the art of distilling meaning from data.” Data have a story to tell; we try to discover interesting facts hiding in the data.
Ask students to guess the number of M&M’s in a fun size bag BEFORE giving them the handout. Put all guesses on the board or have them submit via technology. THEN open your bag and tell them “this is the right answer.” Sometimes no one in the class has guessed the “right answer.” Hopefully some students will argue that your one bag may not really be the “right answer,” or that other bags might have a different number. This idea of naturally occurring variation is a good theme to develop. Students should expect there to be variation in lots of things in life. Statistics attempts to help people make sense of data and make good decisions amidst this naturally occurring variation.
Population: All fun size bags produced by the Mars Company

Sample: Our class’s bags.

Observational Unit (subject): Each bag is an observational unit. The measurement on the observational unit/subject was “number of candies.” This is the response variable.

The “number of M&M’s/Skittles” might be best described by the “center” of a distribution. Depending on the shape of the distribution, this might best be described by the mean, median or the mode. The best answer may actually be a range/interval.

Comparing and contrasting: answers may vary, but the M&M’s guesses typically will be a lot more variable and have a higher center than the actual counts. Also, the Skittles center should be less than the M&M’s center.
Below is a set of distributions from recent activities:

Discussion questions: Could our sample of bags be “off” from the rest of the population? Perhaps all our bags were purchased at Wal-Mart. Would bags from Target be different? Will they be different next year?

(FYI: In 2008, the mean fun size bag in this activity contained 24.5 M&M’s…)

Random Rectangles
Big Ideas: Justify predictions of shape, center and spread of data distributions and form, direction and strength of scatterplots.
Encourage students to make 4-5 different shapes of rectangles. Measure length and width in centimeters (to the nearest tenth). Then calculate the perimeter and area.

The best part of this lesson is asking students to predict the shapes of the distributions and scatterplots before actually graphing them. Allow students to offer explanations and reasoning for their opinions—great discussion can arise!


Instructors could have students graph all their data on the board or a poster, or enter into a spreadsheet or Fathom. Both would be more realistic than a computer simulation, but there is a good Fathom simulation of randomly created rectangles that can be used as a demonstration (and would be quicker!). Look for the “Rand.Rectangles (Fathom)" file on my web site under the Exploring Data page.
CensusAtSchool
Big Ideas: Learn to write good statistical questions that can be answered from data, gather a random sample, calculate appropriate statistics, and correctly interpret your findings.
Data from CensusAtSchool can be downloaded without signing up for an account. Just navigate to Random Sampler and you can select the type of sample you want. It downloads as a .csv file, which opens easily with Excel.
Teachers can register and set up classes for free. Go to amstat.org/censusatschool and navigate to the Teacher Section. Set up an account, then set up your class(es). Classes will be assigned an ID number, but you must create a password for each class.
The online survey will go faster if students complete the measurements sheet ahead of time. Safari also seems to work better than Firefox as a browser.
After your students take the survey, you will be able to download their data and analyze in class. Or you can just collect a random sample from the existing database for analysis.
CAS Project: (See http://www.amstat.org/education/webinars/ and scroll down to the “Cool Investigations…” webinar video for more detailed information about this project.)
This project can be done any time during the year, but students’ questions, analysis and interpretation will vary depending on how much knowledge they have of statistical calculations, graphs and thinking. For instance, students who have not completed a high school level course will not know about t-tests and confidence intervals, but they can think of great questions, make relevant graphs and state appropriate conclusions from their analyses.
From the GAISE document:

A major objective of statistics education is to help students develop statistical thinking. Statistical thinking, in large part, must deal with this omnipresence of variability; statistical problem solving and decision making depend on understanding, explaining, and quantifying the variability in the data. The “core skills” include the ability to:



Formulate Questions

Collect Data

Analyze Results

Interpret Results
Examples of good questions that can be answered with CAS data:

Do males have better reaction times than females?

Are male students taller than female students at every grade level?

Is the number of text messages sent daily related to the number of hours of sleep on a school night?

Who gets the most sleep on a school night, males or females?

Is favorite sport related to state of residence?

Which gender has the most variable foot sizes?

Is there an association between time spent with family and time spent doing homework?

Is there an association between height and reaction time?

How many more hours do girls spend texting than boys?

Does the state affect the students’ favorite type of music?

Do high school students have better reaction times than elementary students?

How many text messages do teenagers send each day?

Would students rather be rich or famous? Does this differ by age group? (elementary vs. high school)

Which states have the highest participation rates in this survey? Are all states represented?

Is gender related to choice of super power?

Is the number of hours spent studying related to the number of hours of sleep on a school night?

Is “Favorite Music” related to “Favorite Charity” among high school students?

Is there an association between internet use and hours of sleep?

What proportion of students are left-handed?


Here is a general rubric that can be used:
_____ Question

• Can be answered with CAS data

• Is clearly and succinctly stated

• Identifies population of interest


_____ Relevant data was gathered

• Sample size is large enough to answer question

• Sample is random

• Data is appropriate for question posed (gender, age, state(s), etc.)


_____ Analysis

• Data is “cleaned” for obvious mistakes, typos, impossible measurements, etc.

• Justification is provided for any data that is deleted.

• Appropriate visual display of data (simple, accurate, clear)


_____ Interpretation

• Summary and conclusion is correct and generalized to the correct population

• Interpretation is linked to data collected and analysis completed

• Conclusion is clear, concise, correct, complete and in context.


______ Self evaluation and reflection

• What difficulties did you encounter?

• What were the 2–3 biggest things you learned during the project (perhaps even things that went wrong that you would do differently next time)?

• What could have been done differently to improve the quality of this project?



Slope Interpretation
Big Idea: Know how to interpret slope as a rate of change in context.
Interpreting slope as a rate of change in context is a difficult skill for many students. Typically, slope in algebra does not have any context. It might be reasonable to add slope interpretation practice into algebra classes similar to these examples.
1a) P = -$1,500,000 + $250,000y
b) For each year AFTER the grand opening of a McD’s restaurant, the predicted profit increases by $250,000, according to this model.
c) On the day of the grand opening (y = zero), the profit of a McD’s restaurant is predicted to be negative $1,500,000, according to this model.
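The slope and intercept interpretations in 1b and 1c can be checked numerically. The sketch below simply evaluates the model from 1a; the function name `predicted_profit` is an illustrative choice, not part of the original handout.

```python
def predicted_profit(years_since_opening):
    """Linear model from 1a: P = -1,500,000 + 250,000 * y (in dollars)."""
    intercept = -1_500_000  # predicted profit on opening day (y = 0)
    slope = 250_000         # predicted change in profit for each additional year
    return intercept + slope * years_since_opening


# Predicted profit at a few points in time
for y in (0, 2, 4, 6):
    print(y, predicted_profit(y))

# The slope is the change in the prediction when y increases by one year
print(predicted_profit(4) - predicted_profit(3))
```

Students can see that the difference between any two consecutive years is always the slope, and that the intercept is the prediction at y = 0.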
2a) This model predicts that for each second after launch, the balloon gains about 18.15 feet of elevation.
b) This model predicts that the elevation of the launch site at NHS was 813 feet above sea level.
c) The actual elevation of the launch site is 784 feet above sea level. What accounts for this error? An altimeter produces elevation data based on barometric pressure. If the device is not calibrated for the current pressure at the time of launch, the elevation data will be erroneous.
3a) This model predicts that for each minute after launch, the longitude will increase by approximately 0.029 degrees.
b) This model predicts that the longitude of the launch site is 85.9995 degrees west of the Prime Meridian (which is correct).
c) WEST! Since the longitude numbers are increasing (becoming LESS negative), then the balloon is generally heading toward Greenwich!

Mean as Least Squares
Big Idea: Statistics is based on a lot of mathematical foundations, including the important idea of “least squares.”
This activity connects algebra and simulation to a unique and little-known fact about the mean of a set of data. Using a small data set, students discover this property of the mean.
This activity also previews the idea of “sum of squares,” which comes up in later statistics classes, including variance, standard deviation and analysis of variance (and in proofs of statistical theorems).
#1: The five differences are -4, -2, -1, 0, and 4. Squared: 16, 4, 1, 0, 16. Sum: 37

#2: Five differences: -3, -1, 0, 1, and 5. Squared: 9, 1, 0, 1, 25. Sum = 36



#3: The “guessed mean” of 4 gave the smallest sum of squares.
*There is a nice Fathom demo on my web site (“Mean.Least.Squares.ftm”) that will show the connection to parabolas and quadratic functions.
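For classes without Fathom, the same idea can be explored with a few lines of code. This is a minimal sketch, assuming the data set is 1, 3, 4, 5, 9; those values are inferred from the differences listed in #1 and #2 (guesses of 5 and 4) and are not stated explicitly in these notes.

```python
from statistics import mean

# Data values inferred from the differences in #1 and #2 (an assumption)
data = [1, 3, 4, 5, 9]


def sum_of_squares(guess, values):
    """Sum of squared differences between each value and a guessed mean."""
    return sum((x - guess) ** 2 for x in values)


print(sum_of_squares(5, data))  # matches the sum in #1
print(sum_of_squares(4, data))  # matches the sum in #2

# Try every guess from 0.0 to 10.0 in steps of 0.1 and keep the best one
guesses = [g / 10 for g in range(0, 101)]
best_guess = min(guesses, key=lambda g: sum_of_squares(g, data))
print(best_guess, mean(data))
```

The guess with the smallest sum of squares lands on the actual mean of the data, which is the property students are meant to discover.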
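Here is the algebraic way to find the equation of the parabola in the simulation. For a data set of $n$ values and a possible mean $\bar{x}$, the sum of squares is

$$S = \sum_{i=1}^{n} (x_i - \bar{x})^2 = n\bar{x}^2 - 2\left(\sum x_i\right)\bar{x} + \sum x_i^2$$

or, with the values from the simulation ($n = 5$, $\sum x_i = 28$, $\sum x_i^2 = 194$),

$$S = 5\bar{x}^2 - 56\bar{x} + 194$$

The $\bar{x}$ that produces the least $S$ can be found at the vertex of the resulting parabola.
The x-coordinate is $-\frac{b}{2a} = \frac{56}{10} = 5.6$, so the vertex is (5.6, 37.2).
So $\bar{x} = 5.6$ and the minimum sum is 37.2.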

Statistical Analysis from StatCrunch.com

Polynomial Regression Results (from the data produced by the Fathom simulation)
Dependent Variable: sum_squares
Independent Variable: possible_mean


| Parameter | Estimate | Std. Err. | Alternative | DF | T-Stat | P-value |
|---|---|---|---|---|---|---|
| Intercept | 194.00023 | 0.000016106871 | ≠ 0 | 768 | 12044563 | <0.0001 |
| X | -56.000075 | 0.0000052728295 | ≠ 0 | 768 | -10620498 | <0.0001 |
| X^2 | 5.0000056 | 4.082734e-7 | ≠ 0 | 768 | 12246709 | <0.0001 |


Summary of fit:
Root MSE: 0.00010969278
R-squared: 1
R-squared (adjusted): 1
Using the statistical analysis above, the three coefficients of the “polynomial of best fit” are 5.0, -56.0 and 194.0 (each with a slight bit of rounding error). Putting those together, we can write the quadratic function that basically fits the data perfectly: $S = 5.0x^2 - 56.0x + 194.0$ (where $x$ is the possible mean), whose vertex is (5.6, 37.2).
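The vertex can be verified directly from the rounded regression coefficients. The short sketch below uses the standard vertex formula x = -b/(2a); the function name `vertex` is just an illustrative choice.

```python
def vertex(a, b, c):
    """Return the vertex (x, y) of the parabola y = a*x^2 + b*x + c."""
    x = -b / (2 * a)
    y = a * x ** 2 + b * x + c
    return x, y


# Rounded coefficients from the StatCrunch output above
x_min, s_min = vertex(5.0, -56.0, 194.0)
print(round(x_min, 4), round(s_min, 4))
```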

Measuring Lab and Other Data
Big Idea: Anything that is measured will have variation. Statistics is the art of finding the truth in the midst of variation.
Students will collect data on themselves for analysis and description.

Students will need rulers and an understanding of how to measure in centimeters.

Teachers may use this data in a variety of ways—I have only given a few suggestions.

(Many additional questions could be asked about the data collected in this activity.)

This would be a good time to introduce students to calculator or software use (graphing calculators, Fathom, Excel, StatKey, etc.)
Extensions: research DaVinci’s Vitruvian Man and other artists’ attention to detail about human body measurements and ratios (Michelangelo and others).
Q#1: Skewed right—there might be a few high # of siblings, but most will be 0-3.

Q#2: Perhaps a bimodal distribution by gender?

Q#3: Make a scatterplot and see what the slope is…

Q#5: When choosing the correct bow size (for hunters), a common calculation for the correct draw length is arm span ÷ 2.5.

Also, chair designers need to know the average knee height to make chairs comfortable, clothes designers need to know average arm lengths, leg lengths, etc.
PART 2:

The object of this portion of the activity is to demonstrate the general difficulty in measuring things. In Gauss’s day, this included astronomical measurements. This difficulty led Gauss to create the formula for the mathematical model that “fit” many measurement distributions of his time. Generally, he considered the mean of a distribution to be a rational choice to estimate the true value of a measurement only if this “normal law of errors” was used to describe the variation in the measurements (see en.wikipedia.org/wiki/Normal_distribution). Gauss’s formula is sometimes referred to as the Normal Model or the Normal Distribution Function. He and his formula were featured on Germany’s 10 Deutsche Mark.


For this activity with students, use simple objects, but ones in which it might be slightly difficult to measure: the length of the room, the diameter of a circular paper plate, the length of a piece of string, the height of a Styrofoam cup, etc.


Footnote: in trying to accurately measure the speed of light (which at one time was considered to be instantaneous), Albert A. Michelson used a set of spinning mirrors. Here’s what he concluded:
Final Result:

The final value of the velocity of light from these experiments is then—299940 kilometers per second, or 186,380 miles per second.




The mean value of V from the tables is      299852
Correction for temperature                     +12
                                        ------------
Velocity of light in air                    299864
Correction for vacuo                            80
                                        ------------
Velocity of light in vacuo                  299944±51

Question #8 and following: These are for practicing describing distributions (shape, center, spread, weird stuff/outliers, in context) as well as thinking about reasons why the data looks the way it does.


#12: This is a good distribution to play “20 Questions” with students. Give the class an opportunity to ask 20 yes-or-no questions to try to determine what this data set represents. The shape is important (two distinct modes; all other points are unusual, so you could make an argument that this distribution has both outliers and “IN-liers!”).

The distribution depicts the number of days in office of US Presidents. Obama’s dot is for 10/20/14, or 2099 days.
#13 This is a bimodal distribution. Most eruptions last around 90 seconds, but there is a smaller group of eruptions that last around 60 seconds.

Sources:


Speed of light measurements:

http://www.gutenberg.org/files/11753/11753-h/11753-h.htm

Gauss’s Normal Distribution:

http://en.wikipedia.org/wiki/Normal_distribution



Comparing Distributions:
Big Idea: Comparisons of distributions must include “comparative language” and include shape, center, spread, outliers and context.

According to the CCSS, students should be introduced to comparing two or more distributions in 7th grade. This is also a point of emphasis in the CCSS for high school students. In fact, it is considered a “widely applicable prerequisite” for college and career mathematics.


Sample answers:

1. The mean “average age at death” for both groups is around 75 years, but the range of average death ages is much larger for Milton (23 years compared to 9 years).


5. The median of the average winning speeds at the Daytona 500 is higher than at the Indy 500 (approx. 154 mph vs. 141 mph). The range of the Indy 500 winning speeds is much larger than the range of the Daytona 500 winning speeds (approx. 115 mph vs. 50 mph).

Scatterplot Interpretation:
Big Idea: Interpreting an association between two quantitative variables should include direction, form, strength, outliers and context.
This activity practices scatterplot interpretation. This is meant to be only a few examples of the hundreds that can be found on the internet and other sources. Many sports web sites have statistics that typically can be graphed with a scatterplot. Some examples:

nfl.com mlb.com amstat.org/censusatschool

wunderground.com/hurricane/hurrarchive.asp https://tuvalabs.com/explore/

lib.stat.cmu.edu/DASL/



Possible Answers:

1. Old Faithful isn’t faithful—at least in eruption durations. There is a lot of variation in the eruption durations over the first eight days of January, 2011 (Maybe it gets more consistent later? We don’t have that data.)

The general pattern is horizontal, with a range of 55 seconds to 112 seconds. Most eruptions are between 80 and 100 seconds. Most eruptions follow a longer-shorter-longer-shorter pattern, but this is not always the case. The longest eruption is followed by the shortest eruption. Perhaps there is a geological explanation for this: was all its energy spent after the long one?
Maybe the “faithful” moniker refers to the time between eruptions. From this graph, it appears that the gaps between eruptions are pretty consistent. So let’s look at graph #2.
2. The time between eruptions data appears to be bimodal. Generally there are shorter times (centered around 65 minutes) and longer times (centered around 95 minutes). The shortest interval is around 52 minutes and the longest interval is 121 minutes with an outlier just over 160 minutes. This outlier is either a geological anomaly, or there was a measurement skipped. If this outlier is divided by two, it fits nicely into the rest of the dataset. (Checking the dataset and online weather archives, this interval occurred between 1:29AM and 4:11AM on 3/13/11. There were no snowstorms in the area, the wind was calm, and no known electricity outages were noted. So we do not know for sure what happened, but it seems reasonable to believe that this interval is incorrect, and an eruption measurement was simply skipped.)
3. It seems that knowing the previous eruption time will help predict the wait time until the next eruption (≈80 minutes), but it is not as clear from the data how long the next eruption time will be (anywhere from 1.5 minutes to 5 minutes). There is more variation in the y-variable (called the response variable) in the first graph than in the second graph. Therefore, predictions using the second graph should be more accurate.
4. There is a lot of variation in Old Faithful data! But rangers claim to have an accuracy rate of 90% in predicting when the next eruption will be. According to these data, they should be pretty accurate if they use the previous eruption duration to predict when the next eruption will take place. See http://www.geyserstudy.org/geyser.aspx?pGeyserNo=OLDFAITHFUL for more information.


5. There is a moderately strong, positive, nonlinear association between the year and the speed of the winning horses of the Kentucky Derby. The curve suggests that there is a physical limit to the average speed of a horse during the Kentucky Derby (an asymptote!).
6. There is a bimodal, negative, strong association between the year and the winning time of Kentucky Derby races. The length of the race was shortened in 1896, which caused the gap in winning times.
7. There is a moderately strong, negative, nonlinear association between people per TV and life expectancy in these 22 countries. In other words, countries that have more TV’s per person have a higher life expectancy. This is probably due to the fact that countries with a lot of TV’s also have a higher socioeconomic status and thus, access to better medical care. TV’s probably do not cause people to live longer. This reinforces the truth that “association does not prove causation.”

8. Some possible statements:


In the North Atlantic, the number of tropical storms and hurricanes has largely been stable (no noticeable increase or decrease).
Generally the number of tropical storms has been declining in the Western Pacific from 1960 through 2010, and the number of hurricanes has been generally declining over the same period.


Dice and the Transitive Property
Big Idea: Probability models must be set up carefully--sometimes simulations are more helpful.
(To buy cool dice for this and other activities, see http://highland-games.co.uk/ This company is in Scotland, so prices are in pounds. Delivery should be within two weeks.)
(For another version of this activity, see plus.maths.org/content/non-transitiv-dice )
A standard six-sided die contains 1,2,3,4,5,6. This activity uses special non-standard dice that have the following numbering schemes: Green: 4,4,4,4,4,1; Blue: 3,3,3,3,3,6; Black: 2,2,2,5,5,5.
Notice that the mean roll of each die is 21/6 or 3.5 per roll. Therefore, one might assume that if two players picked two dice at random, the chances of winning a roll would be even, or 50-50. Not true! Green beats blue more than half the time, blue beats black more than half the time, and black beats green more than half the time. This fact cannot be determined by simply calculating the mean of each die. The probability model used must contain all the possible differences of rolls when rolling the two dice.
For instance:
In Green vs. Blue, the probability models for the differences are:


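Differences G – B (each column is a face of the Green die, each row is a face of the Blue die):

| Blue Die \ Green Die | 4 | 4 | 4 | 4 | 4 | 1 |
|---|---|---|---|---|---|---|
| 3 | 1 | 1 | 1 | 1 | 1 | -2 |
| 3 | 1 | 1 | 1 | 1 | 1 | -2 |
| 3 | 1 | 1 | 1 | 1 | 1 | -2 |
| 3 | 1 | 1 | 1 | 1 | 1 | -2 |
| 3 | 1 | 1 | 1 | 1 | 1 | -2 |
| 6 | -2 | -2 | -2 | -2 | -2 | -5 |

| G – B | -5 | -2 | 1 |
|---|---|---|---|
| Prob(G – B) | 1/36 | 10/36 | 25/36 |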

From the table, you can see that the Green die roll is larger than the Blue die in 25/36 rolls, which is more than 50% of the time. For the other two comparisons, the winning die has a 21/36 winning percentage.
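The three head-to-head comparisons can also be checked by brute-force enumeration of all 36 equally likely pairs of faces. This is a minimal sketch; it assumes the Green and Blue faces shown in the table and takes the Black die to be the remaining 2,2,2,5,5,5 numbering scheme. The names `DICE` and `win_probability` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Face values for each die (Black is assumed to be the 2,2,2,5,5,5 scheme)
DICE = {
    "green": [4, 4, 4, 4, 4, 1],
    "blue": [3, 3, 3, 3, 3, 6],
    "black": [2, 2, 2, 5, 5, 5],
}


def win_probability(first, second):
    """P(first die shows a larger number than second die) over all 36 pairs."""
    wins = sum(1 for a, b in product(DICE[first], DICE[second]) if a > b)
    return Fraction(wins, 36)


for first, second in [("green", "blue"), ("blue", "black"), ("black", "green")]:
    print(first, "beats", second, "with probability", win_probability(first, second))
```

Note that `Fraction` prints results in lowest terms, so 21/36 appears as 7/12.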
Extension: It would be easy to dupe a friend with this set of dice. Let them pick a die first, then you pick the die that has an advantage. Of course, you will have to play this game many times for the truth to be revealed. In fact, you will lose a few games even if you choose the die that wins more in the long run.
Now here’s the really surprising result. Assume you were successful in duping a friend with this game and then had a “change of heart.” You reveal the secret of non-transitive dice and decide to let your friend choose second. And to make up for the earlier ruse, you tell your friend that you will each roll TWO dice of the same color to give him twice the chance of winning. It turns out that when you roll two dice of the same color, the winning advantage is in REVERSE ORDER! Now Black beats Blue, Blue beats Green and Green beats Black (at least in the long run). You will still have the advantage, although you will not have as large a winning percentage as you did in the original problem.
Here’s the probability table/model for Doubling Rolls with Green and Blue:
Possible green dice sums: Possible blue dice sums:

 

4

4

4

4

4

1

4

8

8

8

8

8

5

4

8

8

8

8

8

5

4

8

8

8

8

8

5

4

8

8

8

8

8

5

4

8

8

8

8

8

5

1

5

5

5

5

5

2

 

3

3

3

3

3

6

3

6

6

6

6

6

9

3

6

6

6

6

6

9

3

6

6

6

6

6

9

3

6

6

6

6

6

9

3

6

6

6

6

6

9

6

9

9

9

9

9

12

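| Roll (green sum - blue sum) | 8-6 | 8-9 | 8-12 | 5-6 | 5-9 | 5-12 | 2-6 | 2-9 | 2-12 |
|---|---|---|---|---|---|---|---|---|---|
| Diff | 2 | -1 | -4 | -1 | -4 | -7 | -4 | -7 | -10 |
| Prob(Diff) | 0.482 | 0.193 | 0.0193 | 0.1929 | 0.0772 | 0.0077 | 0.0193 | 0.0077 | 0.00077 |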

There is a 48.2% chance of green beating blue, so there must be a 51.8% chance of blue beating green. Similar results would apply to the other dice comparisons. So the non-transitive direction has indeed been reversed!
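The reversal can be confirmed for all three pairings by building the exact distribution of the sum of two rolls of each die. This is a sketch under the same assumption as before about the Black die (2,2,2,5,5,5); the helper names `sum_distribution` and `doubled_win_probability` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Face values for each die (Black is assumed to be the 2,2,2,5,5,5 scheme)
DICE = {
    "green": [4, 4, 4, 4, 4, 1],
    "blue": [3, 3, 3, 3, 3, 6],
    "black": [2, 2, 2, 5, 5, 5],
}


def sum_distribution(faces):
    """Exact probability of each possible sum when the same die is rolled twice."""
    dist = {}
    for a, b in product(faces, repeat=2):
        dist[a + b] = dist.get(a + b, 0) + Fraction(1, 36)
    return dist


def doubled_win_probability(first, second):
    """P(sum of two rolls of `first` exceeds sum of two rolls of `second`)."""
    d1 = sum_distribution(DICE[first])
    d2 = sum_distribution(DICE[second])
    return sum(p1 * p2
               for (s1, p1), (s2, p2) in product(d1.items(), d2.items())
               if s1 > s2)


for first, second in [("blue", "green"), ("black", "blue"), ("green", "black")]:
    p = doubled_win_probability(first, second)
    print(first, "beats", second, "with probability", p, "=", round(float(p), 3))
```

With these dice no two sums can tie, so each pairing's two win probabilities add to 1.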


These and other amazing results can be revealed through fun activities with dice and other manipulatives. Hopefully this sample activity will inspire other creative activities with probability and modeling random events.
The last problem comes from an AP Statistics exam problem from 1999. It involves rolling and comparing two non-standard dice to see who has the advantage. This is an example of the importance of setting up the right model/table for the question asked. Many students only compared the average roll of each die and did not get the correct answer.
(The solution to this problem is provided on the next page…)
Solution to AP Statistics Exam Problem: 1999, #5.

Simulating Random Events
Big Idea: Simulating random events can be fun, and they provide valuable insight into the true nature of randomness.

Here is what a set of 50 flips might look like on a Stirling Recording Sheet:


In Statistics, probability is typically defined as the long-run frequency of success. Therefore, we expect a fair coin to land heads 50% of the time in the long run. After 50 flips, a reasonable person would say we would expect 25 to be heads, but we would not be surprised if it were 2-3 higher or lower. Also, if enough flips were conducted, it would not be unusual to see a streak of 10 or more heads (or tails) in a row. This can easily be verified with any good simulator (graphing calculator, Excel, random number table, etc.)



Going Deeper: As the number of flips increases, the percent heads should converge to 50%, but the number of heads will likely diverge away from half the total number of flips. For instance, in 10 flips, getting only 1 head would be unusual (just 4 away from the expected 5, but only 10% heads). But in 10,000 flips, getting 4950/10,000 would not be unusual at all (50 heads away from the expected 5000, but still 49.5% heads, which is really close to 50%).
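A quick simulation can make this contrast concrete for students. The sketch below flips a simulated fair coin and reports both the proportion of heads and how far the count is from half the flips; the function name `flip_summary` and the seed value are arbitrary choices for illustration.

```python
import random


def flip_summary(n_flips, rng):
    """Flip a fair coin n_flips times; return (heads, proportion, gap from n/2)."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads, heads / n_flips, abs(heads - n_flips / 2)


rng = random.Random(2014)  # fixed seed so a rerun produces the same flips
for n in (10, 100, 10_000, 1_000_000):
    heads, proportion, gap = flip_summary(n, rng)
    print(f"{n:>9} flips: {heads} heads, proportion {proportion:.4f}, "
          f"{gap:.0f} away from half")
```

Running this with several different seeds typically shows the proportion settling near 0.5 while the raw gap from half tends to grow as the number of flips increases.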

#7-9: Spinning pennies will likely NOT average around 50%, especially early 1960’s pennies! Try to mix in a few of these pennies—some early 1960’s pennies land heads only about 10% of the time! Experts have surmised that the Lincoln head is heavier and “pulls” the coin over on Lincoln’s head, causing tails to be more likely.

See Gelman and Nolan's paper, "You Can Load a Die, But You Can’t Bias a Coin" at: http://www.stat.columbia.edu/~gelman/research/published/diceRev2.pdf
For an activity about a coin flipping scam: http://nrich.maths.org/6954

Jelly Blubbers
Big Idea: A good sample/survey can avoid bias by collecting a random sample.
This activity introduces the Simple Random Sample (SRS) to students, and shows why this process helps produce an unbiased sample statistic. Relying on our perceptions can often be deceiving.
Each student will need one “The JellyBlubber Colony” sheet, one dotplot sheet, and one metric ruler.
Instructions:

1. Pass out the Jelly Blubber Colony worksheet upside down. Ask students to not look at the sheet until they are instructed. Printing on thicker or colored paper can help hide the Blubbers.


Tell the students a story about the recently discovered colony of jellyblubbers, a new marine species, and that our task is to try to determine the average length in centimeters (measured horizontally) of jellyblubbers by sampling.
For the first part of the activity, allow the students to look at the Colony for five seconds. They will then estimate the average (mean) length of the jellyblubbers. The teacher (or students) plot(s) the guesses as a dotplot on the board or a poster. Discuss the distribution, and calculate the mean and range (or standard deviation, depending on the level of students).
2. The students then are given one minute to choose a representative sample of 10 blubbers. Once they have made their choice, they measure the length of each blubber and calculate the mean length in centimeters. These values are plotted on a new dotplot, followed by a whole class discussion of the distribution.
3. In the third and final phase, students take an SRS of 10 blubbers, as follows. Each blubber is numbered from 1 to 100. They generate 10 random numbers from a random number table (using 01 through 99, with 00 standing for 100) or a calculator in the range 1 to 100. They calculate the mean length of those ten blubbers. These values are plotted on a new dotplot.
Discuss the difference in the three distributions – shape, center, spread, outliers.
Which method should give the best estimate? (SRS) What makes a sampling method the best? (low bias, low variation) What about the final question: a “closing your eyes and pointing” sample? It would be biased toward larger blubbers—you are more likely to point to a large one.
(The actual average length of a blubber is around 1.8 cm. There may be some variation in the size of blubbers due to variation in printing. If blubber #14 is 3.7 cm long, then the 1.8 cm mean should be accurate for the population.)
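If a random number table or calculator is not handy, the SRS step can be sketched in code. The blubber lengths below are HYPOTHETICAL placeholders generated only so the example runs; in class, replace them with the lengths measured from the Colony sheet. The function name `srs_mean` is also an illustrative choice.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the example is repeatable

# HYPOTHETICAL lengths (cm) for blubbers numbered 1 to 100; not the real sheet
lengths = {number: round(rng.uniform(0.5, 4.0), 1) for number in range(1, 101)}


def srs_mean(lengths, sample_size, rng):
    """Draw a simple random sample of blubber numbers (no repeats) and
    return the chosen numbers with the mean length of those blubbers."""
    chosen = rng.sample(sorted(lengths), sample_size)
    return chosen, mean(lengths[n] for n in chosen)


chosen, sample_mean = srs_mean(lengths, 10, rng)
print("Blubbers sampled:", sorted(chosen))
print("Mean length of sample (cm):", round(sample_mean, 2))
```

Repeating the draw many times and plotting the sample means gives the third dotplot described above.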
Alf Landon: Statistical Infamy
Big Idea: The Literary Digest poll of 1936 was one of the most famously biased surveys in history, falsely predicting Alf Landon’s landslide victory over FDR.
This activity begins with a research and investigative task about the 1936 Presidential election and two famous pre-election polls: one that got it right and one that missed it.
The second part of the activity is to demonstrate a Simple Random Sample: picking a random sample of words from an ordered list of words (the Gettysburg Address). Even the Gettysburg Address is difficult to “measure.” There were five official manuscripts of the address that Lincoln hand copied, and they apparently are not exactly the same. And since there was no audio recording, it is possible that Lincoln said different words when he recited the speech aloud at Gettysburg.
Alfred Mossman "Alf" Landon (September 9, 1887 – October 12, 1987), was an American Republican politician, who served as the 26th Governor of Kansas from 1933 to 1937. He was best known for having been the Republican Party's nominee for President of the United States, defeated in a landslide by Franklin D. Roosevelt in the 1936 presidential election.
The 1936 Presidential election was held on November 3, 1936. This election is notable for The Literary Digest poll, which was based on 10 million questionnaires mailed to readers and potential readers; 2.3 million were returned. The Literary Digest, which had correctly predicted the winner of the last 5 elections, announced in its October 31 issue that Landon would be the winner with 370 electoral votes. The cause of this mistake has often been attributed to improper sampling: more Republicans subscribed to the Literary Digest than Democrats, and were thus more likely to vote for Landon than Roosevelt. However, a 1976 article in The American Statistician demonstrates that the actual reason for the error was that the Literary Digest relied on voluntary responses. As the article explains, the 2.3 million "respondents who returned their questionnaires represented only that subset of the population with a relatively intense interest in the subject at hand, and as such constitute in no sense a random sample... it seems clear that the minority of anti-Roosevelt voters felt more strongly about the election than did the pro-Roosevelt majority." A more detailed study in 1988 showed that both the initial sample and non-response bias were contributing factors, and that the error due to the initial sample taken alone would not have been sufficient to predict the Landon victory. This mistake by the Literary Digest proved to be devastating to the magazine's credibility, and in fact the magazine went out of existence within a few months of the election.

That same year, George Gallup, an advertising executive who had begun a scientific poll, predicted that Roosevelt would win the election, based on a quota sample of 50,000 people. He also predicted that the Literary Digest would mis-predict the results. His correct predictions made public opinion polling a critical element of elections for journalists and indeed for politicians. The Gallup Poll would become a staple of future presidential elections, and remains one of the most prominent election polling organizations.

--Wikipedia
Other random sampling activities can be conducted on texts such as Bush’s and Obama’s inaugural addresses.
Also, there is an applet simulation of the Gettysburg Address activity at: http://www.rossmanchance.com/applets/GettysburgSample/GettysburgSample.html
The actual mean word length for the Bush/Obama question: Bush = 4.722, Obama = 4.642
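The word-sampling idea can also be tried in a few lines of code. This sketch uses only the opening sentence of the Gettysburg Address as the word list (the full activity uses the whole speech), and the function name `srs_mean_word_length` is an illustrative choice.

```python
import random
import re
from statistics import mean

# Opening sentence only; the classroom activity samples from the full address
OPENING = ("Four score and seven years ago our fathers brought forth on this "
           "continent, a new nation, conceived in Liberty, and dedicated to the "
           "proposition that all men are created equal.")

# Ordered list of words with punctuation stripped
words = re.findall(r"[A-Za-z]+", OPENING)


def srs_mean_word_length(words, sample_size, rng):
    """Mean length of a simple random sample of words (no word position repeated)."""
    sample = rng.sample(words, sample_size)
    return mean(len(word) for word in sample)


rng = random.Random(1863)  # arbitrary seed so the example is repeatable
print("Number of words:", len(words))
print("Population mean word length:", round(mean(len(w) for w in words), 3))
print("SRS of 10 words, mean length:", srs_mean_word_length(words, 10, rng))
```

Students can compare many SRS means against the population mean to see that the sample means center on it.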


Pick’s Theorem and Multiple Regression
Big Idea: The area of “dot grid rectangles” can be accurately predicted by knowing TWO explanatory variables.
This activity starts with a little-known Geometry theorem using “dot grid polygons.” Polygons are constructed on dot paper so that each vertex is one of the dots on the paper. It turns out that the area of the polygon can be found with a formula involving interior dots and border dots. This formula can be discovered by intelligent guessing and checking, but it can also be derived by using a statistical analysis called multiple regression. Multiple regression is a topic that is technically beyond the AP Statistics curriculum, although the principles underlying the procedure ARE in the AP Statistics curriculum. There was a question on the 2014 AP Statistics Exam (#6) that asked students to think about a multiple regression model for data, thus “stretching” them into a topic that was “just beyond” the AP curriculum.
There is more that you can do with this activity using statistical software such as Fathom. The entire activity including Fathom directions can be found on my web site under the Statistics and the Common Core tab. Look for “Pick’s Theorem Using Stats.”
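For readers without Fathom, the regression step can be sketched in Python. This is a minimal illustration under stated assumptions, not the activity's own procedure: it restricts the data to axis-aligned dot-grid rectangles, computes interior and border dot counts from the width and height, and fits area = b0 + b1(interior) + b2(border) by ordinary least squares using exact fractions. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def rectangle_row(w, h):
    """Return (interior dots, border dots, area) for a w-by-h dot-grid rectangle."""
    interior = (w - 1) * (h - 1)
    border = 2 * (w + h)
    return interior, border, w * h


def solve(matrix, rhs):
    """Solve a small square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def fit_area_model(rows):
    """Least-squares fit of area = b0 + b1*interior + b2*border
    using the normal equations (X'X) b = X'y."""
    X = [[1, interior, border] for interior, border, _ in rows]
    y = [area for _, _, area in rows]
    xtx = [[sum(r[j] * r[k] for r in X) for k in range(3)] for j in range(3)]
    xty = [sum(r[j] * t for r, t in zip(X, y)) for j in range(3)]
    return solve(xtx, xty)


# Data set: every rectangle with width and height from 1 to 4.
rows = [rectangle_row(w, h) for w, h in product(range(1, 5), repeat=2)]
b0, b1, b2 = fit_area_model(rows)
print(f"area = {b0} + {b1} * interior + {b2} * border")
```

Because every rectangle's area is an exact linear function of its two dot counts, the fit has no residual error, and the coefficients match Pick's formula: area = interior + border/2 - 1.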

Other resources:



NCTM’s Mathematics Teacher articles

Shaughnessy, J. Michael and Pfannkuch, Maxine. “How Faithful Is Old Faithful? Statistical Thinking: A Story of Variation and Prediction,” Mathematics Teacher, Vol. 95, No. 4, April 2002


Canada, Daniel L. “The Known Mix: A Taste of Variation,” Mathematics Teacher, Vol. 102, No. 4, November 2008
Websites

https://tuvalabs.com/explore/

http://highland-games.co.uk/

http://www.amstat.org/education/stew/index.cfm

http://cpid.iri.columbia.edu/index.html (ebola graphs and data)

More detailed descriptions of statistics concepts for Grades 6-8:

Kader, Gary. Developing Essential Understanding of Statistics for Teaching Mathematics in Grades 6-8, © 2013 NCTM: Reston, VA


“Progressions” and other documents from achievethecore.org
Other publications containing good activity ideas:

NCTM Navigations Series (Data Analysis, Probability), NCTM publications


Focus in High School Mathematics: Reasoning and Sense Making in Statistics and Probability, NCTM publication
Peck, Starnes, et al., Making Sense of Statistical Studies, American Statistical Association, 2009

Great reads about statistics in the world around us:

Huff, Darrell. How to Lie with Statistics, W.W. Norton and Co., New York (1954) $8 at Amazon


Paulos, John Allen, Innumeracy, Hill and Wang (1988)
Ellenberg, Jordan, How Not to Be Wrong: The Power of Mathematical Thinking, The Penguin Press (2014)
Malcolm Gladwell’s “Trilogy”: Tipping Point, Blink, *Outliers; Little, Brown and Company, (2006, 2007, 2008)

*Outliers is particularly good…


Vickers, Andrew J., What is a P-value, Anyway?, Pearson: 2009

Appendices:
From NCTM’s Navigating through Data Analysis in Grades 9-12
Dietary Change and Cholesterol

Was the diet a success, or could decreases in cholesterol levels have been due merely to chance?


1. Write 2-3 questions that could be answered from this data set.

2. Write 2-3 questions that this data set could answer if someone asked YOU to change your diet to lower your cholesterol.

3. Answer one of the questions you wrote using graphs and statistics from the data. Explain and justify your conclusion.

5. To whom can conclusions from this data be generalized (applied)? Explain.

1. Cut out the rectangular shape of the helicopter on the solid lines.


Long rotor version pictured above

2. Along the solid line, cut one-third of the way in from each side of the helicopter, stopping at the vertical dashed lines.


3. Fold both sides toward the center creating the base. The base can be stapled at the top and bottom. Try to be consistent about where the staples are placed. Use a paper clip to add some weight to the body.
4. For long-rotor helicopters, cut down from the top along the solid center line to the horizontal dashed line.
5. For short-rotor helicopters, proceed as in step 4, but cut the rotors off along the horizontal line marked.
6. Fold the rotors in opposite directions.

PAPER HELICOPTER DESIGN Original Design by George Box



1 This topic is a fairly deep extension of a nice Geometry activity.

Statistics and the Common Core Page October 20, 2014
