AP Statistics Name:________________________
Chapter 1 Outline Per:______________
Vocabulary:
*Individuals & Variables
Categorical Variable
Quantitative Variable
*Distribution
*Inference
*frequency table
*relative frequency table
*Bar Graph
Segmented Bar Graph
*Pie Chart
*Two-Way Table
Marginal Distribution
Conditional Distribution
*Association
*Shape
Symmetric
Skewed
*Measures of Center
Mean
Median
*Spread
IQR
Outliers
*Four Step Process
*Dotplot
*Stemplot
Splitting stems
Back-to-back stemplot
*Histogram
*5-Number Summary
*BoxPlot
*Standard Deviation
Variances
Homework Assignments:
Date | Assignment
Weds 8/19 | Intro #1-7 odd, 8; 1.1 #11-25 odd
Fri 8/21 | 1.2 #37-47 odd, 53-59 odd
Mon 8/24 | 1.3 #79-97 odd skip 85, 103, 105, 107-110
Weds 8/26 | Review Assignment
Mon 8/31 | Test
Introduction: Data Analysis: Making Sense of the Data
Example:
The following is a small section of a data set describing education in the US.
State Region Population(1000s) SAT Verbal SAT Math % taking % No HS
CA PAC 35,894 499 519 54 18.9
CO MTN 4,601 551 553 27 11.3
CT NE 3,504 512 514 84 12.5
Identify the individuals, and then identify the variables. Determine if each variable is categorical or quantitative.
1.1 Analyzing Categorical Data
What is the difference between a frequency table and a relative frequency table?
What is roundoff error?
The Bar Graph
Things to remember:
- Make sure you label your axes and title your graph.
- Scale your axes appropriately.
- Each bar should correspond to the appropriate count.
- Leave room between bars.
The Pie Chart:
Things to remember:
- Must include all the categories that make up the whole.
- Counts must be converted to percentages of the whole.
Example: Use the data to draw a bar graph AND a pie chart
Enrollment in Darien High School
Freshmen Sophomores Juniors Seniors
340 320 409 389
When is it useful to use a bar graph?
When is it useful to use a pie chart?
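A minimal sketch of how the Darien High School counts above could be turned into a relative frequency table (the percentages a pie chart needs). The variable names and the choice to round to one decimal place are assumptions made for illustration. Printing the sum of the rounded percentages makes it easy to check for roundoff error, which occurs when rounded values do not add to exactly 100%.

```python
# Enrollment counts from the example above
counts = {"Freshmen": 340, "Sophomores": 320, "Juniors": 409, "Seniors": 389}

total = sum(counts.values())

# Relative frequency = count / total, expressed as a percent.
# Rounding to one decimal place is an assumption, not a requirement.
relative = {grade: round(100 * n / total, 1) for grade, n in counts.items()}

print(f"{'Class':<12}{'Count':>6}{'Percent':>9}")
for grade, n in counts.items():
    print(f"{grade:<12}{n:>6}{relative[grade]:>8.1f}%")

# If this sum is not exactly 100, the difference is roundoff error.
rounded_sum = sum(relative.values())
print(f"{'Total':<12}{total:>6}{rounded_sum:>8.1f}%")
```

The count column is what a bar graph displays; the percent column is what a pie chart displays.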
Two-Way Tables and Marginal Distribution
Marginal Distribution:
Conditional Distribution:
Example: The table below gives information on the age and the number of school years completed for Americans (in thousands).
Education | Age 25 to 34 | Age 35 to 54 | Age 55 and over | Total
Did not complete HS | 5325 | 9152 | 16035 | _____
Completed High School | 14061 | 24070 | 18320 | _____
1 to 3 years of college | 11659 | 19925 | 9662 | _____
4 or more years of college | 10342 | 19878 | 8005 | _____
Total | _____ | _____ | _____ | _____
1) What is the total number of people described by this table?
2) What percentage of Americans did not finish high school? (marginal)
3) What percentage of Americans between the ages of 35 and 54 completed high school? (marginal)
4) What percentage of Americans who had between 1 and 3 years of college are 55 and over? (conditional)
5) What percentage of Americans who are between 25 and 34 years old had only completed high school? (conditional)
6) Is there a relationship between age and whether or not one completed 4 or more years of college? (conditional)
Age Group | 25-34 | 35-54 | 55 and over
Percent with 4 or more years of college | _____ | _____ | _____
7) Create a bar graph with the information in #6.
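The two-way table calculations can be checked with a short script. This is a sketch under the assumption that the table is stored as a dictionary of rows (the names `table`, `row_totals`, and `col_totals` are illustrative). It computes the totals, the marginal distribution of education, and the conditional distribution of "4 or more years of college" within each age group.

```python
ages = ["25 to 34", "35 to 54", "55 and over"]

# Counts (in thousands) copied from the two-way table above
table = {
    "Did not complete HS":        [5325, 9152, 16035],
    "Completed High School":      [14061, 24070, 18320],
    "1 to 3 years of college":    [11659, 19925, 9662],
    "4 or more years of college": [10342, 19878, 8005],
}

row_totals = {level: sum(row) for level, row in table.items()}
col_totals = [sum(col) for col in zip(*table.values())]
grand_total = sum(row_totals.values())

# Marginal distribution: each row total as a percent of the grand total
marginal_education = {
    level: 100 * t / grand_total for level, t in row_totals.items()
}

# Conditional distribution: within each age group (column),
# the percent with 4 or more years of college
college = table["4 or more years of college"]
conditional_college = [100 * c / t for c, t in zip(college, col_totals)]

print("Grand total (thousands):", grand_total)
for level, pct in marginal_education.items():
    print(f"{level:<28}{pct:6.1f}%")
for age, pct in zip(ages, conditional_college):
    print(f"Age {age:<12} 4+ years of college: {pct:5.1f}%")
```

A marginal percent always divides by the grand total; a conditional percent divides by the total of the row or column being conditioned on.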
1.2 Displaying Quantitative Data with Graphs
The Dotplot
Things to remember:
- You only need a properly labeled horizontal axis.
- Title the graph.
- Each dot represents a count of 1.
- Works well with a small data set.
Example: Construct a Dotplot with the given information:
Runs Scored by the American League in the last 21 MLB All-Star Games:
0, 3, 6, 9, 7, 2, 13, 4, 9, 4, 7, 3, 5, 2, 7, 4, 7, 13, 5, 2, 4
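As a rough check on a hand-drawn dotplot, the sketch below builds a text version of the same graph for the All-Star Game data. The layout (three characters per axis position, `*` as the dot) is an assumption chosen for readability.

```python
from collections import Counter

# Runs scored by the American League in the last 21 All-Star Games
runs = [0, 3, 6, 9, 7, 2, 13, 4, 9, 4, 7, 3, 5, 2, 7, 4, 7, 13, 5, 2, 4]

counts = Counter(runs)                    # how many times each value occurs
values = range(min(runs), max(runs) + 1)  # every position on the axis

lines = ["Runs Scored by the American League"]
# Build the stacks from the top row down; each * is one game.
for level in range(max(counts.values()), 0, -1):
    row = "".join(" * " if counts[v] >= level else "   " for v in values)
    lines.append(row.rstrip())
lines.append("".join(f"{v:^3}" for v in values))  # horizontal axis
lines.append("Runs scored")                       # axis label

print("\n".join(lines))
```

Values with no dots (such as 1, 8, and 10 through 12) still get a place on the axis, which is what makes gaps and possible outliers visible.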
When describing the overall pattern of a distribution, you MUST address the following 4 things:
1. The CENTER of the data
2. The SHAPE of the data
   - Skewed to the right
   - Skewed to the left
3. The SPREAD of the data
4. Any OUTLIERS in the data
Example:
Describe the overall pattern of the distribution of runs scored by the American League in the example above.
The Stemplot
Things to remember:
- Separate each piece of data into a stem (all but the rightmost digit) and a leaf (the final digit).
- Write the stems vertically in increasing order from top to bottom.
- Write the leaves in increasing order out from the stem.
- Be very neat and make sure you leave the same amount of space in between leaves.
- Title your graph.
- Include a key identifying what the stem and leaves represent.
- Works well with a small data set.
Example:
During the early part of the 2004 baseball season, many sports fans and baseball players noticed that the number of home runs being hit seemed to be unusually large. Here are the data on the number of home runs hit by American League teams.
American League 35, 40, 43, 49, 51, 54, 57, 58, 58, 64, 68, 68, 75, 77
Construct an appropriate graph to display the number of home runs hit in the American League.
American League Home Runs
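The stemplot steps listed above can be sketched in a few lines. This is one possible implementation, assuming two-digit data so that the tens digit is the stem and the ones digit is the leaf; the variable names are illustrative.

```python
from collections import defaultdict

# Home runs hit by American League teams (data from the example above)
home_runs = [35, 40, 43, 49, 51, 54, 57, 58, 58, 64, 68, 68, 75, 77]

stems = defaultdict(list)
for x in sorted(home_runs):      # sorting puts leaves in increasing order
    stem, leaf = divmod(x, 10)   # stem = tens digit, leaf = ones digit
    stems[stem].append(leaf)

rows = []
# Include every stem from smallest to largest, even one with no leaves.
for stem in range(min(stems), max(stems) + 1):
    leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
    rows.append(f"{stem} | {leaves}".rstrip())

print("American League Home Runs")
print("\n".join(rows))
print("Key: 3 | 5 = 35 home runs")
```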
When is it advantageous to split stems in a stemplot?
The Histogram
Things to remember:
- It is the most common graph of a quantitative variable.
- The x-axis is continuous, so there should be no gaps between the bars (unless a class has zero observations).
- Title your graph.
How to make a histogram:
1. Divide the range of data into classes of equal width.
2. Find the count (frequency) or percent (relative frequency) of individuals in each class.
3. Label and scale your axes and draw the histogram.
Example: NBA Scoring Averages
The following table presents the average points scored per game (PPG) for the 30 NBA teams in the 2009-2010 regular season. Create a frequency histogram and a relative frequency histogram.
Team | PPG | Team | PPG | Team | PPG
Atlanta Hawks | 101.7 | Indiana Pacers | 100.8 | Oklahoma City Thunder | 101.5
Boston Celtics | 99.2 | Los Angeles Clippers | 95.7 | Orlando Magic | 102.8
Charlotte Bobcats | 95.3 | Los Angeles Lakers | 101.7 | Philadelphia 76ers | 97.7
Chicago Bulls | 97.5 | Memphis Grizzlies | 102.5 | Phoenix Suns | 110.2
Cleveland Cavaliers | 102.1 | Miami Heat | 96.5 | Portland Trail Blazers | 98.1
Dallas Mavericks | 102 | Milwaukee Bucks | 97.7 | Sacramento Kings | 100
Denver Nuggets | 106.5 | Minnesota Timberwolves | 98.2 | San Antonio Spurs | 101.4
Detroit Pistons | 94 | New Jersey Nets | 92.4 | Toronto Raptors | 104.1
Golden State Warriors | 108.8 | New Orleans Hornets | 100.2 | Utah Jazz | 104.2
Houston Rockets | 102.4 | New York Knicks | 102.1 | Washington Wizards | 96.2
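The histogram steps can be sketched for the NBA data as follows. The class width of 2 points and the starting value of 92 are assumptions; other equal-width choices are just as valid and will give a somewhat different picture. Each class includes its left endpoint and excludes its right endpoint.

```python
# Points per game for the 30 NBA teams (from the table above)
ppg = [101.7, 100.8, 101.5, 99.2, 95.7, 102.8, 95.3, 101.7, 97.7, 97.5,
       102.5, 110.2, 102.1, 96.5, 98.1, 102, 97.7, 100, 106.5, 98.2,
       101.4, 94, 92.4, 104.1, 108.8, 100.2, 104.2, 102.4, 102.1, 96.2]

start, width, n_classes = 92, 2, 10   # assumed classes: 92 to <94, ..., 110 to <112

# Step 1 and 2: assign each observation to its class and count
counts = [0] * n_classes
for x in ppg:
    counts[int((x - start) // width)] += 1

# Relative frequency = class count as a percent of all observations
relative = [100 * c / len(ppg) for c in counts]

# Step 3: a text version of the graph, one row per class
print(f"{'Class':<12}{'Count':>5}{'Percent':>9}")
for i, (c, r) in enumerate(zip(counts, relative)):
    low = start + i * width
    label = f"{low} to <{low + width}"
    print(f"{label:<12}{c:>5}{r:>8.1f}%  {'#' * c}")
```

The frequency histogram and the relative frequency histogram have the same shape; only the vertical scale changes.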
How is the stemplot of a distribution related to its histogram?
What is the difference between a bar graph and a histogram?
When is it better to use a histogram rather than a stemplot or dotplot?
1.3 Describing Quantitative Data with Numbers
Measuring Center with the Mean and Median
The Mean (“x bar”)
- The most common measure of center
- The mean is the arithmetic average.
- To find the mean of a set of n observations, add their values and divide by the number of observations:
  $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum x_i$
Important things to remember:
- Although it is the most common, it is not always the most appropriate measure of center.
- Very sensitive to outliers
- If a distribution is skewed, the mean will not be an accurate measure of center.
The Median M
- The _______________ of a distribution
- The measure of center which is resistant to _________________ and _____________________
- To find the median of a distribution:
  - Arrange observations in order from ______________ to ___________________
  - If the number of observations is odd, the median is the center observation in the ordered list.
  - If the number of observations is even, the median is the average of the two center values in the ordered list.
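The procedures for the mean and the median can be written out directly. The sketch below applies them to the All-Star Game runs data from the dotplot example earlier in this outline; the function names `mean` and `median` are illustrative, hand-written versions that follow the steps above.

```python
def mean(observations):
    """Arithmetic average: sum of the observations divided by how many there are."""
    return sum(observations) / len(observations)


def median(observations):
    """Middle value of the ordered list (average of the two middle values if n is even)."""
    ordered = sorted(observations)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                       # odd n: the center observation
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2  # even n


# Runs scored by the American League in the last 21 All-Star Games
runs = [0, 3, 6, 9, 7, 2, 13, 4, 9, 4, 7, 3, 5, 2, 7, 4, 7, 13, 5, 2, 4]

x_bar = mean(runs)
m = median(runs)
print(f"mean = {x_bar:.2f}, median = {m}")
```

Comparing the two results is a quick way to see how the shape of a distribution affects the mean relative to the median.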
Comparing the Mean and Median
- The mean and median in a symmetric distribution will be very close to each other.
- If a distribution is exactly symmetric, the median and mean __________________________________
- If a distribution is skewed to the left, the mean will _________________________________________
- If the distribution is skewed to the right, the mean will __________________________________
Measuring the spread of a distribution
The Range:
- The difference between the largest and smallest observation (max – min).
- Useful if there are no outliers.
The Interquartile Range (IQR)
- The range of the middle __________________
- To calculate the quartiles:
  - Q1 is the median of the bottom half of the observations. It separates the bottom ___% of observations from the top ____%.
  - Q3 is the median of the top half of the observations. It separates the top ___% of observations from the bottom ___%.
- IQR: _____________
Outliers: 1.5 x IQR Rule
An observation is an outlier if it falls:
- Less than __________________________________
- Higher than _____________________________________
Five Number Summary:
- _________________
- _________________
- _________________
- _________________
- _________________
Example:
Here are the data for the amount of fat (in grams) in each of McDonald’s different chicken sandwiches:
16, 10, 20, 17, 28, 12, 23, 17, 17, 10, 16, 9, 15, 9
Find the mean, median, IQR, 5 number summary, and if there are any outliers.
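The sketch below can be used to check hand calculations for this example. It assumes the quartile method described in this outline (Q1 and Q3 are the medians of the bottom and top halves of the ordered data, leaving out the overall median when n is odd); calculators and software sometimes use slightly different quartile rules and may report different values.

```python
from statistics import median

# Fat (in grams) for McDonald's chicken sandwiches
fat = [16, 10, 20, 17, 28, 12, 23, 17, 17, 10, 16, 9, 15, 9]

data = sorted(fat)
n = len(data)
half = n // 2

# Split the ordered data into bottom and top halves
lower = data[:half]
upper = data[half:] if n % 2 == 0 else data[half + 1:]

x_bar = sum(data) / n
q1, med, q3 = median(lower), median(data), median(upper)
iqr = q3 - q1

# 1.5 x IQR rule: anything outside these fences is flagged as an outlier
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

five_number = (data[0], q1, med, q3, data[-1])

print(f"mean = {x_bar:.2f}, median = {med}, IQR = {iqr}")
print("five-number summary:", five_number)
print(f"fences: {low_fence} and {high_fence}; outliers: {outliers}")
```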
BoxPlots:
- From the graph, we can see center, shape, and spread.
- How to Construct a BoxPlot:
  - A central box is drawn from the first quartile to the third quartile.
  - A line in the box marks the median.
  - Lines (whiskers) extend from the box out to the smallest and largest observations that aren’t outliers.
Example: Create a boxplot with the above data about McDonald’s chicken sandwiches.
How to find the standard deviation of n observations:
1. Find the distance of each observation from the mean and square each of these distances.
2. Average the squared distances by dividing their sum by n - 1.
3. The standard deviation s is the square root of this average squared distance.
Important note: The variance is the standard deviation squared (s²)
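The steps for the standard deviation can be followed line by line in code. This sketch reuses the McDonald's chicken sandwich data from the example above; the variable names are illustrative.

```python
from math import sqrt

# Fat (in grams) for McDonald's chicken sandwiches
fat = [16, 10, 20, 17, 28, 12, 23, 17, 17, 10, 16, 9, 15, 9]

n = len(fat)
x_bar = sum(fat) / n

# Step 1: distance of each observation from the mean, squared
squared_distances = [(x - x_bar) ** 2 for x in fat]

# Step 2: "average" the squared distances by dividing their sum by n - 1
variance = sum(squared_distances) / (n - 1)

# Step 3: the standard deviation is the square root of the variance
s = sqrt(variance)

print(f"mean = {x_bar:.2f}, variance = {variance:.2f}, s = {s:.2f}")
```

Dividing by n - 1 rather than n is what distinguishes the sample standard deviation used in this course from a plain average of squared distances.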
Choosing Measures of Center and Spread
- The median and IQR are used when describing __________________________________
- The mean and standard deviation are used when describing ___________________________