# Vocabulary: Individuals & Variables

Date 20.10.2016
AP Statistics Name:________________________

Chapter 1 Outline Per:______________

Vocabulary:

• Individuals & Variables: Categorical Variable, Quantitative Variable

• Distribution

• Inference

• Frequency Table

• Relative Frequency Table

• Bar Graph: Segmented Bar Graph

• Pie Chart

• Two-Way Table: Marginal Distribution, Conditional Distribution

• Association

• Shape: Symmetric, Skewed

• Measures of Center: Mean, Median

• Spread: IQR, Outliers

• Four-Step Process

• Dotplot

• Stemplot: Splitting Stems, Back-to-Back Stemplot

• Histogram

• 5-Number Summary

• BoxPlot

• Standard Deviation: Variance

Homework Assignments:

| Date | Assignment |
|---|---|
| Weds 8/19 | Intro #1-7 odd, 8; 1.1 #11-25 odd |
| Fri 8/21 | 1.2 #37-47 odd, 53-59 odd |
| Mon 8/24 | 1.3 #79-97 odd (skip 85), 103, 105, 107-110 |
| Weds 8/26 | Review Assignment |
| Mon 8/31 | Test |

Introduction: Data Analysis: Making Sense of the Data

Example:

The following is a small section of a data set describing education in the US.

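| State | Region | Population (1000s) | SAT Verbal | SAT Math | % Taking SAT | % No HS Diploma |
|---|---|---|---|---|---|---|
| CA | PAC | 35,894 | 499 | 519 | 54 | 18.9 |
| CO | MTN | 4,601 | 551 | 553 | 27 | 11.3 |
| CT | NE | 3,504 | 512 | 514 | 84 | 12.5 |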

Identify the individuals, and then identify the variables. Determine if each variable is categorical or quantitative.

1.1 Analyzing Categorical Data

What is the difference between a frequency table and a relative frequency table?

What is roundoff error?
The Bar Graph

Things to remember:

• Label your axes and title your graph.

• Scale your axes appropriately.

• The height of each bar should match the count (or percent) for its category.

• Leave space between bars.

The Pie Chart

Things to remember:

• It must include all the categories that make up the whole.

• Each slice shows its category's count as a percentage of the whole.

Example: Use the data to draw a bar graph AND a pie chart

Enrollment in Darien High School

| Freshmen | Sophomores | Juniors | Seniors |
|---|---|---|---|
| 340 | 320 | 409 | 389 |
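Both graphs start from the same numbers: a bar graph can use the counts directly, while a pie chart needs each category's share of the whole. Below is a minimal Python sketch using the enrollment counts above; the dictionary layout and the function name `relative_frequencies` are illustrative choices, not part of the outline. Rounding each percent to one decimal place is also where roundoff error shows up: the rounded percents do not always add to exactly 100%.

```python
# Enrollment counts from the Darien High School table
enrollment = {"Freshmen": 340, "Sophomores": 320, "Juniors": 409, "Seniors": 389}

def relative_frequencies(counts):
    """Return each category's share of the total, as a percent."""
    total = sum(counts.values())
    return {cat: 100 * n / total for cat, n in counts.items()}

percents = relative_frequencies(enrollment)
for cat, pct in percents.items():
    # A pie-chart slice spans pct% of the full 360 degrees
    angle = 360 * pct / 100
    print(f"{cat:<11} {pct:5.1f}%  {angle:6.1f} degrees")
```

The printed percents form the relative frequency table; the angles are what you would measure with a protractor when drawing the pie chart by hand.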

When is it useful to use a bar graph?

When is it useful to use a pie chart?

Two-Way Tables and Marginal Distribution

Marginal Distribution:

Conditional Distribution:

Example: The table below gives information on the age and the number of school years completed for Americans (in thousands).

| Education | Age 25 to 34 | Age 35 to 54 | Age 55 and over | Total |
|---|---|---|---|---|
| Did not complete HS | 5325 | 9152 | 16035 | |
| Completed high school | 14061 | 24070 | 18320 | |
| 1 to 3 years of college | 11659 | 19925 | 9662 | |
| 4 or more years of college | 10342 | 19878 | 8005 | |
| Total | | | | |
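The blank "Total" cells are the margins of the table, and a marginal distribution is just each margin total divided by the grand total. A minimal Python sketch, assuming the table is stored as a dictionary of rows (a layout chosen here for illustration):

```python
# Counts (in thousands): rows are education levels,
# columns are the age groups 25 to 34, 35 to 54, and 55 and over.
ages = ["25 to 34", "35 to 54", "55 and over"]
table = {
    "Did not complete HS": [5325, 9152, 16035],
    "Completed high school": [14061, 24070, 18320],
    "1 to 3 years of college": [11659, 19925, 9662],
    "4 or more years of college": [10342, 19878, 8005],
}

# Row totals: one per education level (right-hand margin)
row_totals = {edu: sum(counts) for edu, counts in table.items()}
# Column totals: one per age group (bottom margin)
col_totals = [sum(counts[j] for counts in table.values()) for j in range(len(ages))]
grand_total = sum(row_totals.values())

# Marginal distribution of education: row totals as percents of the grand total
edu_marginal = {edu: 100 * t / grand_total for edu, t in row_totals.items()}
# Marginal distribution of age: column totals as percents of the grand total
age_marginal = {age: 100 * t / grand_total for age, t in zip(ages, col_totals)}

print("Grand total (thousands):", grand_total)
for edu, pct in edu_marginal.items():
    print(f"{edu:<28} {pct:5.1f}%")
```

Each marginal distribution uses only one variable and ignores the other, which is why it comes from the margins of the table.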

1) What is the total number of people described by this table?

2) What percentage of Americans did not finish high school? (marginal)

3) What percentage of Americans between the ages of 35 and 54 completed high school? (marginal)

4) What percentage of Americans who had between 1 and 3 years of college are over 55? (conditional)

5) What percentage of Americans who are between 25 and 34 years old had only completed high school? (conditional)

6) Is there a relationship between age and whether or not one completed 4 years of college? (conditional)

| Age Group | 25-34 | 35-54 | 55 and over |
|---|---|---|---|
| Percent with 4 years of college | | | |

7) Create a bar graph with the information in #6.
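To judge whether age and completing 4 or more years of college are related, compare the conditional distribution within each age group: divide the "4 or more years of college" count by that age group's column total. A minimal Python sketch under the same illustrative table layout as before:

```python
ages = ["25 to 34", "35 to 54", "55 and over"]
table = {
    "Did not complete HS": [5325, 9152, 16035],
    "Completed high school": [14061, 24070, 18320],
    "1 to 3 years of college": [11659, 19925, 9662],
    "4 or more years of college": [10342, 19878, 8005],
}

# Column totals: the number of people in each age group
col_totals = [sum(counts[j] for counts in table.values()) for j in range(len(ages))]

# Conditional distribution: within each age group, percent with 4+ years of college
college_by_age = {
    age: 100 * table["4 or more years of college"][j] / col_totals[j]
    for j, age in enumerate(ages)
}
for age, pct in college_by_age.items():
    print(f"{age:<12} {pct:5.1f}%")
```

These three percents are the heights of the bars in the requested bar graph. Because they differ noticeably across age groups (the 55 and over group is lowest), the table suggests an association between age and completing 4 or more years of college.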

1.2 Displaying Quantitative Data with Graphs

The Dotplot

Things to remember

• You only need a properly labeled horizontal axis

• Title the graph

• Each dot represents a count of 1

• Works well with a small data set

Example: Construct a Dotplot with the given information:

Runs Scored by the American League in the Last 21 MLB All-Star Games

 0 3 6 9 7 2 13 4 9 4 7 3 5 2 7 4 7 13 5 2 4
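A dotplot stacks one dot per observation above its value on a labeled horizontal axis. The minimal Python sketch below prints a text version for the runs data; the function name `text_dotplot` is hypothetical, and it assumes the data are whole numbers so that every integer between the minimum and maximum gets its own column.

```python
from collections import Counter

# Runs scored by the American League in the last 21 All-Star Games
runs = [0, 3, 6, 9, 7, 2, 13, 4, 9, 4, 7, 3, 5, 2, 7, 4, 7, 13, 5, 2, 4]

def text_dotplot(data):
    """Stack one '•' per observation above each integer value on the axis."""
    counts = Counter(data)
    values = range(min(data), max(data) + 1)
    rows = []
    # Build rows from the tallest stack down to the axis
    for level in range(max(counts.values()), 0, -1):
        cells = ("•" if counts[v] >= level else " " for v in values)
        rows.append(" ".join(f"{c:>2}" for c in cells))
    rows.append(" ".join(f"{v:>2}" for v in values))  # axis labels
    return "\n".join(rows)

print(text_dotplot(runs))
```

Values that never occurred (such as 1, 8, 10, 11, 12) still get an empty column, so gaps in the distribution stay visible, just as they would on a hand-drawn dotplot.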

When describing the overall pattern of a distribution, you MUST address the following 4 things.

1. The CENTER of the data

2. The SHAPE of the data

• Symmetric

• Skewed to the right

• Skewed to the left

3. The SPREAD of the data

• Range

• IQR

4. Any OUTLIERS in the data

Example:

Describe the overall pattern of the distribution of runs scored by the American League in the example above.
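The numbers behind center, spread, and outliers can be computed directly; shape still has to be judged from the dotplot. A minimal Python sketch, assuming the quartile convention this outline uses later (Q1 and Q3 are the medians of the bottom and top halves, leaving the overall median out of both halves when n is odd). Calculators and software packages do not all follow this convention, so their quartiles can differ slightly. The helper name `quartiles` is illustrative.

```python
from statistics import mean, median

runs = [0, 3, 6, 9, 7, 2, 13, 4, 9, 4, 7, 3, 5, 2, 7, 4, 7, 13, 5, 2, 4]

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves.
    With an odd count, the overall median is left out of both halves."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(upper)

q1, q3 = quartiles(runs)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in runs if x < low_fence or x > high_fence]

print("center: median =", median(runs), " mean =", round(mean(runs), 2))
print("spread: range =", max(runs) - min(runs), " IQR =", iqr)
print("outliers:", outliers)
```

With these values, the two 13-run games sit exactly at the upper fence, so by the 1.5 x IQR rule they are not outliers.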

The Stemplot

Things to remember

• Separate each piece of data into a stem (all but the rightmost digit) and a leaf (the final digit).

• Write the stems vertically in increasing order from top to bottom.

• Write the leaves in increasing order out from the stem

• Be very neat and make sure you leave the same amount of space in between leaves.

• Include a key identifying what the stem and leaves represent.

• Works well with a small data set
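The stemplot rules above translate directly into code: the stem is every digit except the last, and the leaf is the last digit. Below is a minimal Python sketch for non-negative whole numbers, applied to the American League home-run data from the next example. The function name `stemplot` and its `split` flag are hypothetical; with `split=True` each stem appears twice (leaves 0-4, then 5-9).

```python
from collections import defaultdict

# American League home runs (early 2004 season)
home_runs = [35, 40, 43, 49, 51, 54, 57, 58, 58, 64, 68, 68, 75, 77]

def stemplot(data, split=False):
    """Text stemplot: stem = all but the last digit, leaf = last digit."""
    leaves = defaultdict(list)
    for x in sorted(data):                 # sorting puts leaves in increasing order
        stem, leaf = divmod(x, 10)
        key = (stem, leaf >= 5) if split else (stem, False)
        leaves[key].append(leaf)
    lo, hi = min(data) // 10, max(data) // 10
    lines = []
    for stem in range(lo, hi + 1):         # stems in increasing order, top to bottom
        # Include empty stems so gaps in the data stay visible
        for upper in ((False, True) if split else (False,)):
            row = " ".join(str(v) for v in leaves[(stem, upper)])
            lines.append(f"{stem:>2} | {row}")
    return "\n".join(lines)

print(stemplot(home_runs))
print("Key: 3 | 5 = 35 home runs")
```

Calling `stemplot(home_runs, split=True)` doubles the number of rows, which spreads out a distribution whose leaves would otherwise crowd onto only a few stems.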

Example:

During the early part of the 2004 baseball season, many sports fans and baseball players noticed that the number of home runs being hit seemed to be unusually large. Here are the data on the number of home runs hit by American League teams.
American League 35, 40, 43, 49, 51, 54, 57, 58, 58, 64, 68, 68, 75, 77
Construct an appropriate graph to display the number of home runs hit in the American League.

American League Home Runs

When is it advantageous to split stems in a stemplot?

The Histogram

Things to remember:

• It is the most common graph of a quantitative variable.

• The x-axis is continuous, so there should be no gaps between the bars (unless a class has zero observations)

How to make a histogram:

• Divide the range of data into classes of equal width

• Find the count (frequency) or percent (relative frequency) of individuals in each class

• Label and scale your axes and draw histogram

Example: NBA Scoring Averages

The following table presents the average points scored per game (PPG) for the 30 NBA teams in the 2009-2010 regular season. Create a frequency histogram and a relative frequency histogram.

| Team | PPG | Team | PPG | Team | PPG |
|---|---|---|---|---|---|
| Atlanta Hawks | 101.7 | Indiana Pacers | 100.8 | Oklahoma City Thunder | 101.5 |
| Boston Celtics | 99.2 | Los Angeles Clippers | 95.7 | Orlando Magic | 102.8 |
| Charlotte Bobcats | 95.3 | Los Angeles Lakers | 101.7 | Philadelphia 76ers | 97.7 |
| Chicago Bulls | 97.5 | Memphis Grizzlies | 102.5 | Phoenix Suns | 110.2 |
| Cleveland Cavaliers | 102.1 | Miami Heat | 96.5 | Portland Trail Blazers | 98.1 |
| Dallas Mavericks | 102 | Milwaukee Bucks | 97.7 | Sacramento Kings | 100 |
| Denver Nuggets | 106.5 | Minnesota Timberwolves | 98.2 | San Antonio Spurs | 101.4 |
| Detroit Pistons | 94 | New Jersey Nets | 92.4 | Toronto Raptors | 104.1 |
| Golden State Warriors | 108.8 | New Orleans Hornets | 100.2 | Utah Jazz | 104.2 |
| Houston Rockets | 102.4 | New York Knicks | 102.1 | Washington Wizards | 96.2 |
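The first two histogram steps (equal-width classes, then a count and percent for each class) can be sketched in Python. The class width of 2 points starting at 92 is an assumption chosen for illustration; other reasonable choices give a differently shaped picture of the same data. The function name `frequency_table` is hypothetical, and each class includes its left endpoint but not its right one.

```python
# PPG values for all 30 teams, read row by row from the table
ppg = [101.7, 100.8, 101.5, 99.2, 95.7, 102.8, 95.3, 101.7, 97.7, 97.5,
       102.5, 110.2, 102.1, 96.5, 98.1, 102.0, 97.7, 100.0, 106.5, 98.2,
       101.4, 94.0, 92.4, 104.1, 108.8, 100.2, 104.2, 102.4, 102.1, 96.2]

def frequency_table(data, start, width):
    """Count observations in classes [start, start+width), [start+width, start+2*width), ..."""
    n_classes = int((max(data) - start) // width) + 1
    counts = [0] * n_classes
    for x in data:
        counts[int((x - start) // width)] += 1
    return counts

start, width = 92, 2  # illustrative choice: classes 92 to <94, 94 to <96, ...
counts = frequency_table(ppg, start, width)
for i, c in enumerate(counts):
    lo = start + i * width
    # frequency, then relative frequency as a percent of all 30 teams
    print(f"{lo} to <{lo + width}: {c:2d}  {100 * c / len(ppg):5.1f}%")
```

The frequency histogram and the relative frequency histogram have identical shapes; only the vertical scale changes from counts to percents.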

How is the stemplot of a distribution related to its histogram?

What is the difference between a bar graph and a histogram?

When is it better to use a histogram rather than a stemplot or dotplot?

1.3 Describing Quantitative Data with Numbers
Measuring Center with the Mean and Median
The Mean (“x bar”)

• The most common measure of center

• The mean is the arithmetic average.

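• To find the mean of a set of $n$ observations $x_1, x_2, \ldots, x_n$, add them up and divide by $n$:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$

Important things to remember:

• Although it is the most common measure of center, the mean is not always the most appropriate one.

• The mean is very sensitive to outliers.

• If a distribution is skewed, the mean may not be an accurate measure of center.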

The Median M

• The _______________ of a distribution

• The measure of center which is resistant to _________________ and _____________________

• To find the median of a distribution:

1. Arrange observations in order from ______________ to ___________________

2. If the number of observations is odd, the median is the center observation in the ordered list

3. If the number of observations is even, the median is the average of the two center values in the ordered list
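The mean formula and the median steps above can each be written in a few lines of Python. This is a minimal sketch that mirrors the hand calculation; Python's standard `statistics` module already provides `mean` and `median`, so these hand-written versions exist only to show the steps. The home-run data from the stemplot example has 14 observations, so it exercises the even case.

```python
def mean(data):
    """Arithmetic average: the sum of the observations divided by n."""
    return sum(data) / len(data)

def median(data):
    """Middle value of the ordered list (average of the two middle values if n is even)."""
    xs = sorted(data)               # step 1: arrange in order
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:                  # step 2: odd n, take the center observation
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2   # step 3: even n, average the two centers

home_runs = [35, 40, 43, 49, 51, 54, 57, 58, 58, 64, 68, 68, 75, 77]
print(mean(home_runs), median(home_runs))
```

For this list the two center values are 57 and 58, so the median is their average, 57.5.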

Comparing the Mean and Median

• The mean and median in a symmetric distribution will be very close to each other.

• If a distribution is exactly symmetric, the median and mean __________________________________

• If a distribution is skewed to the left, the mean will _________________________________________

• If the distribution is skewed to the right, the mean will __________________________________

Measuring the spread of a distribution
The Range:

• The difference between the largest and smallest observation (max – min).

• Useful if there are no outliers.

The Interquartile Range (IQR)

• The range of the middle __________________

• To calculate the quartiles:

1. The first quartile $Q_1$ is the median of the bottom half of the observations. It separates the bottom ___% of observations from the top ____%.

2. The third quartile $Q_3$ is the median of the top half of the observations. It separates the top ___% of observations from the bottom ___%.

3. IQR: _____________

Outliers: 1.5 x IQR Rule

An observation is an outlier if it falls:

• Less than __________________________________

• Higher than _____________________________________

Five Number Summary:

• _________________

• _________________

• _________________

• _________________

• _________________

Example:

Here is data for the amount of fat (in grams) for each of McDonald’s different chicken sandwiches:

16, 10, 20, 17, 28, 12, 23, 17, 17, 10, 16, 9, 15, 9
Find the mean, median, IQR, and 5-number summary, and determine whether there are any outliers.
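The whole calculation can be checked with a short Python sketch. It assumes the quartile convention described in this outline (Q1 and Q3 are the medians of the bottom and top halves, with the overall median left out of both halves when n is odd); the helper name `five_number_summary` is illustrative, while `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Fat (grams) in McDonald's chicken sandwiches
fat = [16, 10, 20, 17, 28, 12, 23, 17, 17, 10, 16, 9, 15, 9]

def five_number_summary(data):
    """Min, Q1, median, Q3, max, with Q1/Q3 as medians of the lower/upper halves."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

mn, q1, med, q3, mx = five_number_summary(fat)
iqr = q3 - q1
# 1.5 x IQR rule: anything beyond these fences is flagged as an outlier
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in fat if x < low_fence or x > high_fence]

print("mean:", round(mean(fat), 2))
print("5-number summary:", (mn, q1, med, q3, mx))
print("IQR:", iqr, " fences:", (low_fence, high_fence))
print("outliers:", outliers)
```

Here the upper fence is 17 + 1.5(7) = 27.5, so the 28-gram sandwich is flagged as an outlier.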

BoxPlots:

• From the graph, we can see center, shape, and spread

• How to Construct a BoxPlot:

• A central box is drawn from the first quartile to the third quartile.

• A line in the box marks the median.

• Lines (whiskers) extend from the box out to the smallest and largest observations that aren't outliers.
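The construction steps above only need a handful of numbers: the box ends, the median line, and the whisker ends. A minimal Python sketch that computes them for the chicken-sandwich data, assuming the same bottom-half/top-half quartile convention as the rest of the outline and the 1.5 x IQR rule for deciding which points are outliers; the function name `boxplot_parts` is hypothetical.

```python
from statistics import median

fat = [16, 10, 20, 17, 28, 12, 23, 17, 17, 10, 16, 9, 15, 9]

def boxplot_parts(data):
    """Return the pieces needed to draw a boxplot with outliers marked separately."""
    xs = sorted(data)
    half = len(xs) // 2
    q1 = median(xs[:half])
    q3 = median(xs[half + 1:] if len(xs) % 2 else xs[half:])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if lo_fence <= x <= hi_fence]
    return {
        "box": (q1, q3),                       # box from Q1 to Q3
        "median": median(xs),                  # line inside the box
        "whiskers": (inside[0], inside[-1]),   # most extreme observations that aren't outliers
        "outliers": [x for x in xs if x < lo_fence or x > hi_fence],
    }

print(boxplot_parts(fat))
```

Because 28 is an outlier, the upper whisker stops at 23 rather than at the maximum.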

Example: Create a boxplot with the above data about McDonald’s chicken sandwiches.

How to find the standard deviation of n observations:

1. Find the distance of each observation from the mean and square each of these distances

2. Average the squared distances by dividing their sum by n - 1

3. The standard deviation $s_x$ is the square root of this average squared distance:

$$s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

Important note: The variance is the standard deviation squared ($s_x^2$).
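The three steps above can be followed literally in Python and checked against `statistics.stdev` from the standard library, which also divides by n - 1. This minimal sketch uses the chicken-sandwich fat data; the function name `sample_sd` is illustrative.

```python
import math
import statistics

fat = [16, 10, 20, 17, 28, 12, 23, 17, 17, 10, 16, 9, 15, 9]

def sample_sd(data):
    """Standard deviation following the three steps (dividing by n - 1)."""
    n = len(data)
    x_bar = sum(data) / n
    squared_distances = [(x - x_bar) ** 2 for x in data]   # step 1
    variance = sum(squared_distances) / (n - 1)            # step 2: the variance
    return math.sqrt(variance)                             # step 3: the standard deviation

s = sample_sd(fat)
print(round(s, 3), round(s ** 2, 3))
print(math.isclose(s, statistics.stdev(fat)))
```

Dividing by n - 1 rather than n matches the outline's step 2; `statistics.pstdev` is the standard-library function that divides by n instead.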
Choosing Measures of Center and Spread

• The median and IQR are used when describing __________________________________

• The mean and standard deviation are used when describing ___________________________