Activity 3
Section 2: Understanding Assumptions and Common Statistical Strategies – Correlation, Regression, and Comparing Means
This section begins by exploring assumptions: why they are important, and what to do if your data do not meet them. Before conducting statistical tests, you examine your dataset to ensure that it does not violate the assumptions upon which the intended tests are based. Using the procedures outlined in Section 1, you may already have a good idea about your dataset with regard to the necessary assumptions; in this section, however, we will formalize the evaluation of these assumptions. In your dissertation you will be expected to understand and acknowledge assumptions, and to modify your proposed analytical strategy as necessary.
Once a firm understanding of assumptions related to statistical tests is gleaned, we jump into actually performing and interpreting common statistical tests; now the fun really begins!
The tests covered in this section include:
Correlation. Are two variables related? If so, how? A correlation tells you how and to what extent two variables are linearly related. A correlation coefficient will always fall between -1 and +1, with 0 indicating no linear relationship between the variables. Rule-of-thumb effect sizes are as follows: small (±.1), medium (±.3), and large (±.5), although these effect sizes should always be evaluated in the context of the research area. An important point to remember: correlation does not equal causation!
Regression. A regression analysis is very similar to a correlation, but it is the framework commonly used when one wants to predict one variable from another. For example: How much variance in happiness scores is predicted by hours of physical activity performed each week? With the simple regression framework you have one predictor variable and one outcome variable, and the outcome variable is measured on a continuous scale (soon you will learn how multiple regression can handle multiple predictor variables simultaneously).
Logistic Regression. A logistic regression is the framework one would use for prediction when the outcome variable is categorical. For example: Does the number of hours spent in voluntary corporate training during the first year of employment predict whether an employee is still at the company in two years (yes/no)?
Comparing Means and ANOVA. While many questions can be answered by correlation and regression, frequently questions require the comparison of mean scores. For example: Are standardized test scores higher in a school that uses one reading method compared to another? Do men or women reap a greater benefit, in terms of pounds lost, from a certain exercise program? Questions that compare two groups can be answered with a simple t-test. An Analysis of Variance (ANOVA) can handle designs that compare more than two groups, like: Does Drug A, B, or C result in better life expectancies for people diagnosed with cancer? Or does Diet A, B, C, or D result in better cholesterol levels?
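To make the first two entries in this list concrete, here is a minimal sketch in Python (not SPSS, which is what you will use for the Activity). It computes a Pearson correlation and a simple regression by hand. The happiness and activity scores are made up for illustration, and the function names are our own, not part of any package. Notice that the squared correlation equals the proportion of variance explained by the regression, which is one way these techniques are fundamentally similar.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simple_regression(x, y):
    """Least-squares line y = intercept + slope * x, plus R squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: variation in y the line does not explain
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r_squared = 1 - ss_res / syy
    return slope, intercept, r_squared

# Hypothetical data: weekly hours of physical activity and happiness scores
hours_activity = [1, 2, 3, 4, 5, 6, 7, 8]
happiness = [3, 4, 4, 6, 5, 7, 8, 8]

r = pearson_r(hours_activity, happiness)
slope, intercept, r_squared = simple_regression(hours_activity, happiness)
print(f"r = {r:.3f}, slope = {slope:.3f}, R squared = {r_squared:.3f}")
```

The slope tells you how many points the predicted happiness score changes for each additional hour of activity; the correlation tells you how tightly the points cluster around that line.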
A lot of information is covered in these chapters, so please plan accordingly. Also, pay attention to how these techniques are fundamentally similar – it seems like a ton of information, but if you master the statistical models at this level the rest of the course will be a breeze (well, nearly a breeze).
Activities #5 and #6 simply hit the high points, but you are expected to have gained an understanding of all analyses presented in the text. That is, should you require the use of an analytical strategy covered in the text but not performed in the Activity for your dissertation, you will have the core competencies to perform these alternative techniques.
A note about statistical significance (what it means/does not mean).
Most everyone appreciates a refresher on this topic.
Statistical Significance: An observed effect that is large enough that we do not think we got it by accident (that is, we do not think that the result we got was due to chance alone).
How do we decide if something is statistically significant?
The p-value (probability value) is the probability of obtaining the observed outcome (or one more extreme than what we observe) if H0 is true. The p-value is a value we obtain after calculating a test statistic. The smaller the p-value, the stronger the evidence against H0. If we set alpha at .05, then the p-value must be smaller than .05 to be considered statistically significant; if we set alpha at .01, then it must be smaller than .01 to be considered statistically significant. Remember, the p-value tells us the probability we would expect our result (or one more extreme) GIVEN the null is true. If our p-value is less than alpha, we REJECT THE NULL HYPOTHESIS and say there appears to be a difference between groups, a relationship between variables, etc.
Conventional alpha (α) levels
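As a small illustration of what "the probability of the observed outcome or one more extreme, given the null is true" means, the sketch below works through a hypothetical coin example that is not from the text: H0 says the coin is fair, and we observe 16 heads in 20 flips. The function name and the scenario are assumptions made for illustration; the calculation is an exact one-sided binomial probability.

```python
from math import comb

def binomial_p_value(successes, trials, p_null=0.5):
    """One-sided p-value: P(X >= successes) if the null hypothesis is true."""
    return sum(
        comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

alpha = 0.05
p_value = binomial_p_value(16, 20)   # 16 heads in 20 flips of a supposedly fair coin
reject_null = p_value < alpha

print(f"p = {p_value:.4f}; reject H0 at alpha = {alpha}? {reject_null}")
```

Because 16 or more heads would rarely happen with a fair coin, the p-value falls below alpha and we reject the null.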
p < .05 and p < .01
What do these mean?
p < .05 = this result would happen no more than 5% of the time (so 1 time in 20 samples), if the null were true.
p < .01 = this result would happen no more than 1% of the time (so 1 time in 100 samples), if the null were true.
Because these are low probabilities (events not likely to happen if the null were true), we reject the null when our calculated p-value falls below these alpha levels.
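The "no more than 5% of the time" idea can be checked directly in a simple case. The sketch below uses a hypothetical fair-coin example (20 flips, two-sided test), which is our own illustration rather than an example from the text. It lists every outcome that would lead us to reject the null at alpha = .05, and then adds up how often those outcomes occur when the null really is true.

```python
from math import comb

def pmf(k, n, p=0.5):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def two_sided_p(k, n):
    """Two-sided p-value under a fair coin: double the smaller tail, capped at 1."""
    lower = sum(pmf(i, n) for i in range(0, k + 1))
    upper = sum(pmf(i, n) for i in range(k, n + 1))
    return min(1.0, 2 * min(lower, upper))

n = 20
alpha = 0.05

# Outcomes (numbers of heads) that would be declared statistically significant
rejection_region = [k for k in range(n + 1) if two_sided_p(k, n) < alpha]

# How often those outcomes occur when the null (a fair coin) is actually true
false_positive_rate = sum(pmf(k, n) for k in rejection_region)

print("Reject H0 when heads is one of:", rejection_region)
print(f"Chance of rejecting a true null: {false_positive_rate:.4f}")
```

The chance of wrongly rejecting a true null comes out below alpha, which is what setting alpha at .05 is meant to guarantee.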
If the p-value is greater than alpha, you fail to reject the null. You never accept the null; you simply fail to reject it. Failure to reject the null does not prove that it is true. It means simply that there is insufficient evidence to determine whether the null is false or not; further research might be indicated.
What if your p-value is close to alpha, but slightly over it (like .056)? You cannot reject the null. However, you will want to look at your effect size to determine the strength of the relationship, and also at your sample size. Often, a moderate to large effect will not be statistically significant if the sample size is low (low power). In this case, the result suggests further research with a larger sample.
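To see how sample size affects significance, the sketch below takes the same medium-sized correlation (r = .30) and asks whether it would be significant with 20 participants versus 100. It assumes the Fisher z transformation with a normal approximation, chosen here because it needs only the Python standard library. Statistical packages typically report a t-based p-value for a correlation, so their numbers will differ slightly. The function name and sample sizes are ours, for illustration only.

```python
from math import atanh, sqrt
from statistics import NormalDist

def approx_p_for_r(r, n):
    """Approximate two-sided p-value for a correlation r from n cases.

    Uses the Fisher z transformation with a normal approximation
    (an assumption of this sketch, not the method SPSS reports).
    """
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

r = 0.30                        # same effect size in both scenarios
p_small = approx_p_for_r(r, 20)    # small sample
p_large = approx_p_for_r(r, 100)   # larger sample

print(f"n = 20:  p = {p_small:.3f}")
print(f"n = 100: p = {p_large:.3f}")
```

The effect size is identical in both cases; only the sample size, and therefore the power to detect the effect, has changed.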
Please remember that statistical significance does not equal importance. You will always want to calculate a measure of effect size to determine the strength of the relationship. Another thing to keep in mind is that the effect size, and how important it is, is somewhat subjective and can vary depending on the study at hand.
Required Reading:
Discovering Statistics Using SPSS: Preface, How to Use This Book, Chapters 5, 6, 7, 8, 9, 10
Self-Tests
Smart Alex's Quizzes
SPSS Data Sets:
Downloadfestival.sav
SPSSExam.sav
Chickflick.sav
Chamorro-Premuzic.sav
Activity 5a.sav
Activity 5c.sav
Optional Resources:
Interactive Multiple Choice Questions
Flashcards
Assignment 3 Understanding and Exploring Assumptions
Evaluation of Assumptions
In Activity #2, you used SPSS to create visual representations of your dataset. As you will see in Activity #3, each statistical procedure that you will use is based on one or more assumptions about the dataset. Prior to conducting statistical tests that will evaluate your hypotheses, you need to check your dataset to ensure that it does not violate the assumptions upon which the intended tests are based. Using the procedures outlined in Activity #2, you may already have a good idea about your data set with regard to the necessary assumptions. Now we will formalize the evaluation of these assumptions.
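One way to formalize a check of normality is to convert skewness and kurtosis into z-scores by dividing each by its standard error. The Python sketch below is a minimal illustration, not a substitute for the SPSS output you will produce. It assumes the sample-adjusted skewness and kurtosis formulas and the standard-error formulas that SPSS-style output is based on, and the scores are made up. A common convention is that an absolute z-score above 1.96 is significant at p < .05, although with large samples even trivial departures from normality can cross that line.

```python
from math import sqrt

def skew_kurtosis_z(values):
    """Skewness, kurtosis, their standard errors, and z-scores (needs n >= 4)."""
    n = len(values)
    mean = sum(values) / n
    s = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))  # sample SD

    sum_z3 = sum(((v - mean) / s) ** 3 for v in values)
    sum_z4 = sum(((v - mean) / s) ** 4 for v in values)

    # Sample-adjusted skewness and excess kurtosis
    skew = n / ((n - 1) * (n - 2)) * sum_z3
    kurt = (n * (n + 1)) / ((n - 1) * (n - 2) * (n - 3)) * sum_z4 \
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

    # Standard errors depend only on the sample size
    se_skew = sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * sqrt((n * n - 1) / ((n - 3) * (n + 5)))

    return {
        "skew": skew, "se_skew": se_skew, "z_skew": skew / se_skew,
        "kurtosis": kurt, "se_kurtosis": se_kurt, "z_kurtosis": kurt / se_kurt,
    }

# Hypothetical scores with a long right tail
scores = [0.5, 0.8, 1.0, 1.1, 1.3, 1.5, 1.6, 1.9, 2.4, 3.6]
for name, value in skew_kurtosis_z(scores).items():
    print(f"{name}: {value:.3f}")
```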
To Prepare for Activity #3:
Download SPSS Data Sets. The visual displays you will be asked to create as part of Activity #3 are ones you will work through in this chapter.
You will need to download the following data sets:
• Downloadfestival.sav
• SPSSExam.sav
• Chickflick.sav
Read Chapter 5 in the text. It will be to your advantage to have SPSS open on your computer as you work through Chapter 5. While you are reading through this chapter and testing the assumptions of various statistical procedures, consider various types of datasets and whether they would run the risk of violating these assumptions.
Complete the Self-Tests within each chapter. Answers are available on the companion web site under the heading Additional Web Material in the Student Resource section (http://www.sagepub.com/field3e/additionalwebmaterial.htm).
Complete Smart Alex’s Quizzes. Be sure to take Smart Alex’s Quiz at the end of the Chapter and spend time learning the concepts related to questions you answered incorrectly. Answers are available at: http://www.sagepub.com/field3e/SmartAlexAnswers.htm
Optional Preparation for Activity #3
After completing the above activities, if you feel you need additional instruction on the concepts covered, please choose from any of the following activities that will assist you in mastering the core concepts.
Interactive Multiple Choice Questions. You might find it helpful to complete the multiple choice quizzes available at: http://www.sagepub.com/field3e/MCQ.htm
Flashcards. If what you need is to gain a basic, definitional understanding of the topics, visit the Flashcard Glossary at: http://www.sagepub.com/field3e/Flashcard.htm
Activity #3
You will submit one Word document for this activity. You will create this Word document by cutting and pasting SPSS output into Word.
1. Why do we care whether the assumptions required for statistical tests are met? (You might want to write your answer on a note card you paste to your computer).
2. Open the data set that you corrected in activity #2 for DownloadFestival.sav. You will use the following variables: Day1, Day2, and Day3 (hygiene variable for all three days). Create a simple histogram for each variable. Choose to display the normal curve (under Element Properties) and title your charts. Copy these plots into your Activity #3 Word document.
3. Now create probability-probability (P-P) plots for each variable. This output will give you additional information. Read over the Case Processing Summary. Notice that there are missing data for Day 2 and Day 3. Copy only the Normal P-P Plots into your Activity #3 Word document (you do not need to copy the beginning output or the Detrended Normal P-P Plots).
4. Examining the histograms and P-P plots, describe the dataset, with particular attention to the assumption of normality. For each day, do you think the responses are reasonably normally distributed? (Just give your impression of the data.) Why or why not?
5. Using the same dataset, and the Frequency command, calculate the standard descriptive measures (mean, median, mode, standard deviation, variance and range) as well as kurtosis and skew for all three hygiene variables. Paste your output into your Activity #3 Word document (you do not need to paste the Frequency Table). What does the output tell you? You will need to comment on: sample size, measures of central tendency and dispersion, as well as kurtosis and skewness. You will need to either calculate z-scores for skewness and kurtosis or use those given in the book to provide a complete answer. Bottom line: is the assumption of normality met for these three variables? Does this match your visual observations from question #2?
6. Using the dataset SPSSExam.sav, and the Frequency command, calculate the standard descriptive statistics (mean, median, mode, standard deviation, variance and range) plus skew and kurtosis, and histograms with the normal curve on the following variables: Computer, Exam, Lecture, and Numeracy for the entire dataset. Complete the same analysis using University as a grouping variable. Paste your output into your Activity #3 Word document (you do not need to paste the Frequency Table). What do the results tell you with regard to whether the data are normally distributed?
7. Using the dataset SPSSExam.sav, determine whether the scores on computer literacy and percentage of lectures attended (with University as a grouping variable) meet the assumption of homogeneity of variance (use Levene’s test). You must remember to unclick the “split file” option used above before doing this test. What does the output tell you? (be as specific as possible).
8. Describe the assumptions of normality and homogeneity of variance. When these assumptions are violated, what are your options? Are there cases in which the assumptions may technically be violated, yet have no impact on your intended analyses? Explain.
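If it helps to see what Levene's test (question 7) is doing behind the SPSS output, the sketch below computes the Levene statistic by hand in Python. It assumes the mean-centered version of the test: each score is replaced by its absolute distance from its own group mean, and the groups are then compared on those distances. The two groups of scores are made up, and the function name is ours. The sketch stops at the test statistic and its degrees of freedom; SPSS additionally reports the p-value from the F distribution.

```python
def levene_w(*groups):
    """Mean-centered Levene statistic W with its degrees of freedom.

    Returns (W, df_between, df_within). A large W suggests the group
    variances differ; the p-value comes from F(df_between, df_within).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of each score from its own group mean
    abs_dev = []
    for g in groups:
        group_mean = sum(g) / len(g)
        abs_dev.append([abs(v - group_mean) for v in g])

    dev_means = [sum(z) / len(z) for z in abs_dev]
    grand_mean = sum(sum(z) for z in abs_dev) / n_total

    # One-way ANOVA on the absolute deviations
    between = sum(len(z) * (m - grand_mean) ** 2 for z, m in zip(abs_dev, dev_means))
    within = sum((v - m) ** 2 for z, m in zip(abs_dev, dev_means) for v in z)

    w = (n_total - k) / (k - 1) * between / within
    return w, k - 1, n_total - k

# Hypothetical scores for two groups; the second is more spread out
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]

w, df1, df2 = levene_w(group_a, group_b)
print(f"Levene W = {w:.3f} on ({df1}, {df2}) degrees of freedom")
```

A non-significant result means you have no evidence that the variances differ, so the assumption of homogeneity of variance is considered tenable.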
Submit your files in the Course Work area below the Activity screen.
Learning Outcomes: 3, 4, 8, 10
Assignment Outcomes
Calculate, integrate, and evaluate descriptive statistical analysis.
Create, integrate, and evaluate visual displays of data.
Analyze the assumptions required for valid inferential tests.
Demonstrate proficiency in the use of SPSS.