Two-way Chi-Squared Tests
The difference between relative frequency and overall frequency raises the need to test whether the two differ significantly. This is where a (two-way) Chi-Squared test103 can be useful.
The Chi-Squared test compares the observed values of an n by k table to their expected values. The observed values are the observed frequencies of the intersection of two categories (represented by the row and column labels). In this case, the expected value for a cell of the table is the marginal percentage for the column applied to the row total.104 For example, in Table 1, the marginal percentage for the OE column is approximately 14.4% (1,268/8,812). The row total for category A incursions is 132. Thus, the expected value for category A OE incursions is approximately 19 (0.144 x 132). A generalized way to calculate the expected value is:

E_{i,j} = \frac{\left(\sum_{c=1}^{k} O_{i,c}\right)\left(\sum_{r=1}^{n} O_{r,j}\right)}{N}
where:
E_{i,j} = expected value for cell i, j
O_{i,j} = observed value for cell i, j
N = total observations
n = number of rows
k = number of columns
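As a sketch, the expected-value calculation can be reproduced in plain Python. The 2x2 table below is hypothetical (the report's Table 1 is not reproduced here), but the final check uses the three figures quoted in the text: OE column total 1,268, category A row total 132, and 8,812 total observations.

```python
def expected_values(observed):
    """Expected cell counts under independence of rows and columns:
    E[i][j] = (row i total) * (column j total) / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical 2x2 table of counts (not the report's data).
obs = [[19, 113],
       [30, 70]]
expected = expected_values(obs)

# The worked example from the text: 132 * 1,268 / 8,812 is about 19.
expected_a_oe = 132 * 1_268 / 8_812
print(round(expected_a_oe, 1))  # 19.0
```

Note that each row of expected values sums to the observed row total, and each column to the observed column total, so only the joint cell structure is being tested.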
Constructing the expected values in this way yields a test of independence between the rows and columns. That is, this tests for an association between the rows and columns. The test statistic is calculated by taking, for each cell, the squared difference between the observed and expected values divided by the expected value, and then summing over all cells, shown formulaically as:

\chi^2 = \sum_{i=1}^{n} \sum_{j=1}^{k} \frac{\left(O_{i,j} - E_{i,j}\right)^2}{E_{i,j}}
This test statistic is distributed Chi-Squared with degrees of freedom (n – 1)*(k – 1). In Table 1, this results in 6 degrees of freedom. Similar tests will be applied in the following sections regarding other combinations of variables.
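The statistic and its degrees of freedom can be sketched in a few lines of Python. The 7-row, 2-column table below is hypothetical, chosen only because (7 − 1) × (2 − 1) reproduces the 6 degrees of freedom cited in the text.

```python
def chi_squared(observed):
    """Two-way Chi-Squared test statistic and degrees of freedom:
    chi2 = sum over cells of (O - E)^2 / E, df = (n - 1) * (k - 1)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = sum(
        (observed[i][j] - row_totals[i] * col_totals[j] / total) ** 2
        / (row_totals[i] * col_totals[j] / total)
        for i in range(len(row_totals))
        for j in range(len(col_totals))
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical 7x2 table: (7 - 1) * (2 - 1) = 6 degrees of freedom.
obs = [[12, 30], [8, 25], [15, 40], [9, 22], [11, 28], [7, 18], [10, 26]]
stat, df = chi_squared(obs)
print(df)  # 6
```

A table whose rows are exact multiples of one another gives a statistic of zero, since every observed cell then equals its expected value.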
Box and Whisker Plots
The box and whisker plot concisely presents the percentiles of the distribution and outliers. The core of this plot type is the box. The box represents the middle 50% of the distribution. The lower bound of the box represents the 25th percentile, the middle line represents the 50th percentile (or median), and the top of the box represents the 75th percentile. The second component of the plot type is the whiskers. These whiskers attempt to represent a “reasonable” range of the data. Specifically, the whiskers encompass the data that is within 1.5 times the interquartile range of the 25th and 75th percentiles. Data outside the whiskers are represented by dots, and are considered outliers. An annotated example follows.
Figure - Annotated Box and Whisker Plot
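The five components of the plot can be computed directly. The sketch below uses a small made-up sample and the conventional linear-interpolation percentile; the single large value falls outside the upper fence (75th percentile plus 1.5 times the interquartile range) and is flagged as an outlier.

```python
def percentile(sorted_x, p):
    """Percentile by linear interpolation between order statistics."""
    idx = (len(sorted_x) - 1) * p / 100
    lo = int(idx)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (idx - lo) * (sorted_x[hi] - sorted_x[lo])

def box_plot_summary(data):
    """Box edges at the 25th/75th percentiles, median line, whiskers at
    the most extreme points within 1.5 * IQR of the box edges; points
    beyond the whiskers are the plotted outlier dots."""
    x = sorted(data)
    q1, med, q3 = (percentile(x, p) for p in (25, 50, 75))
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in x if lo_fence <= v <= hi_fence]
    outliers = [v for v in x if v < lo_fence or v > hi_fence]
    return q1, med, q3, inside[0], inside[-1], outliers

# Made-up sample: 100 sits far above the upper fence.
print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```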
Kruskal-Wallis Rank Test
The Kruskal-Wallis test is an extension of the Mann-Whitney (or Wilcoxon) rank-sum test to two or more categories. The procedure for this test replaces each observation with its rank in the overall dataset and then calculates the mean rank for each category. This procedure jointly tests if the categories have statistically different mean ranks (i.e., if the ranks are distributed randomly among the categories). In other words, a significant test statistic indicates that the categories have different distributions of the continuous variable. This test is particularly useful for small samples, as it requires no asymptotic distributional assumptions. Because the test examines ranks rather than observed values, the exact distribution of the test statistic can be calculated. However, for data with several groups and a moderate number of observations in each group, the distribution is well approximated by the Chi-Squared distribution.105 More information on the calculations underlying the Kruskal-Wallis rank test can be found in Siegel & Castellan (1988).
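The rank-replacement procedure can be sketched as follows. This is an illustrative implementation of the H statistic without the tie-correction term (ties are handled only by averaging ranks), not the exact-distribution machinery the text mentions.

```python
def kruskal_wallis(*groups):
    """Kruskal-Wallis H statistic (sketch, no tie-correction term):
    replace every observation by its rank in the pooled sample, then
    compare each group's mean rank to the grand mean rank (N + 1) / 2."""
    pooled = sorted(x for g in groups for x in g)
    # Collect the ranks held by each distinct value, then average them,
    # so tied observations share a common (mid-) rank.
    positions = {}
    for i, v in enumerate(pooled, start=1):
        positions.setdefault(v, []).append(i)
    avg_rank = {v: sum(r) / len(r) for v, r in positions.items()}
    n = len(pooled)
    h = sum(
        len(g) * (sum(avg_rank[x] for x in g) / len(g) - (n + 1) / 2) ** 2
        for g in groups
    )
    return 12 / (n * (n + 1)) * h

# Identical groups share the same mean rank, so H is zero.
print(kruskal_wallis([1, 2, 3], [1, 2, 3]))  # 0.0
```

With clearly separated groups, e.g. `kruskal_wallis([1, 2, 3], [10, 20, 30])`, the mean ranks diverge and H grows; under the null it is approximately Chi-Squared with (number of groups − 1) degrees of freedom.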
Given that the Kruskal-Wallis test indicates that the groups are jointly significant, it may be interesting to determine which groups are in fact different. The mean ranks can be compared in a pairwise fashion to determine this. However, this introduces a significant statistical problem: multiple comparisons.
For example, if there are four groups to compare, there are 6 total pairwise comparisons. Suppose further that the standard significance level of 5% is assumed (i.e., the null hypothesis is incorrectly rejected 5% of the time). Lastly, for this example, suppose that none of the groups actually differ (i.e., the null hypothesis is true for all comparisons). Thus:

P(\text{at least one false rejection}) = 1 - (1 - 0.05)^{6} \approx 0.265
Thus, for six comparisons, the likelihood of rejecting at least one null hypothesis when all are known to be true is greater than 25%. Put simply, even if all 4 groups are the same, there is a greater than 25% probability of falsely identifying at least one difference as statistically significant. Therefore, a correction to the statistical significance criteria is required to compare the groups pairwise without falsely identifying groups as significant.
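The arithmetic behind this example is easy to check directly (4 groups and the 5% level are the figures from the text; independence of the tests is assumed for the sketch):

```python
from math import comb

alpha = 0.05
n_groups = 4
n_tests = comb(n_groups, 2)  # 6 pairwise comparisons among 4 groups

# If every null is true, the chance of at least one false rejection is
# the complement of the chance that no test rejects.
fwer = 1 - (1 - alpha) ** n_tests
print(n_tests, round(fwer, 3))  # 6 0.265
```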
A simple correction is to conduct each test at a smaller significance level. The one employed in this analysis (referred to as the Bonferroni method) uses a pairwise significance level of α/k, where α is the desired significance level for the overall set of tests and k is the number of tests. This ensures that the overall false rejection rate among all the tests combined is no greater than the desired overall false rejection rate. Thus, in the above example, a pairwise significance level of .0083 (0.05 / 6) ensures that the overall false rejection rate is less than or equal to .05.106
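The correction itself is a one-liner; the six p-values below are hypothetical stand-ins for the pairwise rank comparisons, not results from the report.

```python
alpha = 0.05
k = 6                        # number of pairwise tests
alpha_pairwise = alpha / k   # Bonferroni: test each pair at alpha / k

# Hypothetical p-values from six pairwise comparisons.
p_values = [0.001, 0.004, 0.012, 0.020, 0.300, 0.700]
significant = [p for p in p_values if p < alpha_pairwise]
print(round(alpha_pairwise, 4), significant)  # 0.0083 [0.001, 0.004]
```

By Boole's inequality the familywise error rate is at most k × (α/k) = α, which is why the guarantee holds without any independence assumption.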