7 Missing Data in Software Engineering

Multiple regression shows that the project size is an important predictor of the interval, but none of the process coefficients are significant at the 10% level. (Although a 5% level is more commonly used, we chose a 10% level, which is more suitable for the small sample size of our example and, more importantly, better illustrates the differences among missing data methods.) This is not too surprising, since more than a third of the observations were removed from the analysis.
Pairwise deletion, or the available case method, retains all non-missing cases for each pair of variables. We need at least three variables for this approach to differ from listwise deletion. For example, consider the simplest situation, where the first of three variables is missing in the first case and the remaining cases are complete. Then the sample covariance matrix would use all cases for the submatrix representing the sample covariances of the second and third variables. The entries representing the sample variance of the first variable and the sample covariances between the first and the remaining variables would use only the complete cases.
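The three-variable example can be sketched in a few lines of Python. The data values and the helper name `available_case_counts` are hypothetical and chosen only for illustration; `None` marks a missing value. The sketch counts how many cases are available for each pair of variables.

```python
from typing import List, Optional

# Hypothetical data: 4 cases, 3 variables; None marks a missing value.
# The first variable is missing in the first case, all other cases are complete.
data: List[List[Optional[float]]] = [
    [None, 2.0, 3.0],
    [1.0, 1.5, 2.5],
    [2.0, 3.0, 1.0],
    [4.0, 2.5, 3.5],
]


def available_case_counts(rows: List[List[Optional[float]]]) -> List[List[int]]:
    """Return the number of cases in which both variable j and variable k are observed."""
    p = len(rows[0])
    return [
        [
            sum(1 for r in rows if r[j] is not None and r[k] is not None)
            for k in range(p)
        ]
        for j in range(p)
    ]


counts = available_case_counts(data)
for row in counts:
    print(row)
```

Entries involving the first variable are based on the three complete cases, while the submatrix for the second and third variables uses all four cases. Listwise deletion would use three cases everywhere.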
More generally, the sample covariance matrix is:
$$
s_{jk} = \frac{\sum_i R_{ij} R_{ik} \left(y_{ij} - \bar{y}_j^{(k)}\right)\left(y_{ik} - \bar{y}_k^{(j)}\right)}{\sum_i R_{ij} R_{ik} - 1},
$$
where
$$
\bar{y}_j^{(k)} = \frac{\sum_i R_{ij} R_{ik}\, y_{ij}}{\sum_i R_{ij} R_{ik}}
$$
and $R_{ij}$ and $R_{ik}$ are indicators of missing values as defined in (2). Although this method uses more observations, it may lead to a covariance matrix that is not positive-definite (a positive-definite matrix has only positive eigenvalues) and is therefore unsuitable for further analysis, e.g., multiple regression.
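A minimal sketch of the pairwise covariance computation is given here, assuming that a missing value is coded as `None` and that an indicator equal to 1 means "observed", as the formula requires. The data set and the function names `pairwise_cov`, `det`, and `is_positive_definite` are hypothetical. The data are deliberately constructed so that each pair of variables is observed on a different set of cases, which shows how the resulting matrix can fail to be positive-definite. Positive-definiteness is checked with Sylvester's criterion (all leading principal minors positive) rather than with eigenvalues, to keep the sketch free of external libraries.

```python
from typing import List, Optional

Matrix = List[List[float]]


def pairwise_cov(rows: List[List[Optional[float]]]) -> Matrix:
    """Available-case covariance: entry (j, k) uses the cases where both j and k are observed."""
    p = len(rows[0])
    cov = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j, p):
            pairs = [(r[j], r[k]) for r in rows
                     if r[j] is not None and r[k] is not None]
            n = len(pairs)
            if n < 2:
                raise ValueError(f"fewer than two available cases for pair ({j}, {k})")
            mean_j = sum(a for a, _ in pairs) / n
            mean_k = sum(b for _, b in pairs) / n
            s = sum((a - mean_j) * (b - mean_k) for a, b in pairs) / (n - 1)
            cov[j][k] = cov[k][j] = s
    return cov


def det(m: Matrix) -> float:
    """Determinant by cofactor expansion; adequate for small matrices only."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for c, val in enumerate(m[0]):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * val * det(minor)
    return total


def is_positive_definite(m: Matrix) -> bool:
    """Sylvester's criterion for a symmetric matrix."""
    return all(det([row[:k] for row in m[:k]]) > 0 for k in range(1, len(m) + 1))


# Hypothetical data: every case has exactly one missing variable.
data = [
    [1.0, 1.0, None], [2.0, 2.0, None], [3.0, 3.0, None],  # variables 1 and 2 move together
    [1.0, None, 1.0], [2.0, None, 2.0], [3.0, None, 3.0],  # variables 1 and 3 move together
    [None, 1.0, 3.0], [None, 2.0, 2.0], [None, 3.0, 1.0],  # variables 2 and 3 move oppositely
]

cov = pairwise_cov(data)
for row in cov:
    print(row)
print("positive-definite:", is_positive_definite(cov))
```

In this constructed example the off-diagonal entries exceed the variances in absolute value, which is impossible for a covariance matrix computed from a single set of cases, so the check reports that the matrix is not positive-definite. Listwise deletion could not be applied to these data at all, since no case is complete.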