7 Missing Data in Software Engineering The list of data quality problems in this example may seem enormous, but in our experience such data quality is not unusual in a software study.
We used multiple linear regression [see, for example, (Weisberg, 1985)] to model the project development interval. The project size and the three tracking measures were independent variables. We included the project size as a predictor because it affects the project interval.
Inspection of the variables showed increasing variances (a scatterplot with a very large density of points at low values) for the interval and size. A square root transformation was sufficient to stabilize the variance of the interval and size and led to the following final model:
\[
\sqrt{\mathrm{Interval}} = b_0 + b_1 \sqrt{\mathrm{Size}} + b_2\,\mathrm{Tracking}_1 + b_3\,\mathrm{Tracking}_2 + b_4\,\mathrm{Tracking}_3 + \mathrm{Error}.
\]
The following section describes various techniques to fit such models in the presence of missing data.
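As a rough illustration of the model above, the sketch below fits the square-root-transformed regression by ordinary least squares on complete cases only. It is not the authors' code: the function name `fit_sqrt_model`, the synthetic data, and the coefficient values are assumptions made purely for demonstration, and the three tracking measures are treated as generic numeric columns.

```python
import numpy as np


def fit_sqrt_model(interval, size, tracking):
    """Least-squares fit of sqrt(interval) on sqrt(size) and tracking measures.

    Hypothetical helper: assumes complete cases (no missing values).
    Returns the coefficients [b0, b1, b2, b3, b4].
    """
    interval = np.asarray(interval, dtype=float)
    size = np.asarray(size, dtype=float)
    tracking = np.asarray(tracking, dtype=float)  # shape (n, 3)

    # Square-root transform of the response and of size, as in the model.
    y = np.sqrt(interval)
    # Design matrix: intercept, sqrt(size), and the three tracking columns.
    X = np.column_stack([np.ones_like(y), np.sqrt(size), tracking])

    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Synthetic data for demonstration only (not from the study).
rng = np.random.default_rng(0)
n = 200
size = rng.uniform(1.0, 400.0, n)
tracking = rng.uniform(0.0, 1.0, size=(n, 3))
true_b = np.array([2.0, 0.5, 1.0, -0.5, 0.25])  # assumed coefficients

sqrt_interval = (
    true_b[0]
    + true_b[1] * np.sqrt(size)
    + tracking @ true_b[2:]
    + rng.normal(0.0, 0.1, n)
)
interval = sqrt_interval ** 2  # back-transform to the original scale

coef = fit_sqrt_model(interval, size, tracking)
print(np.round(coef, 2))
```

Because the fit is on the square-root scale, predictions must be squared to return to the original interval units. Rows with missing values would have to be dropped or otherwise handled before calling this function.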