2008-Guide to Advanced Empirical Software Engineering

6. Summary
The quality of the collected data influences the analysis results and the success of a study more than the choice of method for handling missing values. In particular, successful data collection may result in few or no missing values.
In many realistic scenarios the data quality is low and some values are missing. In such cases, the first step should be to determine the mechanism by which the data are missing and to add observations that may explain why the values are missing, which makes the MAR assumption more plausible. For MAR (and MCAR) data, multiple imputation mitigates the effects of missing values. Other research and our case study have shown not only the importance of applying a missing data technique such as imputation, but also the importance of carrying out multiple imputation. In our case study we find that different conclusions may be reached depending on the particular method chosen to handle missing data. This demonstrates that selecting a proper method to handle missing data is not simply a formal exercise; in certain circumstances it may affect the outcome of an empirical study.
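To make the idea of multiple imputation concrete, the following is a minimal Python sketch, not the procedure used in the case study. It assumes a single numeric variable whose values are missing completely at random, fills each gap with a draw from a normal distribution fitted to the observed values, and pools the per-dataset means with Rubin's rules. The function names (pool_rubin, impute_once, mean_with_mi) and the effort data are hypothetical. The imputation step is deliberately simplified: a proper multiple imputation would also draw the distribution parameters from their posterior and would normally use the other variables as predictors.

```python
import random
import statistics


def pool_rubin(estimates, variances):
    """Combine m point estimates and their variances with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between     # total variance of the pooled estimate
    return q_bar, total


def impute_once(values, rng):
    """Replace None entries with draws from a normal fitted to the observed values.

    Simplifying assumption: parameters are fixed at their observed estimates.
    """
    observed = [v for v in values if v is not None]
    mu = statistics.fmean(observed)
    sigma = statistics.stdev(observed)
    return [v if v is not None else rng.gauss(mu, sigma) for v in values]


def mean_with_mi(values, m=5, seed=0):
    """Estimate the mean of a variable with missing entries using m imputations."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(values, rng)
        estimates.append(statistics.fmean(completed))
        # Variance of the sample mean within this completed data set.
        variances.append(statistics.variance(completed) / len(completed))
    return pool_rubin(estimates, variances)


# Hypothetical effort measurements; None marks a missing value.
effort = [12.0, None, 7.5, 9.0, None, 15.0, 11.0, 8.0]
pooled_mean, total_variance = mean_with_mi(effort)
print(pooled_mean, total_variance)
```

The between-imputation term is what a single imputation cannot provide: it carries the uncertainty caused by the missing values into the final variance, which is why analysing one filled-in data set tends to overstate precision.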
7 Missing Data in Software Engineering

200 A. Mockus
