182 J. Rosenberg
Detecting sampling bias can be difficult, because it typically happens before the data are collected. It can sometimes be spotted by the absence of certain kinds of data (customers from one region, service times longer than 1 month, etc.), but usually must be identified by studying the documentation for the data collection process or interrogating the people who carry it out. Correcting sampling bias is extremely difficult, since the basic problem is the complete lack of representation for some part of the population. To the extent that the type and degree of bias are known (also a difficult problem) it may be possible to adjust for it, but generally the only solution is to make it clear just what subset of the population is described in the dataset. A good discussion of detecting and coping with overt and hidden biases can be found in Rosenbaum (
As should be clear from the above, problems of data quality are ubiquitous and difficult to deal with, particularly because there are only general guidelines for what to do, and each case must be handled on its own terms.
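As a rough illustration of the sampling-bias checks discussed above, the following Python sketch looks for categories that are entirely absent from a sample and for values that never exceed a suspected cutoff, and then computes simple reweighting factors for the case where the population shares are known. All function names, the region labels, the 30-day cutoff, and the population shares are hypothetical and invented for illustration; this is one possible approach under those assumptions, not a method prescribed by the chapter.

```python
from collections import Counter


def missing_categories(observed, expected):
    """Return the expected categories that never appear in the sample."""
    return sorted(set(expected) - set(observed))


def never_exceeds(values, cutoff):
    """Return True if no value is above the cutoff (a hint of truncation)."""
    return all(v <= cutoff for v in values)


def reweighting_factors(observed, population_shares):
    """Weight each observed category so weighted shares match known shares.

    Categories with no observations get no weight at all: a complete lack
    of representation cannot be repaired by reweighting.
    """
    counts = Counter(observed)
    n = len(observed)
    return {
        category: share / (counts[category] / n)
        for category, share in population_shares.items()
        if counts[category] > 0
    }


# Made-up illustrative data (assumptions, not taken from the chapter).
regions = ["north", "north", "south", "north"]
expected_regions = ["north", "south", "east"]
service_days = [3, 12, 30, 7]
population_shares = {"north": 0.5, "south": 0.3, "east": 0.2}

absent = missing_categories(regions, expected_regions)
truncated = never_exceeds(service_days, cutoff=30)
weights = reweighting_factors(regions, population_shares)

print("Absent regions:", absent)
print("No service time above 30 days:", truncated)
print("Reweighting factors:", weights)
```

In this sketch the unrepresented region receives no weight, which mirrors the point made above: when part of the population is entirely missing, the practical remedy is to state clearly which subset the dataset describes.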
7. Summary
This chapter has discussed the role of the measurement process, the need for metrics to be clearly defined, reliable, and valid in order for them to be effective, and various statistical techniques and pitfalls in analyzing measurement data. Understanding measurement is a crucial part of the development of any branch of science (see Hand, 2004); the amount of effort devoted to it in empirical research in software engineering reflects the necessity of answering some of the most fundamental questions facing computer science and engineering. Fortunately, we can take advantage of the experience and knowledge gained by other disciplines, and apply them in developing effective software measurement.