1 Software Engineering Data Collection for Field Studies
3.3.1. Analysis of Electronic Databases of Work Performed
In most large software engineering organizations, the work performed by developers is carefully managed using issue tracker, problem reporting, change request and configuration management systems. These systems require software engineers to input data, such as a description of a problem encountered, or a comment when checking in a source code module. The copious records generated for such systems are a rich source of information for software engineering researchers. Besides the examples provided below, see the proceedings from the International Workshops on Mining Software Repositories.
Advantages: A large amount of data is often readily available. The data is stable and is not influenced by the presence of researchers.
Disadvantages: There may be little control over the quantity and quality of information manually entered about the work performed. For example, we found that descriptive fields are often not filled in, or are filled in different ways by different developers. It is also difficult to gather additional information about a record, especially if it is very old or the software engineer who worked on it is no longer available.
Examples: Work records can be used in a number of ways. Pfleeger and Hatton (1997) analyzed reports of faults in an air traffic control system to evaluate the effect of adding formal methods to the development process. Each module in the software system was designed using one of three formal methods or an informal method. Although the code designed using formal methods tended to have fewer faults, the results were not compelling even when combined with other data from a code audit and unit testing.
Researchers at NASA (1998) studied data from various projects in their studies of how to effectively use COTS (commercial off-the-shelf software) in software engineering. They developed an extensive report recommending how to improve processes that use COTS.
Mockus et al. (2002) used data from email archives (amongst a number of different data sources) to understand processes in open source development. Because the developers rarely, if ever, meet face-to-face, the developer email list contains a rich record of the software development process. Mockus et al. wrote Perl scripts to extract information from the email archives. This information was very valuable in helping to clarify how development in open source differs from traditional methods.
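To give a flavour of this kind of extraction, the following is a minimal sketch in Python rather than Perl. It is not the scripts Mockus et al. used. It assumes the archive is available as a single mbox file, and the function name count_messages_per_sender is hypothetical, chosen only for this illustration.

```python
import mailbox
from collections import Counter
from email.utils import parseaddr


def count_messages_per_sender(mbox_path):
    """Return a Counter mapping sender address to number of messages.

    Illustrative assumption: the mailing-list archive is one mbox file.
    """
    counts = Counter()
    # create=False raises an error instead of silently creating an empty file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            # parseaddr splits "Name <addr>" into (name, addr);
            # addr is an empty string when no address can be found.
            _, address = parseaddr(str(message.get("From", "")))
            if address:
                # Lower-case so the same sender is not counted twice.
                counts[address.lower()] += 1
    finally:
        box.close()
    return counts
```

Counts of this sort only indicate who posts most often. In practice one sender may use several addresses, so the result would need manual checking before being used in an analysis.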
Reporting guidelines: The exact nature of the collected data needs to be specified, along with any special considerations, such as whether any data is missing, or uninterpretable for some reason. Additionally, any special processing of the data needs to be reported, such as if only a certain proportion is chosen to be analysed.
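As an illustration of how the figures called for by these guidelines might be produced, the sketch below computes the fraction of records with a missing field and draws a reproducible sample. It assumes the work records have already been exported as a list of Python dictionaries. The field names and both function names are hypothetical and do not correspond to any particular tracking system.

```python
import random


def missing_field_rates(records, fields):
    """Fraction of records in which each field is absent or blank."""
    total = len(records)
    rates = {}
    for field in fields:
        missing = 0
        for record in records:
            value = record.get(field)
            # Count a field as missing if it is absent, None, or whitespace only.
            if value is None or str(value).strip() == "":
                missing += 1
        rates[field] = missing / total if total else 0.0
    return rates


def draw_sample(records, fraction, seed):
    """Draw a reproducible random sample of the given fraction of records."""
    size = round(len(records) * fraction)
    # A dedicated generator keeps the sample repeatable for a given seed.
    return random.Random(seed).sample(records, size)
```

Reporting the missing-field rates, the sampling fraction and the seed alongside the results lets readers judge how much of the original data the analysis actually covers.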