Software LifeCycle Group
Introduction
Service-oriented architectures (SOA) are becoming increasingly popular, with many companies and entire industries turning to them in an attempt to build more lightweight and focused applications that run on and across all platforms, reduce time to market, and eliminate incompatibility issues. Flexibility is the benefit most often cited by proponents of SOA, with the opportunity to reuse business components being the second. Along with the benefits that flexibility and reuse bring, SOA applications also introduce new challenges and risks. Issues of correctness, robustness, security and performance are brought to the forefront when dealing with service-oriented architectures. Unlike traditional monolithic applications, SOA applications are typically intended to be deployed on the Internet with, potentially, millions of users accessing them. Because the corresponding business transactions are mission-critical, the risk compounds with external reuse. An hour of downtime can not only cause substantial losses in revenue but, more importantly, can foster a perception that the company as a whole lacks quality and reliability.
Flexibility of SOA also brings another issue to the forefront: usage above and beyond initial estimates. Because it is lightweight, able to run on and across all platforms, and universally available through its deployment on the Internet, a SOA application can experience a sudden surge in usage. Whether your application can handle usage outside the original scope has to be thoroughly tested, to avoid losing momentum with its users. To answer this question effectively, it is vital that proper load testing is performed to ensure that quality-of-service agreements are met under both anticipated and unanticipated usage.
Finally, when web services are standardized within a certain industry, with many different vendors implementing the same service with a fixed interface (think Location-Based Services, for example), the main if not the only way to compete for customers is to deliver a faster, more efficient, and highly available web service, capable of handling higher loads than the other web services on the market.
In this paper we discuss strategies and guidelines for effective load and performance testing of web services applications. We focus on the various types of tests that a good performance testing suite typically encompasses. We also discuss how to make sure that these tests adequately measure the performance characteristics of the web service under test and how to lessen the impact of the observer effect.
Why Load Testing
Web services need to support multiple users accessing them concurrently. Therefore, during testing, it is important to evaluate a system’s ability to perform critical tasks during periods of peak activity. With web services deployed on the Internet, it is often impossible to accurately predict the workload. A web service can become very popular, and the number of requests per second can surge above and beyond the predicted levels. Therefore, a testing plan for web services should incorporate tests that reveal what happens when the system reaches its saturation point. With web services deployed on the Internet, clients can potentially access them from all over the world. Network latency and throughput have to be taken into account when evaluating the performance of web services. Service Level Agreements (SLAs) for web services typically require that the web services be available 95% to 99.99% of the time. A testing plan for web services should therefore incorporate tests that run for prolonged periods of time and check whether system performance degrades with time.
Load, performance, and stress testing address all these issues. These tests help evaluate connect time, response time, throughput, and failure rate, collectively known as performance characteristics, at various load levels. They show how the system behaves when it reaches the saturation point and whether it can recover to normal performance levels once the load decreases. In addition, load, performance, and stress tests can identify system errors such as runtime memory problems, database deadlocks, multithreading problems, and hardware failures that cause software failures.
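To make the availability range quoted above concrete, the following minimal sketch converts an availability percentage into a yearly downtime budget. The function name and the 365-day year are assumptions made for illustration; an actual SLA defines its own measurement window.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years are ignored in this sketch


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for target in (95.0, 99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% availability allows about {hours:.2f} hours of downtime per year")
```

At 95% the budget is about 438 hours per year, while at 99.99% it shrinks to under one hour, which is why prolonged test runs matter for the stricter targets.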
Objectives of Load and Performance Testing
Load and performance testing aims to answer the following general questions:
Does the system’s performance satisfy its requirements?
How does the system perform at various load levels?
Will the system handle an increase in web traffic without compromising response time, reliability and accuracy?
At which point will the performance degrade and where’s the bottleneck? When and how will the system recover?
Does the system performance degrade if the web service is run for an extended period of time?
Are there slow memory leaks that are not noticeable during standard, shorter lasting tests?
How can we reduce the observer effect to get the most accurate results possible?
Just as unit and functional tests, when introduced early in the development lifecycle, help avoid costly errors at later stages of the development process, load testing early and often helps address and eliminate dangerous performance problems. Performance-related errors can be caught soon after being introduced and addressed immediately, rather than being propagated to the next stage of development, where they are more difficult to detect. When new features introduce a performance bottleneck, their usefulness can be balanced against the drop in performance, so that no guesswork remains by the time integration is complete.
As pointed out in ("Load testing for the diverse environment," 2005), a good load test needs to exercise a realistic scenario and be able to validate a web service’s responses. It is not enough to pound each operation in a web service with requests and ignore the responses. There is a big difference in performance between properly processing a database update request and rejecting the input and sending back a SOAP Fault message. If we are testing a stateful web service where a user engages in an interactive session, such as buying an item in an online store, the ability of a load test to simulate a usage scenario and validate intermediate responses becomes indispensable. It distinguishes a truly useful load test from a nominal one.
Last but not least, with web services being deployed on the Internet, users can potentially access them from all over the country and beyond. A realistic load test should evaluate performance on the server but also estimate the performance drop introduced by network latency, to determine whether all potential users will receive adequate levels of service or whether the web services infrastructure needs to be supplemented with additional nodes. To achieve this, load needs to be pushed to agents located in designated physical areas, and performance characteristics need to be evaluated for various typical infrastructure installations.
To summarize, a load and performance test plan has to meet the following objectives:
Determine acceptable performance at various load levels, as determined by marketing research and as stated in an SLA
Determine the projected number of users, their roles and their typical activities
Types of Tests
Testing of multi-user support capabilities generally encompasses three types of tests: load, performance and stress tests. Load testing focuses on testing system performance at a predefined load level. Because the objective of this test is to determine whether the system’s performance satisfies its requirements, it is necessary to define the maximum activity levels and the minimum configuration of resources before testing begins. Performance testing focuses on measuring system performance at various load levels and is used to predict when load levels will exhaust system resources. Stress testing pushes the system beyond predefined operational limits to see how the system behaves and whether it recovers well. Part of stress testing is running the test for extended periods of time to see if performance degrades.
The terms load testing, performance testing and even stress testing are often used interchangeably. The similarities between these tests stem from the similarities in their execution strategies. For all kinds of tests, you need to simulate a certain number of users simultaneously accessing the service for a certain period of time. Various test scenarios, such as bell, buffer, steady load, and linear increase, can be used to set up a load test, a performance test, or a stress test.
Finally, a special kind of testing, known as regression testing, ensures that once the system is changed, it still performs according to its specifications, meeting or exceeding them. Regression testing should be completely automated to allow the tester to focus on creating useful and effective load test scenarios instead of spending time running these tests.
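One way to automate such a performance regression check is to compare the metrics of the current run against a stored baseline. The sketch below is only an illustration: the metric names, the 10% tolerance, and the assumption that lower values are better for every metric are all hypothetical choices, not part of any particular tool.

```python
from typing import Dict, Tuple


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.10,  # assumed: allow up to 10% worsening before flagging
) -> Dict[str, Tuple[float, float]]:
    """Return metrics whose current value is worse than baseline beyond tolerance.

    Assumes lower is better for every metric (response time, failure rate).
    Metrics missing from the current run are skipped rather than flagged.
    """
    regressions = {}
    for metric, base_value in baseline.items():
        new_value = current.get(metric)
        if new_value is not None and new_value > base_value * (1.0 + tolerance):
            regressions[metric] = (base_value, new_value)
    return regressions


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    baseline = {"avg_response_ms": 200.0, "failure_rate": 0.01}
    current = {"avg_response_ms": 250.0, "failure_rate": 0.01}
    print(find_regressions(baseline, current))
```

A check like this can run after every build, so that the tester only needs to look at the load test results when a metric is flagged.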
Once the load testing plan has been established, it is time to translate that into an effective test suite. Below we list the steps that will help you achieve this goal.
Simulate the real world usage by reusing functional scenarios defined during functional testing
When evaluating the performance of a web service, it is insufficient to just send a request and measure the time it takes for the web service to come back with a response. If an operation merely rejects its input and sends back a SOAP Fault message, the time it takes is significantly different from the time it takes to properly process a request and return a (potentially large) valid response. If a load test does not check the outcome of an operation, it will provide an inaccurate view of the true system performance. For stateful web services that maintain session state between user requests, it is essential that load testing is performed against a usage scenario simulating a realistic user interaction with the web service rather than a disjoint collection of individual method calls. It is also necessary to fine-tune the virtual users that simulate real users by adjusting factors like “think time”.
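The idea of validating each response and pausing for think time can be sketched as follows. This is a minimal illustration under several assumptions: the service speaks SOAP 1.1 (SOAP 1.2 uses a different envelope namespace), and `run_scenario`, its `send` callable, and the step names are hypothetical placeholders for whatever transport and scenario a real test uses.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple

# Namespace of the SOAP 1.1 envelope (assumed; SOAP 1.2 uses a different one).
SOAP_11_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def is_soap_fault(response_xml: str) -> bool:
    """Return True if the envelope's Body contains a SOAP 1.1 Fault element."""
    root = ET.fromstring(response_xml)
    fault = root.find(f"{{{SOAP_11_NS}}}Body/{{{SOAP_11_NS}}}Fault")
    return fault is not None


def run_scenario(
    steps: List[Tuple[str, str]],
    send: Callable[[str], str],
    think_time_s: float = 0.0,
) -> List[Tuple[str, float, bool]]:
    """Run the steps in order, timing each one and recording whether it succeeded.

    `steps` holds (step name, request body) pairs; `send` is a hypothetical
    transport function that returns the response XML as a string.
    """
    results = []
    for name, request in steps:
        started = time.perf_counter()
        response = send(request)
        elapsed = time.perf_counter() - started
        results.append((name, elapsed, not is_soap_fault(response)))
        time.sleep(think_time_s)  # simulated user pause before the next step
    return results
```

Because each step records both its elapsed time and whether the response was a fault, fast rejections can be reported separately from properly processed requests instead of flattering the averages.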
With web services deployed on the Internet, clients can potentially access them from all over the world. When evaluating the performance of web services it is therefore necessary to correctly model and measure network latency and throughput. To achieve that, a load test has to simulate usage across different geographic locations by distributing load generation across multiple machines set up in the corresponding target locations. Alternatively, to simulate the effect of machines being distributed across varying distances, requests originating from each load generator can be assigned a different hits-per-second count (a higher hits/second count indicating that the requests are coming from a closer machine).
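The second approach can be sketched as a proportional split of a target hit rate across load generators. The generator names and weights below are hypothetical; the only idea taken from the text is that a generator standing in for a closer client receives a larger share of the hits per second.

```python
from typing import Dict


def split_hit_rate(total_hits_per_second: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Divide a total hit rate among load generators in proportion to their weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return {
        name: total_hits_per_second * weight / total_weight
        for name, weight in weights.items()
    }


if __name__ == "__main__":
    # Hypothetical weights: a larger weight stands for a closer, lower-latency client.
    weights = {"same_region": 5.0, "cross_country": 3.0, "overseas": 2.0}
    print(split_hit_rate(100.0, weights))
```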
Perform load tests using steady load scenario
To measure system performance at a predefined load level, a steady load test scenario can be used. An example of such a test is shown in Figure 1.
Stress tests push the system beyond predefined operational limits to see how the system behaves and if the system recovers well. Two types of classic test scenarios can be used to set up a stress test – linear increase and bell. A linear increase test scenario shown in Figure 2 addresses stress testing as it pushes the system beyond predefined operational limits and allows us to predict when load levels will exhaust system resources.
Figure 2. Example of a linear increase scenario.
A bell test checks when a system exhausts its resources and also checks how the system behaves at that point and whether it recovers well when the load subsides. An example of a bell test scenario is shown in Figure 3.
Figure 3. Example of a bell test scenario.
Part of stress testing includes running the test for extended periods of time to see if performance degrades. A steady load scenario (Figure 1) and a buffer test scenario (Figure 4), run for prolonged periods of time, can be used to achieve this goal.
Figure 4. Example of a buffer test scenario.
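Whether performance degrades during a prolonged run can be estimated by fitting a trend line to the response times collected over the run. The sketch below uses an ordinary least-squares slope; the sample format (elapsed seconds, response time in milliseconds) and the function name are assumptions made for illustration.

```python
from typing import List, Tuple


def response_time_slope(samples: List[Tuple[float, float]]) -> float:
    """Least-squares slope of response time (ms) versus elapsed time (s).

    A clearly positive slope suggests that responses get slower as the test
    runs, which can point to problems such as slow memory leaks.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_r = sum(r for _, r in samples) / n
    covariance = sum((t - mean_t) * (r - mean_r) for t, r in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("samples must span more than one point in time")
    return covariance / variance


if __name__ == "__main__":
    # Hypothetical measurements taken at the start, after 1 hour, and after 2 hours.
    samples = [(0.0, 100.0), (3600.0, 110.0), (7200.0, 120.0)]
    print(f"drift: {response_time_slope(samples) * 3600:.1f} ms per hour")
```

What slope counts as unacceptable is a judgment call that depends on the performance requirements; the sketch only quantifies the trend.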
Carry out performance tests using bell and linear increase scenarios
A linear increase test scenario (Figure 2) addresses both stress testing and performance testing, as it allows us to predict when load levels will exhaust system resources. A bell test scenario (Figure 3) can also be used for performance testing to evaluate how system resources are distributed as load increases and also how well the system recovers as load decreases.
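The steady load, linear increase, and bell scenarios can each be described as a function that maps elapsed time to a number of virtual users. The shapes below are assumptions made for illustration (in particular, the bell is modeled as a half sine wave); a load testing tool may define its scenarios differently.

```python
import math


def steady(peak_users: int, duration_s: float, t: float) -> int:
    """Steady load: the same number of users for the whole test."""
    return peak_users


def linear_increase(peak_users: int, duration_s: float, t: float) -> int:
    """Linear increase: ramp from zero users at t=0 to the peak at the end."""
    return round(peak_users * t / duration_s)


def bell(peak_users: int, duration_s: float, t: float) -> int:
    """Bell: rise to the peak at the midpoint, then fall back to zero.

    Modeled here as a half sine wave, which is an assumed shape.
    """
    return round(peak_users * math.sin(math.pi * t / duration_s))


if __name__ == "__main__":
    duration = 600.0  # a hypothetical 10-minute test
    for t in (0.0, 150.0, 300.0, 450.0, 600.0):
        print(t, steady(100, duration, t), linear_increase(100, duration, t), bell(100, duration, t))
```

Sampling such a function once per second gives the number of virtual users a load generator should keep active at that moment.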
Lessen the observer effect
The observer effect describes the potential of observation or measurement to affect the results in a controlled environment to the point where they no longer reflect real-world performance. In load testing, this can occur if the machine creating the load or undergoing the tests is also performing the testing measurements. If the CPU or memory is at maximum capacity, then the limits of the machine have been reached, but not necessarily the limits of the application, which skews the results. To mitigate this, it’s important to devise a testing architecture that reduces this influence.
In our experience, an effective way of arranging a testbed is as follows. Suppose that we have a web services server under test, a monitor, and one or more remote load generators controlled by a master. The load generators are distributed in the real world across different networks. They send a variety of requests to exercise different predefined scenarios. The monitoring machine is set up on the same local network as the server machine and generates a negligible load, so that its processor and memory usage remain low. Independent of the rest of the framework, this machine is the primary reporting device and reports the most accurate data.
Figure 5. Example test bed architecture.
Note from the diagram that a single machine, the master, has been set up to remotely orchestrate the load tests on the load generator machines. To further simulate the effect of machines being distributed across varying distances, we have assigned a different hits-per-second count to the requests originating from each load generator (a higher hits/second count indicating that the requests are coming from a closer machine). The server under observation is exercised with functional test scenarios modeling typical use cases. After running the test, we draw our results from the monitor machine to determine whether the server could withstand the load from the generator machines. The remote CPU/memory monitor, set up on the monitor machine to observe the server, is able to determine independently whether resources are being overextended on the server; the errors (or lack thereof) being reported are also clues as to whether the server is handling the load gracefully.
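A simple guard against the observer effect is to check, after a run, that the load generators themselves were not saturated. The sketch below assumes CPU utilization samples (in percent) were collected on a load generator during the test; the 90% limit and the 5% allowance are hypothetical thresholds, not recommendations from any tool.

```python
from typing import Sequence


def generator_was_saturated(
    cpu_samples_percent: Sequence[float],
    limit_percent: float = 90.0,  # assumed threshold for "at capacity"
    max_fraction: float = 0.05,   # assumed share of samples allowed at or over the limit
) -> bool:
    """Return True if too many CPU samples on the load generator hit the limit.

    When this returns True, the measured results may reflect the limits of the
    load generator rather than the limits of the server under test.
    """
    if not cpu_samples_percent:
        raise ValueError("no CPU samples provided")
    over_limit = sum(1 for sample in cpu_samples_percent if sample >= limit_percent)
    return over_limit / len(cpu_samples_percent) > max_fraction


if __name__ == "__main__":
    print(generator_was_saturated([20.0, 35.0, 40.0, 30.0]))
    print(generator_was_saturated([95.0, 97.0, 99.0, 60.0]))
```

Results from a run in which a generator was saturated are better repeated with the load spread across more generator machines.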
Monitor Quality Of Service parameters
To ensure that performance targets are met, you should gather Quality Of Service data. Quality Of Service parameters can measure, for example, whether a specific percentage of all hits execute within a specific amount of time, or whether a certain percentage of requests result in an error response.
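Both kinds of Quality Of Service check described above reduce to simple ratios over the collected results. The following sketch is a minimal illustration; the function names, the sample data, and the thresholds are hypothetical.

```python
from typing import Sequence


def fraction_within(response_times_ms: Sequence[float], limit_ms: float) -> float:
    """Fraction of hits that completed within the time limit."""
    if not response_times_ms:
        raise ValueError("no response times provided")
    return sum(1 for t in response_times_ms if t <= limit_ms) / len(response_times_ms)


def error_rate(succeeded: Sequence[bool]) -> float:
    """Fraction of requests that resulted in an error response."""
    if not succeeded:
        raise ValueError("no outcomes provided")
    return sum(1 for ok in succeeded if not ok) / len(succeeded)


def meets_qos(
    response_times_ms: Sequence[float],
    succeeded: Sequence[bool],
    limit_ms: float,
    min_fraction_within: float,
    max_error_rate: float,
) -> bool:
    """Return True if both the timing target and the error target are met."""
    return (
        fraction_within(response_times_ms, limit_ms) >= min_fraction_within
        and error_rate(succeeded) <= max_error_rate
    )


if __name__ == "__main__":
    # Hypothetical results from five hits.
    times = [120.0, 300.0, 450.0, 900.0, 1500.0]
    outcomes = [True, True, True, True, False]
    print(fraction_within(times, 1000.0), error_rate(outcomes))
    print(meets_qos(times, outcomes, 1000.0, 0.95, 0.01))
```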
Report your findings
Last but not least, a test suite has to be supported by extensive reporting capabilities that organize and present the information gathered during load testing into a comprehensive report or a series of reports. Without this capability, valuable insight into the performance of the system and the factors that contribute to poor performance, if any, will be wasted.
The importance of load testing cannot be overstated, given the growing adoption of public-facing web services. With acceptable response times hovering around 1 second before user experience begins to degrade (Miller, 1968), it is crucial that web services operate as efficiently as possible. This is compounded by the reality that malicious attacks on web servers, such as Denial of Service (DoS) attacks, are a common occurrence and continue to increase. DoS attacks can cripple a web server, and without proper load testing it is difficult to know whether a web server is vulnerable to this kind of attack ("A9 Denial of Service"). Proper load testing can identify weak points in the hardware infrastructure and the software implementation that make up web servers.
To design an effective load test suite for a SOA application, you need to take into account many important considerations, such as closely mimicking real-world interactions with the web servers, simulating realistic usage scenarios, accounting for user “think time”, and modeling the geographic distribution of users. You also have to plan for various usage patterns to simulate situations such as “peak usage” scenarios, where a server undergoes a steady load and then peaks at a certain hour of the day. Your test plan has to include tests that run for prolonged periods of time and check whether system performance degrades with time. To realize an effective test plan, you should gather Quality Of Service data. This data is used to ensure that performance targets are met (for example, by checking whether a specific percentage of all hits execute within a specific amount of time). Finally, the potentially overwhelming amount of data gathered over the course of a load test (such as CPU usage, database response time, and maximum execution time) has to be systematically organized into reports, graphs and charts.
An effective testing strategy and the corresponding test plan will help you to identify bottlenecks in a web services server and/or its supporting infrastructure. Resolving these will help improve and establish a consistent level of service for your end-users.
Load testing for the diverse environment. Software Test and Performance Journal, October 2005, pp. 22-30.
Miller, R. B. (1968). Response time in man-computer conversational transactions. Proc. AFIPS Fall Joint Computer Conference, Vol. 33, 267-277.
A9 Denial of Service. Retrieved May 11, 2006 from http://www.owasp.org/documentation/topten/a9.html