Performance

Summary

Performance Overview
Designing for Performance
Capacity Planning - Overview
Testing for Performance
Performance Tuning
Best Practices for Performance

Performance Overview

From a user's perspective, performance is defined by application response time. From a system perspective, it is defined through key application metrics such as transaction throughput and resource utilization. Hardware-related resources such as network throughput and disk access are common performance bottlenecks.

Note that performance and scalability are not interchangeable terms. They are two distinct issues: performance relates to how fast the application responds to user requests, while scalability relates to the capability of the system to add resources to compensate for an increased workload.

Disregarding performance will result in an application that performs poorly. Responsibility for performance arises both at design time and at run time:

Design-Time
Developers should not introduce code that hinders application performance. Developers should follow accepted programming practices. Third-party tools like DevPartner can profile code and identify bottlenecks.
Run-Time
The application should undergo mandatory performance testing to identify any performance bottlenecks, such as contention for resources or slow-running code. Of course, this should be done after all functional testing has been completed. In other words, an application must work before it can work well. Nonetheless, performance testing should begin as early as possible to identify problem areas as soon as they are introduced into the application.

Designing for Performance

Performance requirements should be defined before development begins. In general, to properly define performance requirements, you will need to:

Identify Load
Determine Features
Identify Constraints

Based on these, you can select appropriate metrics and determine specific performance goals.

Identify Load

The following suggestions help identify the expected application load:

A common way to determine the load on the application is to estimate the number of users.
A related measure is think-time, which is the elapsed time between the receipt of a reply to a request and the submission of the next request. For example, if it takes on average 45 seconds for a new user to register using a Web-based registration form, then the think-time is 45 seconds.
Another way to estimate load is to evaluate load variance over time. For some applications, the load remains constant, while for others the load varies over time. For example, if a Web-based payment processing application sets the deadline for receiving payments at the end of each month, then it is natural to expect a heavy load towards the end of each month. Note that information about how the load varies over time can be used to determine peak and average system loads. You then base performance requirements on peak and average system loads.
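
As a rough illustration of how these estimates combine, the sketch below converts a user count and think-time into an approximate request rate using the interactive response time law (throughput = users / (response time + think-time)), and derives average and peak load from an hourly profile. The function names, the 3-second response time, and the hourly figures are assumptions made for the example, not measurements.

```python
from typing import Sequence, Tuple


def request_rate(users: int, think_time_s: float, response_time_s: float) -> float:
    """Approximate requests per second generated by `users` concurrent users.

    Interactive response time law: throughput = N / (R + Z),
    where Z is the think-time and R is the response time.
    """
    return users / (think_time_s + response_time_s)


def load_summary(hourly_users: Sequence[int]) -> Tuple[float, int]:
    """Return (average, peak) concurrent users from an hourly profile."""
    return sum(hourly_users) / len(hourly_users), max(hourly_users)


# Hypothetical end-of-month profile: concurrent users sampled each hour.
hourly_users = [120, 150, 400, 900, 650, 180]
average, peak = load_summary(hourly_users)

# 45-second think-time from the registration example; 3 s response time is assumed.
print(f"average load: {request_rate(round(average), 45, 3):.2f} requests/s")
print(f"peak load:    {request_rate(peak, 45, 3):.2f} requests/s")
```

The two printed rates are the figures that performance requirements for average and peak load would be written against.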

Determine Features

Application features typically correspond to use cases and usage scenarios. Here you need to precisely define the semantics of each feature (use case) that is performance-sensitive. You need to fully examine how the feature processes the use-case including verifications, business processing, database access, caching, and so on. It is these definitions that drive the tests that measure performance.

Also note that accurate estimates of the usage of various application services can help create tests that mimic real-life usage of the system.

Identify Constraints

Changing some aspects of a project to improve performance may not be an option. For example, if an application has to be delivered by a specified date, then re-designing it for performance may not be an option. Hardware constraints may also be a factor, especially for user workstations. All these constraints must be documented because they are constant during performance tuning.

Aspects of the project that are not constrained may be changed during performance tuning. For example, examine whether transactions (when used) are really needed, whether new servers can be added to the application topology, and so on. Changes like these can help remove bottlenecks in the system.

Capacity Planning - Overview

Capacity planning is determining the most efficient way of increasing a system's performance and scalability, while at the same time predicting the point at which a resource will cause a bottleneck in the system. The starting point for capacity planning is determining the application's capacity, which you can determine by:

Number of users it can handle before performance falls off.
Server's ability to handle increased load - either due to increased number of users or increased request complexity.
Complexity of the application.

Capacity is indirectly influenced by performance. A well-tuned application can increase capacity by using available resources efficiently and releasing resources that are not used by an active process. At some point, however, the application cannot handle more requests without degrading performance. This is the point at which you either scale up by upgrading or replacing existing servers, or scale out by adding more servers.

Ideally, your capacity planning should establish acceptable performance benchmarks and resource usage limits. You should also develop a plan to scale your system as soon as performance degrades.
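
One simple way to turn load-test measurements into a capacity figure is sketched below: it reports the largest tested user count whose response time stays within an agreed limit, then compares that figure with the expected peak. The measurements, the 2-second limit, and the planned peak of 900 users are hypothetical values chosen for illustration.

```python
def capacity(measurements, max_response_s):
    """Largest tested user count whose response time stays within the limit.

    `measurements` is a list of (users, response_time_s) pairs. Scanning stops
    at the first measurement that exceeds the limit, because performance is
    assumed to keep degrading beyond that point.
    """
    supported = 0
    for users, response_s in sorted(measurements):
        if response_s > max_response_s:
            break
        supported = users
    return supported


# Hypothetical load-test results: (concurrent users, average response time in s).
measurements = [(100, 0.4), (200, 0.5), (400, 0.9), (800, 2.7), (1600, 9.5)]

supported_users = capacity(measurements, max_response_s=2.0)
planned_peak = 900  # assumed peak load from requirements

print(f"capacity: {supported_users} users")
if planned_peak > supported_users:
    print("planned peak exceeds capacity: scale up or scale out")
```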

Testing for Performance

Performance testing assumes that the application is functioning, stable, and robust. The application must pass its functional tests before you can test for performance; otherwise, bugs in the code can mask performance problems or, even worse, give the impression that there is a performance problem when there is none.

The following guidelines should be used when testing for performance:

Measuring Performance
Defining Performance Tests
Determining Baseline Performance
Stress Testing
Solving Performance Problems

Measuring Performance

To tune performance, you must be able to maintain records of each performance test pass. These records should include:

Exact system configuration, including changes from the previous test.
Raw data.
Calculated performance results.

Automate as much of the performance testing as possible to eliminate operator differences. Also, run exactly the same set of performance tests during each test pass; otherwise, it will not be possible to tell whether different performance results are due to code changes or to test changes.
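
A minimal sketch of such a record follows, assuming response times are the raw data being collected. The class name, field names, and sample values are illustrative only; the 95th percentile uses the nearest-rank method.

```python
import json
import math
import statistics
from dataclasses import asdict, dataclass


@dataclass
class PassRecord:
    """One performance test pass: configuration, raw data, calculated results."""
    configuration: dict           # exact system configuration
    changes_since_previous: list  # what changed since the previous pass
    raw_response_times_s: list    # raw data, in seconds

    def results(self) -> dict:
        """Calculated performance results derived from the raw data."""
        samples = sorted(self.raw_response_times_s)
        p95_index = math.ceil(0.95 * len(samples)) - 1  # nearest-rank percentile
        return {
            "count": len(samples),
            "mean_s": statistics.mean(samples),
            "p95_s": samples[p95_index],
        }

    def to_json(self) -> str:
        """Serialize the whole record so each pass can be archived and compared."""
        return json.dumps({**asdict(self), "results": self.results()}, indent=2)


# Hypothetical pass: one configuration change and five raw samples.
record = PassRecord(
    configuration={"app_servers": 2, "db_pool_size": 20},
    changes_since_previous=["db_pool_size 10 -> 20"],
    raw_response_times_s=[0.21, 0.25, 0.19, 0.80, 0.23],
)
print(record.to_json())
```

Storing the raw samples alongside the calculated values makes it possible to recompute results later with a different statistic.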

Defining Performance Tests

Testing should be as realistic as possible. For example, test the application to determine how it performs when many clients are accessing it simultaneously (a multi-threaded test harness can simulate multiple clients in a reproducible manner). If the application accesses a database, the database should contain a realistic number of records. If the database is too small, performance results will be inaccurate.
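
The following is a minimal sketch of a multi-threaded test harness, assuming each client issues a fixed number of requests. `simulated_request` is a placeholder that only sleeps; in a real test it would be replaced with a call to the application under test. All names and counts are assumptions for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_request(client_id: int) -> float:
    """Stand-in for one client call; returns the elapsed time in seconds."""
    start = time.perf_counter()
    time.sleep(0.01)  # placeholder for the real request
    return time.perf_counter() - start


def run_clients(request, clients: int, requests_per_client: int) -> list:
    """Run `clients` workers concurrently, each issuing the same number of requests."""
    def worker(client_id):
        return [request(client_id) for _ in range(requests_per_client)]

    with ThreadPoolExecutor(max_workers=clients) as pool:
        per_client = list(pool.map(worker, range(clients)))
    # Flatten the per-client timings into one list of samples.
    return [t for timings in per_client for t in timings]


timings = run_clients(simulated_request, clients=8, requests_per_client=5)
print(f"{len(timings)} requests, slowest {max(timings) * 1000:.1f} ms")
```

Because the client count and request count are fixed parameters, every test pass applies the same load, which keeps results comparable between passes.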

Make sure you document how to set up a database for running a performance test. The instructions should specify that the database must not include changes made by a previous test pass.

Determining Baseline Performance

After defining performance goals and developing performance tests, run the tests once to establish a baseline. The baseline results, together with documentation of the initial test environment, provide a solid foundation for the tuning effort.

Stress Testing

Stress testing is a specialized form of performance testing. The goal of stress testing is to crash the application by increasing the processing load until the application begins to fail due to saturation of resources or occurrence of errors. Stress testing often reveals subtle bugs that go unnoticed until the application is deployed. While some of these bugs may be logical (e.g., an array limit was exceeded), most often they are the result of design flaws. Therefore, stress testing should begin early in the development phase of each part of the application. Fix these bugs at their source rather than fixing the secondary bugs that they cause.
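
The load ramp at the heart of a stress test can be sketched as below. The system under test is replaced by a hypothetical `simulated_error_rate` function that starts failing above 500 concurrent requests; the step size, the 5% error threshold, and the upper limit are likewise assumptions for illustration.

```python
def simulated_error_rate(load: int, saturation: int = 500) -> float:
    """Hypothetical system: no errors up to `saturation` concurrent requests,
    after which every request beyond that point fails."""
    if load <= saturation:
        return 0.0
    return (load - saturation) / load


def find_breaking_load(error_rate, start=50, step=50, max_error_rate=0.05, limit=10_000):
    """Increase load in fixed steps and return the first load whose error rate
    exceeds `max_error_rate`, or None if `limit` is reached first."""
    load = start
    while load <= limit:
        if error_rate(load) > max_error_rate:
            return load
        load += step
    return None


print(f"application begins to fail at load {find_breaking_load(simulated_error_rate)}")
```

In a real stress test, `error_rate` would drive actual traffic at the given load and report the fraction of failed requests.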

Solving Performance Problems

Finding a solution to poor performance is often like conducting a scientific experiment, and you can usually solve performance problems by following the same process. This process contains six steps:

Observation.
Preliminary hypothesis.
Prediction.
Controls.
Tests.
Theory.

The output of the experiment is a theory which consists of a hypothesis supported by a collection of evidence accumulated by the process.

For example, you observe poor performance in a distributed application that uses thread pooling for server-side objects. Using the performance monitor (PerfMon), you (1) observe that the number of threads per CPU never exceeds 10. You (2) hypothesize that the maximum number of threads per thread pool is set to a low value and needs to be increased. You (3) predict that increasing the ThreadCountPerThreadPool property will improve performance. The ThreadCountPerThreadPool property has now become the (4) control, and you start (5) testing various values of this property to see how each affects performance. If several adjustments to this property yield more satisfactory performance, you establish a (6) theory that certain property settings can provide enhanced performance in combination with all other current variables.
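
The testing step of such an experiment can be sketched as follows. This is only an analogy: Python's `ThreadPoolExecutor` and its `max_workers` argument stand in for the thread pool and the ThreadCountPerThreadPool property, and the workload is a placeholder that sleeps. Only the control varies between runs; the workload stays fixed.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_task(_):
    """Stand-in for a server-side call that spends its time waiting."""
    time.sleep(0.01)


def elapsed_for_pool_size(pool_size: int, tasks: int = 40) -> float:
    """Time a fixed workload; only the pool size (the control) varies."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        list(pool.map(blocking_task, range(tasks)))
    return time.perf_counter() - start


# Test several values of the control and record the outcome of each.
results = {size: elapsed_for_pool_size(size) for size in (2, 4, 8)}
for size, seconds in results.items():
    print(f"pool size {size}: {seconds:.3f} s")
```

If the larger pool sizes consistently finish the same workload faster, the evidence supports the hypothesis that the pool size was the limiting factor.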

Performance Tuning

Performance tuning is the main activity associated with performance management. Reduced to its most basic level, performance tuning is about finding and eliminating performance bottlenecks. Bottlenecks usually appear when a piece of hardware or software approaches the limit of its capacity.

Tuning the performance of an application uses a tuning cycle shown below:

However, before starting the performance tuning cycle, you need to establish the framework for ongoing performance tuning activities. For example:

Identify Constraints
Constraints such as manageability and budget limits are unalterable factors in search of higher performance. Focus performance work on factors that are not constrained.
Specify Work Load
The most common metrics for specifying the load are the number of users, user think-time, and load distribution.
Set Performance Goals
Performance goals must be explicit. Total system throughput and response time are two common metrics used to measure performance.

After establishing boundaries and expectations for performance tuning, you can begin the tuning cycle. As shown by the figure above, the tuning cycle is an iterative series of controlled performance experiments, each consisting of four steps:

Collecting
Analyzing
Configuring
Testing

Collecting

This is the starting point of any tuning exercise. During this phase, you simply collect performance data using the set of performance counters that you have chosen for a specific part of the system. These counters (often from PerfMon) could cover the CPU, threads, network I/O, back-end database connections, and so on.

Regardless of what part you are tuning, you require a baseline measurement against which you should compare performance changes. You can use your first data-gathering pass to establish a baseline set of values for the system's behavior.
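
Comparing a later pass against the baseline can be as simple as the sketch below, which reports the percentage change for each counter. The counter names and values are hypothetical.

```python
def percent_change(baseline: dict, current: dict) -> dict:
    """Percentage change of each counter relative to its baseline value.

    Counters missing from `current`, or with a zero baseline, are skipped.
    """
    return {
        name: 100.0 * (current[name] - value) / value
        for name, value in baseline.items()
        if name in current and value
    }


# Hypothetical counter values from the first pass and from a later pass.
baseline = {"cpu_percent": 60.0, "avg_response_ms": 250.0, "db_connections": 40}
current = {"cpu_percent": 66.0, "avg_response_ms": 200.0, "db_connections": 40}

for name, change in percent_change(baseline, current).items():
    print(f"{name}: {change:+.1f}%")
```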

Analyzing

After collecting performance data, you start analyzing it to determine performance bottlenecks. Keep in mind that a performance counter is only an indicator; it does not necessarily identify the bottleneck, because you can trace a performance bottleneck back to multiple sources. One of the best examples is a CPU performance counter, which can be directly affected by low disk space. It is also common for problems in one system component to result from problems in another system component.

The following points provide guidelines for interpreting counter values and eliminating false or misleading data:

Monitoring processes with the same name
Track processes using process ID rather than name. The system monitor may represent data for separate instances having the same name by reporting the combined values of these instances as the value of a single instance.
Monitoring several threads
Trace threads by including identifiers of the process's thread. When monitoring several threads and one of them stops, the data for one thread might appear to be reported by another thread. This is because of the way threads are numbered.
Intermittent data spikes
Do not give them too much weight. Counters that average can cause the effects of spikes to linger in the reported average value.
Monitoring over an extended period
Use graphs rather than reports or histograms.
Excluding start-up events
Unless you have a reason for including start-up events, exclude them because the temporarily high values they produce tend to skew overall results.
Zero values or missing data
Investigate all occurrences of zero values or missing data, as this has negative effects on establishing a meaningful baseline.
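
Several of these guidelines can be applied when summarizing a series of counter samples, as in the sketch below. It drops an assumed warm-up window, flags zero or missing samples for investigation (leaving them out of the summary until they are explained), and reports the median next to the mean so that a single spike is easy to spot. The sample values and the three-sample warm-up window are assumptions for the example.

```python
import statistics


def summarize_counter(samples, warmup=3):
    """Summarize counter samples after excluding start-up values.

    `samples` may contain None for missing data. Zero and missing samples are
    reported by their original index rather than silently averaged in.
    """
    steady = samples[warmup:]
    suspect = [i + warmup for i, v in enumerate(steady) if v is None or v == 0]
    valid = [v for v in steady if v is not None and v != 0]
    return {
        "mean": statistics.mean(valid),
        "median": statistics.median(valid),  # less sensitive to spikes
        "suspect_indexes": suspect,
    }


# Hypothetical CPU samples: high start-up values, one spike, one gap, one zero.
cpu_samples = [95, 90, 85, 40, 42, None, 41, 98, 43, 0, 39]
summary = summarize_counter(cpu_samples)
print(summary)
```

Here the single spike of 98 pulls the mean well above the median, which is the lingering effect on averaged counters that the guideline on intermittent spikes warns about.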

Configuring

After collecting and analyzing data, you can determine which part of the system is a candidate for a configuration change, and then implement this change. When tuning performance, the cardinal rule is to implement only one configuration change at a time before repeating the tuning cycle.

Testing

After completing a single configuration change, determine the impact of this single change on system performance. At this point, you need to determine whether the change improved performance, degraded performance, or had no effect on performance. If performance has improved enough to meet your goals, you can quit; otherwise, you must step through the tuning cycle again.
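
The cycle as a whole can be sketched as a loop that applies exactly one configuration change per iteration and keeps it only if the measurement improves. The `simulated_response_ms` function, the configuration keys, and the 150 ms goal are all hypothetical; a real `measure` function would run the performance tests against the configured system.

```python
def simulated_response_ms(config: dict) -> float:
    """Hypothetical system model used in place of a real test run."""
    ms = 400.0
    if config["pool_size"] >= 20:
        ms -= 120
    if config["cache_enabled"]:
        ms -= 150
    if config["log_level"] == "debug":
        ms += 60
    return ms


def tune(measure, config, candidate_changes, goal_ms):
    """Apply one change per cycle; keep it only if the measurement improves."""
    best = measure(config)                 # collecting: baseline measurement
    for key, value in candidate_changes:
        if best <= goal_ms:                # goal met: stop tuning
            break
        trial = {**config, key: value}     # configuring: exactly one change
        result = measure(trial)            # testing: rerun the same measurement
        if result < best:                  # analyzing: did the change help?
            config, best = trial, result
    return config, best


initial = {"pool_size": 10, "cache_enabled": False, "log_level": "info"}
candidates = [
    ("log_level", "debug"),
    ("pool_size", 20),
    ("cache_enabled", True),
    ("pool_size", 40),
]
final_config, final_ms = tune(simulated_response_ms, initial, candidates, goal_ms=150)
print(final_config, final_ms)
```

Because each trial differs from the current configuration by a single setting, any change in the measurement can be attributed to that setting.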

Best Practices for Performance

The following practices are recommended for creating high-performance applications:

Tune your software and hardware
Monitor application performance, especially throughput and latency, and tune software parameters and hardware configuration to eliminate bottlenecks.
Reuse work by caching
One of the best ways to improve performance is to avoid doing the same work again and again. For example, static data (i.e., data that does not change) should not be fetched repeatedly from the database. It should be cached so that it is available immediately.
Warn the user
Warn the user ahead of any potentially long-running operations. Long-running operations should be performed asynchronously.
Tune the database
Using a database can introduce bottlenecks when reading/writing data. There are numerous steps to optimize data access:
- Identify potential indexes and use them.
- If using SQL Server, use its Profiler and Index Tuning Wizard.
- If using SQL Server, analyze query plans using Query Analyzer.
- Monitor processor usage.
- Use stored procedures to maximize performance.
- Write less data by normalizing what you write a lot.
- Read less data by de-normalizing what you read a lot.
Partition large data tables
When accessing large data tables, increase processing speed by horizontally and vertically partitioning them. Horizontal partitioning divides a table containing a large number of rows into multiple tables containing the same columns, with each table holding a subset of the rows. For example, a [Customers] table can be horizontally partitioned into two tables: table 1 contains all customers with last names beginning with letters A to M, while table 2 contains all customers with last names beginning with letters N to Z. Vertical partitioning segments a table containing a large set of columns into multiple tables containing the same rows, with each table holding a subset of the columns.
Stress test your application
You cannot fully determine where bottlenecks exist in your application unless you test it under load. The Application Stress Tool can be helpful in simulating stress for your application.
Use transactions wisely
Transactions should be short-lived and only encapsulate what is required. Distributed transactions involve significant overhead that can adversely impact application performance. Distributed transactions should only be used when absolutely necessary.
Reduce network communications
Communication across application or process boundaries affects performance. Therefore, try to reduce network roundtrips when calling remote methods/procedures. For example, if you have a remote object, rather than setting five different properties on it and then calling a remote method that uses these properties (six network roundtrips), modify the method to accept five arguments and then call the remote method (one network roundtrip). This design approach indirectly encourages the use of a stateless business object layer.
Use security wisely
Limit the use of security to the parts of the application that truly need it. For example, accessing web pages using SSL/TLS incurs significant overhead since all communication between the server and the client is encrypted.
Use .NET Remoting / Web services wisely
In some environments (mostly .NET to .NET), accessing remote objects over TCP (i.e., .NET Remoting) performs better than using SOAP over HTTP (i.e., Web Services). See .NET Remoting or Web Services?
Review best practices for scalability
Performance and scalability are closely related. In addition to best practices listed here, review Best Practices for Scalability.
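
As an illustration of the "Reuse work by caching" practice above, the sketch below caches the result of a lookup for static data so that the data source is read only once. The database call is simulated by a hypothetical function that counts how often it is invoked; `functools.lru_cache` is used here as one convenient in-process cache, not as the only option.

```python
from functools import lru_cache

database_reads = 0  # counts simulated round trips to the database


def fetch_country_names_from_database():
    """Hypothetical stand-in for a query that returns static reference data."""
    global database_reads
    database_reads += 1
    return ("Canada", "Mexico", "United States")


@lru_cache(maxsize=None)
def country_names():
    """Return the static list, reading the database only on the first call."""
    return fetch_country_names_from_database()


for _ in range(1000):
    country_names()

print(f"database reads for 1000 lookups: {database_reads}")
```

This approach suits data that does not change while the process is running; data that can change needs an expiry or invalidation strategy, which this sketch does not cover.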