Scalability Overview

The key characteristic of a scalable application is that additional load only requires additional resources rather than extensive modification of the application itself. Therefore, scalability is the ability to add resources to yield a linear increase in service capacity.

Scalability must be part of the design process because it is not a feature that can be added later or during deployment. The decisions that you make during the design and early coding phases will largely dictate the scalability of your application. Note, however, that scalability is not a design concern for stand-alone applications; it concerns distributed applications. Distributed applications are a step beyond client-server applications because they are designed as n-tier applications. Such a distributed architecture promotes the design of scalable applications by sharing resources such as business and database components.

Scalability requires a balancing act between software and hardware. For example, building a network-load-balanced cluster of application servers will not benefit a client application if the server-side code was written to run on a single server (see location affinity rather than location transparency). Likewise, a highly scalable application connected to a low-bandwidth network will not handle heavy loads once traffic saturates the network.

Scaling Up

Scaling up means achieving scalability by using bigger and faster hardware: adding more memory, adding more processors, or replacing old processors with faster ones. Scaling up typically allows an application to increase its capacity without code changes. Administratively, things also remain the same because there is still one machine to administer. Scaling up is summarized below:

Note that scaling up does not increase capacity linearly. Instead, the performance gain curve gradually flattens as more resources are added. For example, using four processors in a server does not increase capacity to 400% of the uniprocessor version. Synchronization between the processors, as well as contention over a single memory bus, results in a lower performance gain. Once you have upgraded each hardware component to its maximum, you reach the real limit of the machine's processing capacity. At that point, the next step in scaling up is to move to a bigger and faster machine.
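The diminishing return described above can be illustrated with Amdahl's law, a standard model that this article does not itself name and that is brought in here only as an approximation. The sketch assumes that some fraction of the work (the hypothetical parameter parallel_fraction) can be spread across processors, while the rest is serialized by synchronization and bus contention. The 0.90 value is an assumed figure, not a measurement.

```python
def amdahl_speedup(processors: int, parallel_fraction: float) -> float:
    """Ideal speedup predicted by Amdahl's law.

    processors        -- number of processors in the machine (>= 1)
    parallel_fraction -- assumed share of the work that can run in
                         parallel, between 0.0 and 1.0
    """
    if processors < 1:
        raise ValueError("processors must be >= 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0.0 and 1.0")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part never gets faster; only the parallel part is divided.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Assumed workload: 90% of the work parallelizes cleanly.
    for n in (1, 2, 4, 8, 16):
        print(f"{n:>2} processors -> {amdahl_speedup(n, 0.90):.2f}x")
```

Under this assumption, four processors yield a speedup of roughly 3.08x rather than 4x, and no number of processors can exceed 10x, which matches the flattening curve described above.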

Scaling up also presents another potential problem. Using a single machine to support an application creates a single point of failure, which greatly diminishes system availability.

Scaling Out

Scaling out is an alternative to scaling up. Scaling out means distributing the processing load over multiple machines. Although multiple machines are involved, they share the load and essentially function as a single server. Scaling out is summarized below:

Also note that by dedicating several machines to a common task, application fault tolerance is increased. However, from an administrator's point of view, scaling out presents greater management challenges due to the increased number of servers.

Configuring multiple servers to share load from clients requires special software. With Windows, you can use a variety of techniques, including clustering and load balancing. Load balancing allows an application to scale out across a cluster of servers. It also provides redundancy, allowing the cluster to remain available to users even if one or more servers fail.
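To make the load-sharing and redundancy ideas concrete, here is a minimal round-robin sketch. It is not the Windows load-balancing or clustering API; the class RoundRobinBalancer, its methods, and the server names are all hypothetical, and a real load balancer would detect failures through health checks rather than explicit mark_down calls.

```python
class RoundRobinBalancer:
    """Hypothetical balancer that rotates requests across healthy servers."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        """Record that a server has failed and must be skipped."""
        self._healthy.discard(server)

    def mark_up(self, server):
        """Return a known server to the rotation."""
        if server in self._servers:
            self._healthy.add(server)

    def pick(self):
        """Return the next healthy server, skipping any that are down."""
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["app1", "app2", "app3"])
    print([balancer.pick() for _ in range(4)])
    balancer.mark_down("app2")  # simulate a server failure
    print([balancer.pick() for _ in range(3)])
```

After app2 is marked down, requests continue to flow to app1 and app3, which is the redundancy benefit described above.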

The key to successful scaling out is location transparency. If any of the application code depends on knowing which server is running the code, location transparency has not been achieved and scaling out will be difficult; this situation is called location affinity. If you design an application with location transparency in mind, scaling out becomes an easier task.
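The difference between location affinity and location transparency can be sketched with session state. In the hypothetical classes below, AffinityServer keeps a shopping cart in its own memory, so a client must keep returning to the same machine, while TransparentServer keeps the cart in a shared store. A plain dictionary stands in for that store here; in practice it would be an external database or cache, which is an assumption of this sketch.

```python
class AffinityServer:
    """Keeps session state in local memory (location affinity)."""

    def __init__(self, name):
        self.name = name
        self._sessions = {}  # visible to this server only

    def add_to_cart(self, session_id, item):
        self._sessions.setdefault(session_id, []).append(item)
        return list(self._sessions[session_id])


class TransparentServer:
    """Keeps session state in a shared store (location transparency)."""

    def __init__(self, name, shared_store):
        self.name = name
        self._store = shared_store  # stand-in for an external data store

    def add_to_cart(self, session_id, item):
        cart = self._store.setdefault(session_id, [])
        cart.append(item)
        return list(cart)


if __name__ == "__main__":
    # Affinity: the second request lands on another server and loses the cart.
    a1, a2 = AffinityServer("app1"), AffinityServer("app2")
    a1.add_to_cart("s1", "book")
    print(a2.add_to_cart("s1", "pen"))  # only the pen is present

    # Transparency: any server can handle any request.
    store = {}
    t1, t2 = TransparentServer("app1", store), TransparentServer("app2", store)
    t1.add_to_cart("s1", "book")
    print(t2.add_to_cart("s1", "pen"))  # both items are present
```

Because TransparentServer holds no state of its own, servers can be added or removed without clients noticing which machine answered.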

Designing for Scalability

Primary Design Goal: efficient resource management. Contention for resources is the root cause of all scalability problems.

Design-phase decisions have the greatest impact on scalability. As the figure below shows, good design is literally the foundation of a highly scalable application:

As the scalability pyramid indicates, hardware, software, and code tuning are only a small part of the scalability solution. Design, which is at the base of the pyramid, has the greatest influence on scalability. The ability to affect scalability decreases as you move up the pyramid. Smart design can add much more to an application's scalability than hardware can.

Because the primary design goal for scalability is efficient resource management, design is not limited to a specific component or tier of an application. As a designer, you must consider scalability at all levels - from the data store to the user interface. The following recommendations can be helpful when designing for scalability.

Testing for Scalability

Scalability testing is an extension of performance testing. It is about identifying major workloads and resolving bottlenecks that can impede the application's scalability. As an application is scaled up or out, comparing performance test results against a baseline performance test indicates whether the application scales.
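One simple way to express the baseline comparison is a scaling-efficiency ratio: the observed capacity gain divided by the resource gain, where 1.0 corresponds to the linear increase described in the overview. The function name, its parameters, and the throughput figures below are hypothetical illustrations, not measurements from a real test.

```python
def scaling_efficiency(baseline_throughput, baseline_resources,
                       scaled_throughput, scaled_resources):
    """Ratio of capacity gain to resource gain (1.0 means linear scaling)."""
    values = (baseline_throughput, baseline_resources,
              scaled_throughput, scaled_resources)
    if min(values) <= 0:
        raise ValueError("all inputs must be positive")
    capacity_gain = scaled_throughput / baseline_throughput
    resource_gain = scaled_resources / baseline_resources
    return capacity_gain / resource_gain


if __name__ == "__main__":
    # Illustrative inputs only: 1 server at 500 requests/s as the baseline,
    # then 4 servers at 1700 requests/s after scaling out.
    efficiency = scaling_efficiency(500, 1, 1700, 4)
    print(f"scaling efficiency: {efficiency:.2f}")
```

A ratio that falls steadily as resources are added suggests a bottleneck, such as a contended resource, that the next round of testing should isolate.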

Best Practices for Scalability

The following best practices are recommended for creating scalable applications.