Monday, July 6, 2009

Effective Clustering

In computing, *clustering* refers to connecting a group of servers (or nodes) via a high-speed interconnect, with the objective of presenting the illusion of a single computing platform. Let’s look at the benefits of creating an effective computing cluster, and the considerations that surround it.

Some of the benefits of a computing cluster are:

1. Better Scalability: Clustering is a way to augment the horizontal scalability of the service provided by the computing platform. Vertical scalability may be augmented by hardware enhancements (better processor, more memory), and/or by using good design practices and code refactoring to remove performance bottlenecks in the service offered. Efforts to achieve vertical scalability work up to a point; after that, the only available option is horizontal scalability, which is achieved by adding more nodes and forming a computing cluster. However, if a single node supports N users, having five such nodes doesn't automatically mean support for 5 × N users. Linear scalability depends on other considerations, such as load balancing and continuous monitoring, which are explained below.

2. High-Availability / Failover: Another important benefit of clustering is redundancy of data and service. If one or more nodes in the cluster fail, the remaining nodes are expected to continue offering access to the service and its associated data, though perhaps at a reduced level of performance. Data availability is not trivial for clusters that don’t have a shared database, as it generally involves some form of data replication to synchronize the data on all the nodes in the cluster.

3. Improved Performance: Certain clusters are set up to perform lengthy computation tasks in parallel. Having more than one node concurrently work on a task may significantly improve the performance of the overall computation, provided that the gains surpass the overhead involved in task allocation and collating the results.
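
The caveats in points 1 and 3 (five nodes do not automatically give five times the capacity, and parallel gains must exceed the coordination overhead) can be illustrated with a rough model. The sketch below is only an illustration under a simplifying assumption that is not from any particular system: task allocation and result collation cost a fixed number of seconds per participating node. The names `parallel_time`, `speedup`, and `overhead_per_node` are hypothetical.

```python
def parallel_time(serial_seconds: float, nodes: int, overhead_per_node: float) -> float:
    """Estimate wall-clock time when work is split evenly across `nodes`.

    Assumption (illustrative only): allocating the task and collating the
    results costs a fixed `overhead_per_node` seconds for every node used.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if nodes == 1:
        return serial_seconds  # nothing to distribute or collate
    return serial_seconds / nodes + overhead_per_node * nodes


def speedup(serial_seconds: float, nodes: int, overhead_per_node: float) -> float:
    """Ratio of single-node time to estimated cluster time (> 1 means a gain)."""
    return serial_seconds / parallel_time(serial_seconds, nodes, overhead_per_node)


if __name__ == "__main__":
    # A 100-second task on 5 nodes with 1 second of overhead per node:
    # 100 / 5 + 5 * 1 = 25 seconds, so the speedup is 4x rather than 5x.
    print(speedup(100.0, 5, 1.0))
    # A 1-second task on the same cluster becomes slower than a single node.
    print(speedup(1.0, 5, 1.0) < 1.0)
```

Under this model, small tasks are better left on a single node, while long-running tasks approach (but never reach) linear speedup as the fixed overhead becomes negligible relative to the work.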

Now that we have reviewed some of the benefits of clustering, let's review some considerations for setting up effective clusters.

1. Cluster Type: The very first consideration to address is the type of cluster that we want to set up: should it be an active-passive cluster or an active-active cluster? An active-passive cluster has only a single processing node (the primary) and one or more standby nodes, one of which is designated as the failover for the primary and is sometimes referred to as the hot standby. An active-passive cluster is generally targeted towards high-availability and failover, and it offers limited or no scalability. An active-active cluster is basically a cluster of peers, and it offers true scalability as well as high-availability and failover. An active-active cluster generally requires the synchronization of shared resources (data, session) across all the nodes in the cluster.

2. Load Balancing: A good load balancing mechanism is one of the most important tenets of an effective cluster. It is imperative to distribute the processing load evenly across all the nodes in the cluster. While using an external load balancer, such as a commercial one from F5 or the free Apache mod_proxy_balancer, is a viable option, it adds to the cost of the deployment. Good load balancers (F5) easily cost a couple of thousand dollars, and even the free ones (mod_proxy_balancer) require an additional dedicated machine. Some low-end load balancers do nothing more than a round-robin on the client requests, and they perform only a shallow ping on the node, checking for the availability of the node but not of the offered service. While external load balancers are a viable option in some cases (for thin browser-based clients), for proprietary clients it is better to build the load balancing into the client. Implementing a load balancing algorithm in the proprietary client gives it the opportunity to determine the real-time load on the cluster via a connection setup (or handshake) phase.

3. Handling Shared Resources: Certain resources may need to be shared across the cluster; however, they may be node-specific by nature – such as data kept in local databases, user session data, and distributed task processing data. Synchronization of this data across the cluster involves certain data replication and distributed locking mechanisms. For a cluster to perform at optimal levels, the data replication and distributed locking algorithms need to be well designed.

4. Continuous Monitoring: A given node in the cluster should be aware of the state of the other nodes in the cluster. This may be achieved using a heartbeat mechanism, where periodic ping messages are exchanged between the various nodes in the cluster. In the event of a node being down, other members of the cluster may attempt to restart it and even temporarily redistribute the load of the failed node among them.
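
The heartbeat monitoring in point 4 and the load-aware node selection in point 2 can be combined in one small sketch. This is a minimal in-memory illustration, not a production protocol: the `ClusterView` class, its method names, the 5-second timeout, and the idea that each heartbeat carries a numeric load figure are all assumptions made for the example. In a real deployment the heartbeats would arrive over the network, and the load figure might come from the connection handshake.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NodeStatus:
    last_heartbeat: float  # time the most recent heartbeat was received
    load: float            # load reported by the node (e.g. active sessions)


@dataclass
class ClusterView:
    """One participant's view of the cluster, built from received heartbeats."""
    timeout_seconds: float = 5.0  # hypothetical threshold for declaring a node down
    nodes: Dict[str, NodeStatus] = field(default_factory=dict)

    def record_heartbeat(self, node_id: str, load: float,
                         now: Optional[float] = None) -> None:
        """Store the latest heartbeat time and reported load for a node."""
        now = time.monotonic() if now is None else now
        self.nodes[node_id] = NodeStatus(last_heartbeat=now, load=load)

    def live_nodes(self, now: Optional[float] = None) -> List[str]:
        """Return nodes whose last heartbeat is within the timeout, sorted by id."""
        now = time.monotonic() if now is None else now
        return sorted(
            node_id for node_id, status in self.nodes.items()
            if now - status.last_heartbeat <= self.timeout_seconds
        )

    def pick_node(self, now: Optional[float] = None) -> Optional[str]:
        """Choose the live node reporting the lowest load, or None if all are down."""
        live = self.live_nodes(now)
        if not live:
            return None
        return min(live, key=lambda node_id: self.nodes[node_id].load)


if __name__ == "__main__":
    view = ClusterView(timeout_seconds=5.0)
    view.record_heartbeat("node-a", load=10, now=0.0)
    view.record_heartbeat("node-b", load=3, now=0.0)
    print(view.pick_node(now=1.0))   # node-b: both are live, node-b is less loaded
    view.record_heartbeat("node-a", load=10, now=8.0)
    print(view.pick_node(now=9.0))   # node-a: node-b has missed its heartbeats
```

Passing `now` explicitly keeps the example deterministic; when it is omitted, the sketch falls back to `time.monotonic()`. Unlike a plain round-robin with a shallow ping, this selection skips nodes that have stopped reporting and prefers the one reporting the least load.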

By meticulously handling these clustering considerations, we can hope to receive some of the tangible benefits of an effective clustering solution.