Monday, August 17, 2009

Comparing Java Runtime Analysis (or Profiling) Tools

Runtime analysis is a practice aimed at understanding software component behavior by using data collected during the execution of the component. The analysis provides an understanding of the following aspects of the application execution environment:
  • Execution paths and code coverage.
  • Memory utilization and memory leaks.
  • Execution performance and performance bottlenecks.
  • Thread analysis and related concurrency issues.
Enterprise Java applications that are designed to run on modern multi-core processors typically benefit from the use of a Java runtime analysis tool, as it provides information on memory leaks, performance bottlenecks and even concurrency issues such as deadlocks.

Recently at work, I got the opportunity to evaluate the leading Java profiling tools. An initial review of the leading tools in this arena resulted in the following list - JProfiler, YourKit Java Profiler, Java Visual VM, DevPartner Java Edition, Rational Purify, JProbe and OptimizeIt. A preliminary investigation shortlisted the candidates to JProfiler, YourKit Java Profiler and Java Visual VM. Both JProfiler and YourKit Profiler are leading award-winning tools and I was basically looking to compare them with the Java Visual VM, a free tool available with JDK 6.0 Update 7 (Windows). The other tools were rejected for various reasons - the DevPartner for Java product kept crashing, Rational didn't have a standalone Java edition, OptimizeIt only worked well with JBuilder, and finally, based on reviews posted on the web, JProbe was considered somewhat inferior to JProfiler and YourKit Profiler.

The three selected candidates (JProfiler 5.2.2, YourKit Java Profiler 8.0.13 and Java Visual VM 1.6.0_14) were compared against the following evaluation criteria:
  • License Cost: While Visual VM is free, both JProfiler and YourKit Profiler are commercial tools that offer both standalone (node-locked) and floating licenses. The license costs of the two products are roughly in the same range.
  • Ease-of-use: This was an important consideration, as the profiling tool should be intuitive to use, with the results presented in an easily understandable format. All the tools fared equally in this category.
  • Performance (CPU) Profiling: CPU profiling helps to identify hotspots (or methods) that result in higher CPU usage. All three tools provide comprehensive analysis; however, YourKit and JProfiler have better presentation options that display the data using call graphs and trees.
  • Memory Utilization: This form of analysis presents information on memory usage. All three tools under consideration provide a view that lists the objects on the heap along with their memory consumption. Again, the presentation of JProfiler and YourKit is slightly better than that of Visual VM.
  • Thread Analysis: Provides a view of the threads running in the VM. All the tools under consideration provide very good thread analysis capabilities and also detect concurrency issues such as deadlocks (a minimal example follows this list).
  • Code Coverage: This criterion was also under consideration; however, it was deemed less important than the other criteria. None of the tools being evaluated provided code coverage analysis.
  • Remote Profiling: This is the ability to perform the runtime analysis from a remote machine. JProfiler and YourKit offer remotely all the features that are available for local analysis; Visual VM, however, provides only a limited set of features for remote analysis.
  • IDE Integration: All three tools integrate well with Eclipse and other major Java IDEs.
  • Supported Platforms: The three tools selected support all the major versions of the common operating systems - Windows, Linux, Mac OS X and Solaris.
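To make the thread analysis point concrete, here is a minimal, deliberately broken Java snippet that deadlocks by having two threads acquire the same pair of locks in opposite order (the class name is my own); running it under any of the three tools should light up the deadlock detection in the thread views:

    // DeadlockDemo.java - two threads take the same two locks in opposite
    // order, so each ends up waiting for the lock the other one holds.
    public class DeadlockDemo {
        private static final Object LOCK_A = new Object();
        private static final Object LOCK_B = new Object();

        public static void main(String[] args) {
            new Thread(new Runnable() {
                public void run() {
                    synchronized (LOCK_A) {
                        pause();                  // let the other thread grab LOCK_B
                        synchronized (LOCK_B) { } // never acquired
                    }
                }
            }).start();
            new Thread(new Runnable() {
                public void run() {
                    synchronized (LOCK_B) {
                        pause();                  // let the other thread grab LOCK_A
                        synchronized (LOCK_A) { } // never acquired
                    }
                }
            }).start();
        }

        private static void pause() {
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
        }
    }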

Based on the comparative analysis, it was clear that both JProfiler and YourKit were slightly superior products with some great profiling features. While Visual VM may not have all the features provided by the other two products, it is extremely intuitive and provides all the basic features desired from a runtime analysis tool. It is also important to note that since Visual VM is extensible via a plug-in architecture (just like Eclipse), it is poised for growth; contributions from the open-source community may eventually make it a compelling product, possibly on par with (or even better than) JProfiler and YourKit.

Therefore, we decided to use Java Visual VM, as it satisfies our current needs and there is already some contribution from the development community in the form of some really useful plug-ins.

Monday, August 3, 2009

Comparison of Java source code analysis tools

Recently at work, I got the opportunity to evaluate some leading static source code analysis tools for Java. The intent was that a good source code analysis tool would help to improve the quality of the product by detecting problems in the code, well before the QA folks get it.

As expected, there are a plethora of tools in the market - some commercially available, others provided free by the open-source community. I selected Coverity Prevent for Java, one of the leading commercially available tools, in part due to the existing relationship that my present employer has with Coverity. I also decided to pick two leading tools, FindBugs and PMD, from the open-source arena, as they seem to be the most popular tools in use. The selected tools were used to analyze the same Java codebase on a Windows 2008 Server machine with 1.5 GB of memory allocated to the analysis.

Comparing source code analysis tools from different vendors is not a typical apples-to-apples comparison. Each tool has its own inherent strengths and does better than the others in certain areas. Anyhow, following an evaluation methodology that has always worked very well for me, I evaluated the three candidates against the following criteria:
  • License Cost: An important consideration in any evaluation. Being a commercial product, Coverity Prevent has an associated license cost, whereas the other two are basically free.
  • Quality of the Analysis: Obviously, concurrency and resource utilization issues are deemed more important than unused methods / variables. This was a difficult comparison to make, as all three tools reported a wide spectrum of problems (a few representative examples follow this list). In general, however, I found the Coverity Prevent and FindBugs analysis to be better than PMD's.
  • Speed of the Analysis: Since the objective is to integrate the analysis with the nightly build, a short analysis time is preferred. While Coverity Prevent took hours to analyze the codebase, both FindBugs and PMD were done in minutes.
  • Eclipse Integration: Having an Eclipse plugin is essential to report defects during day-to-day development. Fortunately, the three tools selected provide one.
  • Rule Customization / Extension: The ability to customize the existing rulesets and add new ones was considered a desirable feature. While all three provided the option to add / drop certain rulesets from the analysis, only FindBugs and PMD allowed the user to create new customized rulesets.
  • Defect Reporting: This considered the ability of the tools to report defects in the most intuitive and convenient manner. Coverity Prevent has a great web-based defect manager that allows the user to remotely look at the defects, review the associated source code and act on them. FindBugs has a GUI option that displays the defects and provides the associated source markup; PMD, however, doesn't provide source markup and is only available as a command-line program. All three tools provided an option to export the defects in different output formats (HTML, XML).
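To give a flavor of the kind of problems these tools report, here is a small contrived class (the names are made up for illustration) containing defects of the sort the tools typically flag - FindBugs, for example, reports the string comparison with == and the ignored return value, while PMD reports the empty catch block:

    // BuggyExample.java - contrived defects of the kind static analysis flags.
    import java.io.File;

    public class BuggyExample {

        // Comparing strings with == instead of equals() (flagged by FindBugs)
        public boolean isAdmin(String role) {
            return role == "admin";
        }

        // Return value of File.delete() silently ignored (flagged by FindBugs)
        public void cleanup(File tempFile) {
            tempFile.delete();
        }

        // Empty catch block swallows the error (flagged by PMD)
        public int parsePort(String value) {
            try {
                return Integer.parseInt(value);
            } catch (NumberFormatException e) {
            }
            return -1;
        }
    }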
Based on the evaluation criteria given above, we selected FindBugs, which is in line with the wisdom of the web - "If you have never used a source code analysis tool before, try a free one first and see if that works for you". Typically, projects use multiple open-source code analysis tools and I found a lot of references to projects that have used both FindBugs and PMD.

We are now looking for a Java runtime analysis tool and I am doing an evaluation for that. I hope to post the results of that as well.

Monday, July 6, 2009

Effective Clustering

The term *Clustering*, in the context of computing, refers to a group of one or more servers (or nodes) that are connected via a high-speed interconnect, with the objective of offering the illusion of a single computing platform. Let’s look at the benefits of creating an effective computing cluster, and the considerations that surround it.

Some of the benefits of a computing cluster are:

1. Better Scalability: Clustering is a way to augment the horizontal scalability of the service provided by the computing platform. Vertical scalability may be augmented by hardware enhancements (a better processor, more memory), and/or by using good design practices and code refactoring to remove performance bottlenecks in the service offered. Efforts to achieve vertical scalability work up to a point, and after that the only available option is horizontal scalability, which is achieved by adding more nodes and forming a computing cluster. However, if a single node supports N users, having five such nodes doesn't mean automatic support for 5×N users. Linear scalability depends on certain other considerations, such as load balancing and continuous monitoring, which are discussed below.

2. High-Availability / Failover: Another important benefit of clustering is redundancy of data and service. In the event of a failure of one or more nodes in the cluster, the other nodes are expected to continue offering access to the service and the associated data, though perhaps not at the same level of performance. Data availability for clusters that don’t have a shared database is not trivial, as it generally involves some form of data replication to synchronize the data on all the nodes in the cluster.

3. Improved Performance: Certain clusters are set up to perform lengthy computation tasks in parallel. Having more than one node concurrently work on a task may significantly improve the performance of the overall computation, provided that the gains surpass the overhead involved in task allocation and collating the results (a sketch of this divide-and-collate pattern follows this list).
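To illustrate the task-allocation and collation overhead mentioned in point 3, here is a small sketch that splits a summation across a pool of workers. It parallelizes within a single JVM rather than across cluster nodes, but the divide / compute / combine shape is the same one a compute cluster distributes over its nodes (the class and variable names are illustrative):

    // ParallelSum.java - carve a computation into chunks, run the chunks
    // concurrently and collate the partial results at the end.
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelSum {
        public static void main(String[] args) throws Exception {
            final long[] data = new long[8000000];
            for (int i = 0; i < data.length; i++) data[i] = i;

            int workers = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(workers);
            List<Future<Long>> partials = new ArrayList<Future<Long>>();

            // Task allocation: one chunk of the input per worker.
            int chunk = data.length / workers;
            for (int w = 0; w < workers; w++) {
                final int from = w * chunk;
                final int to = (w == workers - 1) ? data.length : from + chunk;
                partials.add(pool.submit(new Callable<Long>() {
                    public Long call() {
                        long sum = 0;
                        for (int i = from; i < to; i++) sum += data[i];
                        return sum;
                    }
                }));
            }

            // Collation: this coordination is part of the overhead that the
            // parallel speed-up must outweigh.
            long total = 0;
            for (Future<Long> partial : partials) total += partial.get();
            pool.shutdown();
            System.out.println("Total: " + total);
        }
    }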

Now that we have reviewed some of the benefits of clustering, let's review some considerations for setting up effective clusters.

1. Cluster Type: The very first consideration to address is the type of cluster that we desire to set up - should it be an active-passive cluster or an active-active cluster? An active-passive cluster has only a single processing node and one or more standby nodes, one of which is designated as a fail-over for the primary (sometimes referred to as the hot standby). An active-passive cluster is generally targeted towards high-availability and failover, and it offers limited or no scalability. An active-active cluster is basically a cluster of peers and it offers true scalability, as well as high-availability and failover. An active-active cluster generally requires the synchronization of shared resources (data, session) across all the nodes in the cluster.

2. Load Balancing: A good load balancing mechanism is one of the most important tenets of an effective cluster. It is imperative to equally distribute the processing load on all the nodes in the cluster. While using an external load balancer, such as a commercial one from F5 or the free Apache mod_proxy_balancer, is a viable option, it adds to the cost of the deployment. Good load balancers (F5) easily cost a couple of thousand dollars, and even the free ones (mod_proxy_balancer) require an additional dedicated machine. Some low-end load balancers don’t do anything more than a round-robin on the client requests, and they perform a shallow ping on the node, only checking for the availability of the node, not of the offered service. While external load balancers are a viable option in some cases (for thin browser-based clients), for other proprietary client cases, it is better to build the load balancing into the client. Implementing a load balancing algorithm in the proprietary client gives the opportunity to determine the real-time load on the cluster via a connection set up (or handshake) phase (see the sketch after this list).

3. Handling Shared Resources: Certain resources may need to be shared across the cluster; however, they may be node-specific by nature – such as data kept in local databases, user session data, and distributed task processing data. Synchronization of this data across the cluster involves certain data replication and distributed locking mechanisms. For a cluster to perform at optimal levels, the data replication and distributed locking algorithms need to be well designed.

4. Continuous Monitoring: A given node in the cluster should be aware of the state of the other nodes in the cluster. This may be achieved using a heartbeat mechanism, where periodic ping messages are exchanged between the various nodes in the cluster. In the event of a node going down, the other members of the cluster may attempt to restart it and even temporarily redistribute the load of the failed node among themselves.
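As a sketch of the client-side load balancing mentioned in point 2, the snippet below picks the least-loaded node using the load figure each node reported during the connection handshake. The Node class, the 0.0 to 1.0 load scale and the idea that a heartbeat refreshes the figures are all assumptions made for illustration:

    // ClientSideBalancer.java - least-loaded node selection on the client;
    // the node structure and load metric are illustrative assumptions.
    import java.util.List;

    public class ClientSideBalancer {

        /** A cluster node plus the load it reported during the handshake. */
        public static class Node {
            final String address;
            volatile double reportedLoad; // 0.0 (idle) to 1.0 (saturated),
                                          // refreshed by the heartbeat (point 4)

            Node(String address, double reportedLoad) {
                this.address = address;
                this.reportedLoad = reportedLoad;
            }
        }

        /** Pick the node that reported the lowest load. */
        public Node pickNode(List<Node> nodes) {
            Node best = null;
            for (Node node : nodes) {
                if (best == null || node.reportedLoad < best.reportedLoad) {
                    best = node;
                }
            }
            return best; // null when the node list is empty
        }
    }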

By meticulously handling these clustering considerations, we can hope to realize the tangible benefits of an effective clustering solution.

Tuesday, June 30, 2009

What is there in the name?

Choosing a name for my very first blog was not an easy task. After googling around quite a bit and looking at a lot of other blog names, I finally decided on this one. As the very first post to my blog, I would like to explain the significance of this name.

Even though I have been working in the IT industry for a while now, the rapid advancement in the industry never ceases to amaze me. I find myself constantly learning new stuff and keeping my skills up-to-date with the current technology trends, perhaps just like many other IT professionals. This race to always be on the cutting edge of technology has most IT professionals on their toes, which is not necessarily a bad thing though. The wonderful thing about this race is that there is no way to find out who is ahead, or even to say whether there is ever going to be a finish line. I guess there is no point even trying to find out how well you are doing in this race, as the most important thing is to always continue learning and evolving.

This blog attempts to capture my efforts in keeping up with this rat race. I hope to blog about the new things that I learn and find worth sharing. I hope you enjoy reading it as much as I enjoy writing it.

Happy Reading!