Wednesday, September 29, 2010

REST vs. SOAP

There is a lot of information on the web pertaining to REST; however, there is surprisingly little that compares REST to SOAP. This post contains a brief introduction to REST and provides a REST vs. SOAP comparison. The reader is expected to have some familiarity with SOAP.

REST (Representational State Transfer) is an architectural style for networked applications, based on the Ph.D. dissertation of Roy Fielding. REST introduces a different paradigm for web services, which are traditionally thought of as RPC-based services using a SOAP+WSDL combination. Web services written in the REST style adhere to the Resource Oriented Architecture (ROA) paradigm, a term given to a set of rules for designing such services. Typically, a user of a web application progresses through a series of pages or URLs, resulting in the state being transferred from one traversed resource to the next. REST attempts to formalize this model using four important concepts - resources, their names, their representations and the links between the resources. All RESTful services are judged by four important properties - addressability, statelessness, connectedness and the uniform interface.

REST architectural rules are also called “constraints”. Unconstrained architecture allows method calls, RPC and other messages that are understood by a specific component or module (client or server) involved in the interaction. REST eliminates ad-hoc messages and radically shifts the focus of API development towards defining pieces of information that can be retrieved and manipulated. The motivation for REST was to create an architectural model for how the web should work, such that it would serve as the guiding framework for the web protocol standards. REST prescribes the use of standards such as HTTP, URI and XML.

REST objects are called “resources”, with the information in resources being called “state”. This information has to be encoded to be included in a message; this encoding is called a “representation”. Method invocations transfer state in representations. The following is a list of the HTTP methods and their implied meaning in REST:

· GET to an identifier means, give me your information.

· PUT to an identifier means, replace your information with the new one provided.

· POST to an identifier means, add this new information.

· DELETE to an identifier means, remove your information.
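To make these method semantics concrete, here is a minimal, self-contained Java sketch of the uniform interface, modeling resources as entries in a map keyed by URI. All class names, method names and URIs here are hypothetical illustrations, not part of any real framework.

```java
import java.util.HashMap;
import java.util.Map;

// A toy in-memory "server" illustrating the uniform interface: every
// resource is named by a URI and manipulated only through the generic
// methods, never through resource-specific operations.
public class UniformInterfaceDemo {
    private final Map<String, String> resources = new HashMap<String, String>();

    // GET: give me your information
    public String get(String uri) {
        return resources.get(uri);
    }

    // PUT: replace your information with the new one provided
    public void put(String uri, String representation) {
        resources.put(uri, representation);
    }

    // DELETE: remove the information
    public void delete(String uri) {
        resources.remove(uri);
    }

    public static void main(String[] args) {
        UniformInterfaceDemo server = new UniformInterfaceDemo();
        server.put("/orders/42", "{\"status\":\"open\"}");
        System.out.println(server.get("/orders/42")); // prints {"status":"open"}
        server.delete("/orders/42");
        System.out.println(server.get("/orders/42")); // prints null
    }
}
```

POST is omitted from the sketch because its semantics - adding subordinate information - depend on the server's naming scheme for new resources.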

Resources are identified by URIs and manipulated through their representations. HTTP is a REST-compliant protocol; however, it is possible to apply REST concepts to other protocols and systems. The statelessness property of REST ensures that any resource can be served by any server, thereby making REST solutions highly scalable. REST services may be described using WSDL or WRDL (Web Resource Description Language). The following are the characteristics of a REST-based system:

· Client-Server: A pull-based interaction style.

· Stateless: Each request from the client to the server must contain all the information necessary to process it.

· Cache: To improve network efficiency, responses must be capable of being labeled as cacheable or non-cacheable.

· Uniform interface: All resources are accessed via the generic HTTP methods.

· Named Resources: Every resource in a RESTful service is appropriately named.

· Interconnected resource representation: Enables a client to progress from one state to another.

A logical question is: how is REST different from SOAP? SOAP offers an RPC-oriented paradigm, where the participating components interact in a closed environment, using a proprietary API. REST offers a solution based on commonly used web standards and offers a more open solution, where even unknown clients can connect to a server component and use its capabilities using standard HTTP requests / responses. In addition to this basic difference in the two approaches, the following are some additional differences between these two paradigms.

· Security: A proxy server can look at a REST request and determine the resource being requested, based on which the request may be allowed or denied. For a SOAP message, in contrast, the resource is identified inside the envelope, which is not accessible unless the SOAP message is written using RDF (Resource Description Framework) or DAML (DARPA Agent Markup Language). Therefore, for a SOAP-based web service, security is generally built into the proprietary API.

· State Transitions: Each resource representation received by the client causes it to transition to the next state. The decision about which link to navigate is either hard-coded in the client or determined dynamically using XLINK (xlink:role). In a SOAP network, state transitions are always hard-coded in the client.

· Caching: Network communication has always been a bottleneck, and HTTP headers can therefore indicate that a response may be cached. A SOAP call is always an HTTP POST, and since the SOAP URI is directed at the server and not the resource, no caching is possible with SOAP. However, since REST uses the generic HTTP interface, it is possible for intermediate proxies to cache the results from a RESTful service call, in an effort to achieve better performance.

· Evolving the Web (Semantic Web): It is envisioned that eventually the web will be accessed by people and computers alike, each being capable of intelligently processing the data returned by services on the web. In this vision of the Semantic Web, every resource has a unique URI and is accessible using standard HTTP methods. SOAP is not consistent with the Semantic Web vision, whereas REST is completely aligned with it.

· Generic Interface: Using REST, access to every resource is made using HTTP GET, POST, PUT and DELETE. With SOAP, the application needs to define its own proprietary methods.

· Interoperability: With interoperability, the key is standardization. The web has standardized on certain things, such as URIs for addressing and naming, HTTP for the generic resource interface and HTML/XML/GIF/JPEG for resource representations. REST uses these standards, whereas SOAP depends on customizations. SOAP's clumping of resources behind a single URI is contrary to the vision for the web. SOAP is best utilized for closed systems, where all participants are known beforehand.

Wednesday, September 8, 2010

FindBugs Warning - Exception is caught when exception is not thrown

Performing static analysis of a Java code-base on a regular basis is an extremely useful exercise, and I have found FindBugs to be a worthy tool for the job. One particular warning raised by FindBugs - exception is caught, when exception is not thrown - may appear to be a false positive at first; however, the tool is essentially recommending catching specific exception types, instead of having a "catch all" clause that catches the base Exception class.

The reason for this is pretty simple - catching the base Exception class will also catch RuntimeException, which is a child class of Exception. This will mask potential programming mistakes. As a result of having a catch clause with the base Exception class, I have seen instances of NullPointerException - a child of RuntimeException - being caught and logged on numerous occasions. This potentially masks problems in the code, where an object instance is null when it isn't supposed to be. If the object is null in only certain circumstances, then there is a distinct possibility that catching the base Exception will cause this problem to slip by in the development environment and fail in a production set-up at a customer site.

Catching specific exceptions and handling them appropriately - and perhaps differently - also makes for a better error-handling approach. Overall, it improves the readability of the code, where others are able to better understand and extend the exception handling mechanism.
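As a sketch of the recommended approach - the file name and the null-returning fallback behavior are illustrative choices, not prescriptions - the following catches the specific checked exceptions instead of the base Exception class:

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class SpecificCatchDemo {

    // Catch the specific checked exceptions the code can throw, instead of
    // a blanket catch (Exception e). A NullPointerException thrown inside
    // the try block would now propagate and expose the mistake immediately.
    public static String readFirstLine(String path) {
        BufferedReader reader = null;
        try {
            reader = new BufferedReader(new FileReader(path));
            return reader.readLine();
        } catch (FileNotFoundException e) {
            return null; // expected, recoverable condition: file is missing
        } catch (IOException e) {
            return null; // log or rethrow as appropriate
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException ignored) {
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(readFirstLine("no-such-file.txt")); // prints null
    }
}
```

Note that each catch block can react differently - a missing file may be a normal condition, while a read failure may warrant logging or rethrowing.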

Not long ago there was a trend among Java programmers to use the "*" notation while importing packages - e.g. java.util.*, instead of explicitly importing the classes required. This trend seems to have disappeared, and I hope that the trend of catching the base Exception class also gives way to the approach of explicitly catching specific exceptions.

Monday, July 19, 2010

Optimizing Power Consumption

I recently reviewed an interesting paper, titled - Optimizing Power Consumption in Large Scale Storage Systems. Even though I was impressed by the lucid presentation of the problem and the quality of the proposed solution, the true significance of the work dawned upon me only after I watched Al Gore's movie - An Inconvenient Truth. Yes, three years after the movie won the Academy Award for best documentary, I finally borrowed it from the library and watched it at home with my family.

Anyhow, the main objective of this post is to highlight some interesting aspects of the paper on optimizing power consumption. The paper highlights the reality of the present time, where huge data centers have become a way of life. These data centers contain thousands of servers for storage, which in turn results in higher electric bills and searing heat. Hard disks account for a significant portion of the energy consumption, and in a data center many hard disks are not accessed at any given time. The paper explains the three existing disk management solutions - Hardware-based solutions, Disk Management solutions and Caching solutions - that attempt to conserve power by powering down hard drives that are not being used. The paper outlines the limitation of these existing solutions - their inability to predict well which disks to power down - and then presents a fourth option - a File-system solution, where the Log-structured File System (LFS) directs all writes to the log head. This leads to a perfect prediction mechanism, as the disk being written to is known in advance, and the other disks may be powered down or operated in low-power mode.

LFS was initially motivated by the desire to optimize the latency of write accesses. To eliminate seek time, LFS replaces write operations with appends, and the secondary storage is treated as a large append-only log, where writes go to the log head. Reads don't avoid the seek latency; however, the assumption is that with a good caching technique, there would be few reads that need to access the secondary storage.

The paper finds a new fit for an old idea - using LFS to optimize the power consumption in a data center. Even though the idea sounds impressive at a conceptual level, there is still more work - related to the efficacy of the log-cleaning approach - that needs to be done before this idea turns into a viable solution. Overall, this was an interesting read, with the significance of the work being exemplified by the wonderful movie - An Inconvenient Truth.

Wednesday, June 9, 2010

Catching Java Exceptions

While running FindBugs - a static analysis tool - on a Java project, I encountered numerous instances of a warning - "Exception is caught when Exception is not thrown". Digging deeper into the problem made me realize that this warning results from a "catch all" exception block - catch(Exception e) - that is very commonly used by most Java developers to avoid handling checked exceptions explicitly.

The reason that FindBugs complains about this practice is that having a "catch all" block - using catch(Exception e) - also catches RuntimeException, which is a child class of Exception; doing so could potentially mask serious errors in the program logic. As an example, having a catch(Exception e) block catches the NPE (NullPointerException), which is a child class of RuntimeException. An NPE indicates a potential problem with the code - that a defensive null check is missing before an attempt to dereference an object. This problem may go undetected for a while if a "catch all" block is used to catch all exceptions.

The only solution to this problem is to explicitly catch the "checked" exceptions that can be thrown by the executing code. Even though there is a base class for runtime exceptions, there is no such class for checked exceptions, so the developer needs to explicitly catch the different checked exceptions, which may seem like a pain, but would certainly be beneficial in the long run. An additional motivation - for catching all checked exceptions explicitly - is that the method may need to throw each exception explicitly to clearly indicate the problem to the caller, which facilitates better error handling and reporting.
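A minimal sketch of the masking problem described above - the lookup method and the key names are hypothetical stand-ins for real application code:

```java
public class CatchAllMaskingDemo {

    // Hypothetical lookup that returns null for an unknown key.
    static String lookup(String key) {
        return "id".equals(key) ? "42" : null;
    }

    public static void main(String[] args) {
        String value;
        try {
            // Anti-pattern: the catch-all block below swallows the
            // NullPointerException thrown when trim() is called on the
            // null result of lookup("unknown").
            value = lookup("unknown").trim();
        } catch (Exception e) {
            value = "default"; // the programming error is silently hidden
        }
        System.out.println(value); // prints "default"
    }
}
```

Had the code caught only the specific checked exceptions it expects, the NullPointerException would have surfaced immediately and pointed to the missing null check.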

Monday, January 11, 2010

Throttling the SwingWorker using an ExecutorService

The SwingWorker is a utility class that ships with Java 6. It allows a Swing application to perform lengthy background computation in a separate worker thread, in effect freeing the event dispatch thread to interact with the user. Even though the SwingWorker utility is an important addition to the Java SDK, it does increase the resource overhead of the application by creating two additional threads for processing the lengthy computation - one thread performs the actual background work, while the other waits for the background thread to finish and then updates the results on the UI.

Since the event dispatch thread is free to accept user input, the user - in the absence of a prompt response - may invoke the same functionality repeatedly. This results in a large number of worker threads being instantiated, and for a J2EE application, this in turn results in an increase in the number of associated threads being spawned by the servlet container to process the client requests. The increased number of threads on the server side typically results in server overload and performance degradation. Even though the SwingWorker utility provides a cancel() method to stop the execution of an existing worker thread, there is no way to cancel the execution of the server-side thread created by the servlet container.

The solution to this problem is to throttle the SwingWorker utility by using the ExecutorService, which was added in Java 5 to execute Runnables using a thread pool. A fixed-size thread pool ExecutorService allows only a certain number of SwingWorker threads to be active at any time, with new threads having to wait for the earlier ones to finish before getting a chance to execute. The thread pool size is specific to the application and depends primarily on how many SwingWorker threads are expected to be active at any given time.

The code sample given below depicts a typical Swing application that uses the SwingWorker utility to retrieve data from the server. The SwingWorker class is parameterized with two types - the first is the result type returned by the doInBackground() method, and the second is the type of the intermediate results used by the publish() and process() methods to depict - if required - the progress to the user. The doInBackground() method is executed by the background worker thread that performs the lengthy computation. A second thread blocks at the get() call in the done() method and the event dispatch thread continues to perform user interaction. Finally, once the lengthy background computation is complete, the get() method returns the result of the doInBackground() method, which is then used by the second waiting thread to update the results on the Swing UI.

As explained above, once a SwingWorker thread is submitted for execution, it may be subsequently cancelled by invoking the cancel() method on the SwingWorker instance created. However, it is not possible to cancel the server-side thread that is spawned by the servlet container to process the client request. To avoid this problem, it is advisable to throttle the number of threads being created by using an ExecutorService with a fixed thread pool of a certain size. Therefore, instead of calling the execute() method on the SwingWorker instance, the SwingWorker instance - which is a Runnable - is submitted to an implementation of the ExecutorService.

// Create a background worker thread. The result type - String here - is
// application-specific; the second type parameter (Void) is for the
// intermediate results, which are not used in this example.
SwingWorker<String, Void> swingWorker = new SwingWorker<String, Void>() {

    // This method executes on the background worker thread
    @Override
    protected String doInBackground() throws Exception {
        String result = computeResult(); // the lengthy computation
        return result;
    }

    // This method executes on the UI thread
    @Override
    protected void done() {
        try {
            String result = get();
            // update the Swing UI with the result
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            // handle the failure reported by doInBackground()
        }
    }
};

// Submit to the executor
SwingWorkerExecutor.getInstance().execute(swingWorker);

Given below is a very simple implementation of the SwingWorkerExecutor that creates an ExecutorService with a fixed thread pool size of 3, which allows only three worker threads to be active at any given time. New Runnable instances of SwingWorker wait in the queue and are selected for execution only when a previous instance has completed execution. This strategy effectively avoids the spawning of numerous threads on the server, and therefore, prevents any possible performance degradation.

public class SwingWorkerExecutor {

    private static final int MAX_WORKER_THREAD = 3;

    private static final SwingWorkerExecutor executor = new SwingWorkerExecutor();

    // Thread pool for worker thread execution
    private final ExecutorService workerThreadPool =
            Executors.newFixedThreadPool(MAX_WORKER_THREAD);

    /**
     * Private constructor required for the singleton pattern.
     */
    private SwingWorkerExecutor() {
    }

    /**
     * Returns the singleton instance.
     * @return SwingWorkerExecutor - Singleton.
     */
    public static SwingWorkerExecutor getInstance() {
        return executor;
    }

    /**
     * Adds the SwingWorker to the thread pool for execution.
     * @param worker - The SwingWorker thread to execute.
     */
    public void execute(SwingWorker<?, ?> worker) {
        workerThreadPool.submit(worker);
    }
}

Monday, August 17, 2009

Comparing Java Runtime Analysis (or Profiling) Tools

Runtime analysis is a practice aimed at understanding software component behavior by using data collected during the execution of the component. The analysis provides an understanding of the following aspects of the application execution environment:
  • Execution paths and code coverage.
  • Memory utilization and memory leaks.
  • Execution performance and performance bottlenecks.
  • Thread analysis and related concurrency issues.
Enterprise Java applications that are designed to run on modern multi-core processors typically benefit from the use of a Java runtime analysis tool, as it provides information on memory leaks, performance bottlenecks and even concurrency issues such as deadlocks.

Recently at work, I got the opportunity to evaluate the leading Java profiling tools. An initial review of the leading tools in this arena resulted in the following list - JProfiler, YourKit Java Profiler, Java Visual VM, DevPartner Java Edition, Rational Purify, JProbe and OptimizeIt. A preliminary investigation shortlisted the candidates to JProfiler, YourKit Java Profiler and Java Visual VM. Both JProfiler and YourKit Profiler are leading award-winning tools and I was basically looking to compare them with the Java Visual VM, a free tool available with JDK 6.0 Update 7 (Windows). The other tools were rejected for various reasons - the DevPartner for Java product kept crashing, Rational didn't have a standalone Java edition, OptimizeIt only worked well with JBuilder, and finally, based on reviews posted on the web, JProbe was considered somewhat inferior to JProfiler and YourKit Profiler.

The three selected candidates (JProfiler 5.2.2, YourKit Java Profiler 8.0.13 and Java Visual VM 1.6.0_14) were compared against the following evaluation criteria:
  • License Cost: While Visual VM is free, both JProfiler and YourKit Profiler are commercial tools that provide an option to purchase both standalone (node-locked) licenses and floating licenses. The license cost for both these products is more-or-less in the same range.
  • Ease-of-use: This was an important consideration as the profiling tool should be intuitive to use, with the results being presented in an easily understandable format. All the tools fared equally in this category.
  • Performance (CPU) Profiling: CPU profiling helps to identify hotspots (methods) that result in higher CPU usage. All three tools provide comprehensive analysis; however, YourKit and JProfiler have better presentation options that display the data using call graphs and trees.
  • Memory Utilization: This form of analysis presents information regarding the memory usage. The three tools under consideration provide a view which lists all the objects and their associated memory consumption. Again, the presentation of JProfiler and YourKit is slightly better than Visual VM.
  • Thread Analysis: Provides a view of the threads running in the VM. All the tools under consideration provide very good thread analysis capabilities and also detect concurrency issues such as deadlocks.
  • Code Coverage: This criterion was also under consideration; however, it was deemed less important in comparison with the other criteria. None of the tools being evaluated provided code coverage analysis.
  • Remote Profiling: This is the ability to perform the runtime analysis from a remote machine. JProfiler and YourKit provide all the features that are available for local analysis; however, Visual VM only provides a limited set of features for remote analysis.
  • IDE Integration: All three tools integrate well with Eclipse and other major Java IDEs.
  • Supported Platforms: The three tools selected support all the major versions of the common operating systems - Windows, Linux, Mac OS X and Solaris.

Based on the comparative analysis, it was clear that both JProfiler and YourKit were slightly superior products with some great profiling features. While Visual VM may not have all the features provided by the other two products, it is extremely intuitive and provides all the basic features that are desired from a runtime analysis tool. It is also important to note that since Visual VM is extensible via a plug-in architecture (just like Eclipse), it is poised for growth, and contributions from the open-source community will eventually make it a compelling product, possibly on par with (or even better than) JProfiler and YourKit.

Therefore, we decided to use Java Visual VM, as it satisfies our current needs and there is already some contribution from the development community in the form of some really useful plug-ins.

Monday, August 3, 2009

Comparison of Java source code analysis tools

Recently at work, I got the opportunity to evaluate some leading static source code analysis tools for Java. The intent was that a good source code analysis tool would help to improve the quality of the product by detecting problems in the code, well before the QA folks get it.

As expected, there are a plethora of tools in the market - some that are commercially available, and others that are provided free through the open-source initiative. I selected Coverity Prevent for Java, one of the leading commercially available tools, in part due to the existing relationship that my present employer has with Coverity. I also decided to pick two leading tools, FindBugs and PMD, from the open-source arena, as they seem to be the most popular tools in use. The selected tools were used to analyze the same Java codebase on a Windows 2008 Server machine with 1.5 GB allocated to the analysis.

Comparing source code analysis tools from different vendors is not a typical apples-to-apples comparison. Each tool has its own inherent strengths and does better than the others in certain areas. Anyhow, following an evaluation methodology that has always worked very well for me, I evaluated these three candidates against the following criteria:
  • License Cost: An important consideration in any evaluation. Being a commercial product, Coverity Prevent has an associated license cost, whereas the other two are basically free.
  • Quality of the Analysis: Obviously, concurrency and resource utilization issues are deemed more important than unused methods / variables. This was a difficult comparison to make, as all three tools reported a wide spectrum of problems. In general, however, I found the Coverity Prevent and FindBugs analysis to be better than PMD's.
  • Speed of the Analysis: Since the objective is to integrate the analysis with the nightly build, a short analysis time is preferred. While Coverity Prevent took hours to analyze the codebase, both FindBugs and PMD were done in minutes.
  • Eclipse Integration: Having an Eclipse plugin is essential to report defects during day-to-day development. Fortunately, the three tools selected provide one.
  • Rule Customization / Extension: The ability to customize the existing rulesets and add new ones was considered a desirable feature. While all three provided the option to add / drop certain rulesets from the analysis, only FindBugs and PMD allowed the user to create new customized rulesets.
  • Defect Reporting: This considered the ability of the tools to report the defects in the most intuitive and convenient manner. Coverity Prevent has a great web-based defect manager that allows the user to remotely look at the defects, review the associated source code and act on them. FindBugs has a GUI option that displays the defects and provides associated source markups, however, PMD doesn't provide the source markups and it is only available as a command-line program. All three tools provided an option to export the defects in different output formats (html, xml).
Based on the evaluation criteria given above, we selected FindBugs, which is in line with the wisdom of the web - "If you have never used a source code analysis tool before, try a free one first and see if that works for you". Typically, projects use multiple open-source code analysis tools and I found a lot of references to projects that have used both FindBugs and PMD.

We are now looking for a Java runtime analysis tool and I am doing an evaluation for that. I hope to post the results of that as well.