Tuesday, December 14, 2010

Concurrency - Optimistic vs. Pessimistic Approach

Whenever developers think of concurrency, the first thing that comes to their mind is semaphores and mutex that provide serial access to a critical section of code. Most languages provide an extensive API for thread synchronization and very often folks just start using the synchronization primitive without much thought into the what they are trying to accomplish. As an example, the most abused concurrency primitive is the "synchronized" keyword provided by Java, which is often put anywhere and everywhere that a developer feels that there is a possibility of concurrent access. "Synchronized" is a monitor, and as such, it doesn't require explicit lock and release statements, as a semaphore or mutex would. This is why perhaps people generally add the synchronized keyword to methods, whenever they feel that the method does something that needs protection from concurrent access. It is not uncommon to come across instances of deeply nested method calls, with each method having a synchronized keyword in the declaration. Synchronized implicitly obtains and releases a lock on the object every time the method with the modifier is called. This is a computationally intensive operation that makes the application slower than it needs to be. Now Java 5 provides some powerful concurrency primitives, but before jumping the bandwagon and starting to use those primitives all over the code, it is better to evaluate the concurrency needs of the application that is being built.

There are generally two approaches to handle concurrency in a software program, each with its pros and cons. An engineering team should consider and evaluate both approaches and decide to use either one, or both, based on the needs of the product they are building. The two approaches are:

Optimistic approach: In this approach, there are no semaphores or mutex to protect a critical section of code that handles the shared data. There is a master copy of the shared data, with each thread getting a local copy to work on. When a thread wishes to update its local copy of the data, the local copy is compared with the master copy to ascertain if the data has been modified since it was last read by the thread. If not, then the update is successful; however, if the data has indeed been modified, then a concurrent modification exception is thrown and the user is expected to re-apply the modifications on the new copy of the data. This approach is common in databases, and it is also used by Java for collections that are not thread-safe by default (HashMap, HashSet, ArrayList).

Pros:
  1. Due to the absence of semaphores and mutex, the application exhibits better performance and scalability.
  2. The modifications of the first thread that performs the update are persisted, whereas the other threads are informed of the change in data and requested to repeat the update on the modified data.
Cons:
  1. User may need to perform the modifications again, if another thread updates the data after it was read by the user thread. This may cause frustration in a multi-user heavy-transaction environment.

Pessimistic approach: This approach requires the use of a semaphore, mutex or monitor to ensure serial access to a critical section in code. In this approach, a single copy of the data is maintained and serial access is provided to threads requesting access to this data. When a given thread enters the critical section, no other thread is allowed to access this data until the thread exits the critical section.

Pros:
  1. Suitable for situations where there is no shared data, however, serial access need to be provided to a shared resource, such as a socket.
Cons:
  1. If the semaphore or mutex is not released properly, it leads to a memory leak. This degrades the application performance over time.
  2. Another problem with semaphores and mutex is the possibility of a deadlock, which occurs when a circular dependency is introduced between two threads, each requesting a lock on a resource that is currently held by the other.
  3. Since serial access is provided to concurrently executing threads that wish to update shared data, the changes made by the last thread are persisted, whereas the other threads are unaware of what happened to their modifications.
A given application may use either one, or both, of the above-mentioned approaches. For shared data access among multiple threads, it is preferable to use the optimistic approach, whereas, for shared resource access (socket etc.), it is generally better to use the pessimistic approach.