Tuesday, June 21, 2011

Using Database Constraints

For many developers, a relational database is just a simple data store. They really don't believe is having any constraints in the database, and prefer to implement all checks-and-bounds in the application logic. This is really a recipe for disaster, and in this post, I will look at some key database constraints that should always be set for an application.
  1. Foreign Key: A relational database is called "relational" for a reason - the term signifies the ability to relate tables (and data) with foreign key constraints. A foreign key constraint avoids data integrity problems and prevents programming errors. Having foreign keys is essential for parent-child and other association relationships. A commonly quoted example is the "Purchase Order" and "Line Item" data, where the purchase order id is the foreign key in the line item table. Not having the foreign key relationship between these tables allows for the possibility of having an incorrect purchase order id in the line item table. This value may be inserted mistakenly by the application code or perhaps manually from the SQL prompt, which in turn results in unpredictable errors in the application and perhaps defensive code to check if the purchase order id is correct before performing further processing. It is much easier to avoid these problems by having proper foreign key constraints in the database.
  2. Not Null Constraint: This is another commonly overlooked problem. Even with limited domain knowledge, it is not too difficult to ascertain which columns can never be null and enforce this in the database by adding a "not null" constraint on the column. Again, the avoids the need to have defensive code all over the place that checks if the variable - that stores the data from this column - is not null before using it for further processing.
  3. Unique Constraint: Unique constraints can be placed on a single column, or on a group of columns. To create an unique constraint on a set of columns, it is required to create an index on them and set the index as unique. Without the unique index, it is common for application code to query the database to check if certain key data is already present before inserting a new record with the data. As an example, if the user id in the user table needs to be unique, then it is better to enforce this with a constraint, instead of querying the database to see if the user id is already exists. Attempting to insert with the duplicate user id results in an exception, which can be caught and the user can be requested to select a different user id. There is a significant performance benefit of using the constraint approach, instead of the using the query to check for duplicates.
  4. Using ENUM: Most programming languages support the enum datatype, and now most databases also support enum. Before enums, if a column could have a fixed set of values, then the solution was generally to have integer (or char) constants denote the various values. As an example, if the user account could be 'active' or 'disable', then the 'state' column would generally have integers (0 and 1) to denote the different states, or perhaps characters 'A' or 'D' for the same purpose. With enums, it is possible to enumerate all the values that a column might store, and use the same names in the code so that data comparison becomes easier.
While there may be other useful constraints, I generally find the above-mentioned ones most useful and find it difficult to comprehend how an application can do without them.

Friday, March 18, 2011

Detecting Concurrency Problems using TestNG

With the emphasis on product quality, unit and integration testing is gaining widespread momemtum, and TestNG seems to have become the defacto standard for writing these tests. A great feature in TestNG - that is often missed in all the JUnit vs. TestNG comparisons on the web - is the ability to execute a test in parallel using multiple threads. This feature is pretty useful in detecting concurrency problems in the code. A developer could write a test at the Controller level that executes the Service and DAO (Data Access Object) code, and if this code contains any concurrency constructs for thread-safe access, then the TestNG threads will detect concurrency problems such as race conditions and deadlocks.

Using the TestNG framework, it is relatively easy to specify that a given test be executed in parallel. This can be done using additional parameters to the @Test annotation.

As an example, consider the following test:

@Test (threadPoolSize = 3, invocationCount = 9, timeOut = 1000)
public void myTest() {
// write your test here
}

The threadPoolSize determines the number of threads that are used to execute the tests, and the invocationCount determines the total number of times that the test is executed. The timeOut parameter - 1 second in the above example - guarantees that none of the threads will block on the others, in effect avoiding a deadlock. However, this is not something that a developer should try to avoid, because if there is a deadlock, it is better to detect it sooner that later. Therefore, in general, it is better of omit the timeOut parameter.

With the advent of these super-easy testing frameworks and the value-proposition that they bring, most developers are jumping on the unit/integration testing bandwagon. This is great for the software engineering field in general, as eventually some day bugs will not be considered as the norm in a software product.

Wednesday, February 16, 2011

Ant build under Eclipse - Error running javac.exe compiler

I ran into this problem twice, where the ant build script executes perfectly under the command prompt, but fails under Eclipse with the message - "Error running javac.exe compiler". It took me a while to figure out the problem, and it turns out that the problem arises from the javac task in ant. If the task contains an attribute fork="yes", then ant tries to spawn a new instance of javac to compile the source. Now if the referenced Java library in the project is a JRE, Ant can't find javac and therefore gives the above-mentioned error. If the parameter is changed to fork="no", then the build completes successfully, but a better solution is to include the JDK - and not the default JRE - as the referenced library for the project.