Tuesday, June 21, 2011

Using Database Constraints

For many developers, a relational database is just a simple data store. They really don't believe is having any constraints in the database, and prefer to implement all checks-and-bounds in the application logic. This is really a recipe for disaster, and in this post, I will look at some key database constraints that should always be set for an application.
  1. Foreign Key: A relational database is called "relational" for a reason - the term signifies the ability to relate tables (and data) with foreign key constraints. A foreign key constraint avoids data integrity problems and prevents programming errors. Having foreign keys is essential for parent-child and other association relationships. A commonly quoted example is the "Purchase Order" and "Line Item" data, where the purchase order id is the foreign key in the line item table. Not having the foreign key relationship between these tables allows for the possibility of having an incorrect purchase order id in the line item table. This value may be inserted mistakenly by the application code or perhaps manually from the SQL prompt, which in turn results in unpredictable errors in the application and perhaps defensive code to check if the purchase order id is correct before performing further processing. It is much easier to avoid these problems by having proper foreign key constraints in the database.
  2. Not Null Constraint: This is another commonly overlooked problem. Even with limited domain knowledge, it is not too difficult to ascertain which columns can never be null and enforce this in the database by adding a "not null" constraint on the column. Again, the avoids the need to have defensive code all over the place that checks if the variable - that stores the data from this column - is not null before using it for further processing.
  3. Unique Constraint: Unique constraints can be placed on a single column, or on a group of columns. To create an unique constraint on a set of columns, it is required to create an index on them and set the index as unique. Without the unique index, it is common for application code to query the database to check if certain key data is already present before inserting a new record with the data. As an example, if the user id in the user table needs to be unique, then it is better to enforce this with a constraint, instead of querying the database to see if the user id is already exists. Attempting to insert with the duplicate user id results in an exception, which can be caught and the user can be requested to select a different user id. There is a significant performance benefit of using the constraint approach, instead of the using the query to check for duplicates.
  4. Using ENUM: Most programming languages support the enum datatype, and now most databases also support enum. Before enums, if a column could have a fixed set of values, then the solution was generally to have integer (or char) constants denote the various values. As an example, if the user account could be 'active' or 'disable', then the 'state' column would generally have integers (0 and 1) to denote the different states, or perhaps characters 'A' or 'D' for the same purpose. With enums, it is possible to enumerate all the values that a column might store, and use the same names in the code so that data comparison becomes easier.
While there may be other useful constraints, I generally find the above-mentioned ones most useful and find it difficult to comprehend how an application can do without them.

No comments:

Post a Comment