Using Data Integrity
Ensure that your data is accurate, and find elusive bugs in your applications with SQL Server's data integrity feature.
by Alex Papadimoulis
May 1, 2006
Data integrity (DI) is an important feature in SQL Server. When used properly, it ensures that data is accurate, correct, and valid. It also acts as a trap for otherwise undetectable bugs within your applications. However, DI remains one of the most neglected SQL Server features.
DI is an unfamiliar concept to many developers. Consider, for example, this variable declaration in a strongly-typed language:
Most developers understand that setting ageInYears to "squid" causes an invalid cast exception: A squid is not an integer. However, no error is raised if ageInYears is set to (-185), despite the fact that a negative age is just as absurd as being squid-years-old. Strongly-typed languages are simply not designed for this purpose, but the database is.
In SQL Server, DI is primarily enforced with check constraints. Check constraints are the mechanism that allows for values (for example, ageInYears) to be restricted by business rules (for example, ageInYears is between 18 and 120) instead of just data types (string vs. integer). They work at the table-level to prevent a row from having any invalid data by comparing one or more of its columns in a Boolean expression. When a row is changed and the check evaluates to FALSE, the change is rejected. For example, this is how the ageInYears constraint would look:
ALTER TABLE Applicants
ADD CONSTRAINT CK_Applications_Age
CHECK (AgeInYears BETWEEN 18 AND 120)
Generally speaking, you should use check constraints for "common sense" and infrequently changing business rules. It's much less expensive to remove a check constraint than to fix lots of invalid data. Here are real-life check constraints would have caught some minor, though fairly expensive, bugs:
- CHECK (Loan_Amount <= Property_Appraisal_Amount): This check constraint would've caught the creation of unsellable mortgages ($10,000 plus profit) and tied up several million in investment funds before this bug was discovered.
- CHECK (Quote_Status IN ('Pending','Active','Canceled')): Without this check constraint, several hundred quotes were lost in the system because being in Pending status requires manual reentry.
- CHECK (Borrower_SSN NOT LIKE '%[^0-9]%'): Auditors found that all SSNs in the audit trail after a certain date were truncated with dashes (for example, '123-45-67'); lending operations were halted until the records were manually fixed from paper files. This check constraint would've caught the error.
See the SQL Server Books Online (BOL) for more information on the syntax and limitations of check constraints. The BOL also contain a wealth of information on using some of SQL Server's other DI features. I've listed some of these features here for your reference.
Though most developers are familiar with the concept of foreign keys, many do not use the database to enforce them. A foreign key consists of one or more columns (for example, the Customer_Num column on the Orders table) that relates to the primary key of another table (for example, the Customer_Num column on the Customers table). When the foreign key is defined in SQL Server, the database will ensure the column(s) contain only data in the primary table (for example, Customer_Num must exist in the Customers table before it's used in the Orders table).
Triggers are special stored procedures that run when data is added, updated, or deleted from a table. They allow for a more complex validation than check constraints. Here are some common triggers:
- Allowing only one row to be updated at a time.
- Preventing a document from being changed if its status is Inactive.
- Preventing items from being added to a Shipped order.
User-Defined Types (UDTs) provide a mechanism to keep check constraint logic in one place when you use them across several tables. By defining a UDT (for example, SSN), you can ensure that the rules for a type (for example, 9 characters long or numbers only) are enforced whenever you use it.
Placed on one or more columns in a table, a unique constraint acts like a primary key: It allows only one set of values per table. For example, a unique constraint is useful in a "display sequence" column.
About the Author
Alex Papadimoulis lives in Berea, Ohio. The principal member of Inedo, LLC, he uses his 10 years of IT experience to bring custom software solutions to small- and mid-sized businesses and to help other software development organizations utilize best practices in their products. On the Internet, Alex can usually be found answering questions in various newsgroups and posting some rather interesting real-life examples of how not to program on his Web site TheDailyWTF.com. You can contact Alex directly via email at firstname.lastname@example.org.
Back to top