DB2 General Information
- What is a database constraint?
- Why indexes?
- What are the rules for keys?
- What are the limitations?
- What are nulls and when do I use them?
- Why are database statistics important?
- What is a database trigger?
- What is a database view?
The answers to all of these may be found in the SQL and Administrative Guides for DB2 . Unfortunately, the manuals are not always easy to follow. As database administrators, programmers, and systems designers it is imperative to have a good grasp of each. We have attempted to describe some of these in layman terms.
DB2 Database Constraints- Code does not have to be re-written for every application which touches the table thus saving time and oversight.
- You ensure security, since someone cannot write a piece of code which might adversely modify your data.
- You speed up database processing particuarly in a client-server environment, because your edits are running on the database server.
- The edits become self-documenting, since they may be pulled directly from the database.
Several methods are available in DB2 to accomplish constraint control. Constraints may be on a single table, a column, one column against another column, or even verifying one table's data corresponds with anothers. Here is a listing of possible constraints.
Null or Not Null Columns - Defines whether nulls are allowed on a given column. Cannot be used as if the column is part of a primary key. As a general rule, avoid use of nulls where possible.
Column Defaults - Default values allow you to pre-define a value for a column unless overridden. These are specified at table creation or alter table time. If no value is specified for the default, it becomes the number zero for numbers and a blank for character data. Date type functions default to the current date, time or timestamp if not specified. If the DEFAULT keyword is not used, the default is null.
Unique Indexes - Specifies a column or set of columns must be unique within the given table. Up to sixteen columns and 255 bytes may be included in a unique index. Indexes normally speed up data retrievals (selects), but slow down updates, deletes and inserts. Take care not to have to many indexes on a table. DO NOT confuse indexes with primary keys.
Primary Key - A primary key is similar to a unique index in that, a unique index is automatically created for the primary key. It cannot contain null values. It can however, be used to define the table as a parent table. By using the combination of a parent table and a foreign key (child table, you can guarantee there exists a parent row for every row in the child table. Case in point. Say you have a master table (parent with primary key SSN) which includes names and address of all of your patients. A second table contains a listing of office visits of your patients. You have a common identifier such as social security number (SSN) in both tables and have identified SSN as a foreign key in the patient visit table. The patient visit table would of little use if someone deleted the name, address information in the master table. The combination of the Primary and Foreign keys guarantees you will not be able to change or delete the master patient table's SSN value as long as a corresponding child patient visit record exists. It also works the other way, you cannot add or change the child patient visit table record unless a corresponding record exists in the master patient table. DB2 also understands you may modify this behavior and allows this through the ON DELETE and ON UPDATE options.
Check Constraints - Check Constraints are a good example of code replacement. You can check for values in a given set of values such as sex in ("male", "female"), a range of values such as cost < 1000 or against another column in the same row, for instance dependent_name = parent_name.
Foreign Key - Again, foreign keys define a relationship between a child and parent table. See the Parent Key section for a description of its use.
Null and Not Null - Nulls are defined as the absence of data. Columns are defined as null/not null at "create table" or "alter table" times. They are not allowed in primary keys or unique indexes. They are not evaluated in computations and cannot be comparied to each other. Nulls must be checked with the IS NULL command to be found. As a general rule, avoid use of nulls wherever possible.
- If column A is null and column B is null, column A does not equal column B.
- If you have ten records, four containing 10, four containing 20 and two containing null, the average is: (10*4 + 20*4)/8 = 12, not (10*4 + 20*4 + 2*0)/10 = 15.
- Given the same ten records, if you ask for all records with a value greater than -10, you would only retrieve (select) eight records. To get all ten records you would need to ask for all records greater than -10 or IS NULL.
Now apply a primary key, which in effect creates a unique index and identifies
a column or columns as a parent to another table. DB2 will now know where
to look and only access those portions of the table needed based upon
where the index indicates the data resides. Again, primary keys create
a default unique index. You may also create specific indexes via the create
index statement.
The downside of indexes is that, although they identify where data resides
in the main table, they are a another table themselves. As you add new
indexes to the main table, you are in effect creating a sub-table for
each index. During data update, instead of just updating the main table,
you will find the system needs to update each table sub-index as well.
For this reason, limit your indexes to no more than about three per table
not counting your primary keys.
- Keys should be defined with the most unique column
values first. For instance: A key is defined as the two columns
"last_name and gender". If you have 100 records(rows) of last
names, the chances of a repeating last name are fairly low. Gender,
on the other hand may consist of only two values such as "Male or
"Female". Since character based keys are always evaluated from left
to right, when gender is used as the first part of the key, the
system would always have to read the entire gender column before
even beginning to look for the last_name column. In the following
example, each character is a byte read. If you look for Raughton
Male, using the composite (multi-column) key, the second byte is
unique enough for DB2 to determine you want to find the record Raughton
versus Robinson. If the columns were reversed and you wanted Raughton
Male, DB2 would need to read six bytes (MaleRa) to determine where
the record is.
Last_Name Gender Composite Key Reversed Composite Key Bostick Male BostickMale MaleBostick Zhang Male ZhangMale MaleZhang Raughton Male RaughtonMale MaleRaughton Robinson Female RobinsonFemale FemaleRobinson Brooks Female BrooksFemale FemaleBrooks
- Number keys are more efficient than character keys. Numbers are simply easier to evaluate than character data. A two byte small integer can provide keys from -32768 to +32767 versus two characters as a two byte value. Additionally, numbers are grabbed as group and processed through numeric algorithms. An excellent key is often a nine digit four byte social security number. Many systems will store the SSN along with Last Name, First Name, Address and use SSN to join another table.
- Nullable columns ARE NOT good keys. Since a null indicates the absence of data, it cannot be evaluated.
- Primary keys are used to identify a table as a parent(master) table to another child(dependent) table. These are used to guarantee master records (rows) exist for every dependent record. A unique index is automatically created for every Primary key. Primary keys cannot contain null values and as such uniquely identify the record.
- Foreign keys are used to link a child table to a master table. Typically, there will be many child records for each master record.
| Database Name | 8 BYTES |
| Table Name | 18 BYTES |
| Column Name | 18 BYTES - 8 recommended |
| View Name | 18 BYTES |
| Index Name | 18 BYTES - 8 recommended |
| Number of Base Tables per View | Limited by Storage |
| Number of Columns per Table | 1012 (4K Page) |
| Maximum Number of Columns per Index | 16 |
DB2 also allows NULLS, but does not allowed you to define NO DUPLICATES on a table. It also does not have a MONEY datatype, but rather uses DECIMAL. Some datatypes allowed are:
| Small Integer | -32768 to +32767 |
| Integer | -2147483648 to +2147483647 |
| Double, Float | 2.225E-307 to 1.79769E+307 |
| Decimal | -10E31+1 to 10E31-1 |
| Char | 254 BYTES |
| Varchar | 4000 BYTES |
| Date | Year, Month, Day |
| Time | Hour, Minute, Second |
| Timestamp | Date + Time + Microsecond |
Additionally, DB2 supports Binary Large Objects (BLOBS), Character Large Objects (CLOBS) and user defined datatypes (UDT). CLOBS and BLOBS are used mainly for pictures, videos, and sound. UDT's would be used to define a datatype like COLOR in which you would create allowable colors. Care should be be taken with all of these, especially UDT's where you will also need to define all the operations allowed on the UDT.
Nullable columns should not be used as part of a key or index. If you later decide you want to add an index on the column, nulls will most likely prevent this particuarly for unique indexes. Modification of the data to convert the nulls to another value is difficult and time consuming. Nulls are useless to the DB2 optimizer and cannot be used to evaluate the most efficient way to retrieve your data.
As a general rule, DO NOT USE NULLS unless absolutely necessary. Their usefulness is often questionable and they certainly require explicit code for proper usage.
Without some way of counting how many Smith's or Male's, DB2 or any other database engine does not have enough information to optimize the query for peak performance. Typically, you would generate statistics on key and index fields with the RUNSTATS command. Statistics should be ran after:
- Table Loads
- Significant adds, deletes, and update statements
- Addition of new indexes
- On a routine basis
- Stop repetitive coding.
- Guarantee operations occur.
- Speed up updates, deletes, and inserts of data.
If used improperly, they can
- Lock up your database.
- Be difficult to track and debug.
Basically, a trigger is a piece of code which performs some operation either before or after an insert, delete or update of data occurs on a table. It is written with SQL and added to the database server via the "Create Trigger" command. A couple of examples might be:
- Before giving a employee a raise, check to see that it does not exceed a given value or percentage. If so, send a message to the supervisor that John Doe has just been given a 150 percent raise.
- After every new payment entry, add the value to the total payment table. This is a good example of keeping a summary table from which to generate summation reports without reading and summing all the rows in a given detail table.
- A SSN was erroneously entered, when the entry is corrected, update all other tables with the correct SSN.
- Every time a new payment entry is entered, sum all payments and update the summary table. This in effect is the same as the previous sum example, but you DO NOT want to do this. Although a valid trigger sequence, can you imagine forcing DB2 to read every row in a several hundred thousand row table every time up add a record?
I mentioned that triggers are difficult to debug and may lock up the database. Since the triggers are stored in the database, there are system catalog tables which list the trigger and the tables upon which they operate. The difficulty lies both in the facts that:
- The trigger code is not in the application which a programmer is coding, so they may not be aware of it.
- A trigger will "fire/execute", regardless of where a data change command is executed. For instance, say you have a trigger defined to send a message to the supervisor everytime an employee changes departments. Now your department name changes from "University Computing" to "University Computing Services" and you have fifty employees. As DBA, you issue a statement to update the department name field using SQL. Do you really want your boss to get fifty mail messages indicating a change of department for each employee?
- A trigger may be "fired" from other triggers (cascading). A delete on a table row, fires a trigger which appends data to a second table, where an append trigger sends a message when a row is appended to the second table. This can go on and on cascading many operations based on the preliminary "delete" statement.
Triggers are definitely something you want to use, but with caution. They can do all of the right things from reducing code, to guaranteeing data integrity, to speeding up the entire system.
A database VIEW as the name implies is simply another way to group, restrict, manipulate and display data. The view itself contains no actual data. It could almost be thought of as another way to display data from underlying tables. Take the following table structure for your employee table:
Last_Name
First_Name
Address
Phone
Public_Access
Salary
Now, let's say you want to allow anyone to search for an employees phone and address, but you also want employees to control whether the general public can see their phone number. Also, you don't want to display the Social Security Number or salary to the general populace. You can create a VIEW of the EMPLOYEE table to list only the fields of interest as well as control access.
select last_name, first_name, address, phone
from employee
where public_access = 'Y'
The view employee_view can now be queried and data updated the same as a normal table. What is gained, is instead of storing another table or creating a special query for the field conditions, you can create the view and simply select data from the view.
Views are extremely useful, not costing any performance issues. Consider using views when creating reports, joining tables, combining data through unions, or restricting field displays. Since views also require applicable permits, it is not inconceivable to use the above table, grant public access to the view, but restrict access to the underlying table. In the above example, you might even allow individuals to update their address and phone through the view, but they will not being able to change their salary or SSN.
