All Data Has Relationships

All data has relationships to other data. You can easily see this by simply looking at any piece of data.  All data has relationships to other data such as:

  • Customers place Orders
  • Orders are for Products
  • Products are in Warehouses
  • Warehouses have Employees

The list of relationships goes on and on, ad infinitum. Ensuring relationships are tracked is critical to data integrity. What happens if a customer orders a product that doesn’t exist? What happens if products go missing from a warehouse?

Databases Simplify Applications

The raison d’être for databases has always been to simplify application code. Anything done by the database is something the application code doesn’t have to handle. The basic functions of a database through the history of IT has been these:

  • Preventing Data Loss
  • Controlling multiple user access to data
  • Enforcing data values are correct
  • Ensuring data relationships are valid
  • Data Security

These functions have been done by databases for decades because it simplifies application code.

Preventing Data Loss

Databases protect data, which is the lifeblood of an organization. Preventing loss of data has been key to survival even in the days of paper records and stone tablets. Electronic records are subjected to loss from a wide variety of failures, and databases are responsible for providing protection against these failures. Data loss might mean loss of the ENTIRE database, but also means changes from one user getting destroyed by another user.

Controlling Multiple User Access to Data

Without databases, applications would face extreme complexity in controlling access to data by multiple users. Access to data includes reading data, but also includes changing that data. Applications would be horribly complex without the concurrency controls provided by databases. Users also shouldn’t have access to changes made by others until those changes are complete.

Enforcing Data Values are Correct

The correctness of data values includes enforcing number attributes contain numeric values, date attributes contain valid dates, JSON strings contain valid JSON, GPS locations are valid locations, countries are valid countries, etc. Without this enforcement by the database, errors in application code will introduce corrupt data, which will have a cascading impact on other application code. If the database contains an order date of February 30th, that’s clearly wrong and it’s difficult to determine how that value got inserted into the database. It’s much better to catch that when it happens and prevent the change.

Ensuring Data Relationships are Valid

Relationships represent a special class of data values that are enforced by the database. This goes to the heart of why databases were created, and why the relational database became the leader. Hierarchical databases could only enforce certain data relationships. Network databases were a little better, but also had limitations. Relational databases (invented by Edgar F. Codd) provided much greater capability to model data relationships and ensure those relationships were reflected in the data.

Data Security

Of course databases play a central role in the security of data, including who can view data as well as who can make changes to data. This includes not simply access to the entire database, but down to individual attributes as well. There are different access controls over who can view customers, who can change customer data, and who can view or change customer addresses and credit cards.

All of this complexity can be pushed into the application code, but we often want separate data access controls enforced by the database. The principal is known as “defense in depth”. Fort Knox has the gold inside a vault, inside a building, surrounded by a fence, patrolled by guards. You certainly wouldn’t want a Fort Knox with the gold sitting in the middle of a field with guards running around trying to protect it. Defense. In. Depth.

Normalization & Denormalization

Data often begins in a denormalized form or at least we often start to think about data that’s not normalized, so we typically “normalize” the data when designing a relational data model. We then “de-normalize” data primarily for performance reasons. One example might be phone numbers. Customers often have multiple phone numbers such as home, work, and cellular phones.

In a normalized data model, each of a customers’ phone numbers would be stored in separate rows of a phone number table related to the customer. However, that means accessing extra rows of data when you want to show them all. To improve performance of accessing ALL of the rows, we might simply put separate home, work, and cellular phone numbers on the customer table. However, what happens when a customer has 2 cell phones?

Phone numbers are a great case for the combination of JSON within an otherwise relational database.  Rather than modeling N number of phone numbers in the customer table, we might create a PHONE_NUMBERS column as a JSON structure.  That design allows a variable number of phone numbers for each customer.

Database Optimization

I heard someone say recently that relational databases aren’t optimized for anything, which is a complete misrepresentation of reality. Every table in a normalized relational data model should have a primary key. That primary key will have an index. Access to a row of data through the index is “optimized” and will give good performance.

The database “engine” does not provide the logical structure of data. Database “engines” (historically) have not automatically generated indexes beyond those needed to enforce constraints such as primary keys and unique keys. Building of secondary indexes (those beyond what’s needed to enforce constraints) has always been the responsibility of the database administrator, data modeler, or application developer. So yes, in a sense the database is not “optimized” out of the box if you don’t optimize it!

Oracle Autonomous Database is the first on the market that is doing this sort of optimization and automatically handling what has traditionally been the responsibility of the DBA, data modeler, or application developer.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s