NoSQL Improved Upon Record Based Databases

NoSQL databases (known as key/value, key/JSON, or “document” database) provide data that is self-describing, which is a huge improvement compared to older “record-based” databases. NoSQL databases are often compared to (purely) relational databases, but this is primarily because relational databases hold the biggest market. Record-based databases are really a non-entity these days, so there wouldn’t be much benefit in trying to “compete” against them.

The best comparison of NoSQL is against record-based databases.

Record Based Databases

Old fashioned “record-based” databases contained the data, while the structure of data was contained within the application. The database would contain a set of data values such as the following:

123, Chris, Craft, 123-555-1212

The application would know that these data attributes were as follows:

Record Key, First Name, Last Name, Phone Number

The application would typically contain a structure definition for the language the application was written in. If you go back far enough in history, this would have been Assembler or possibly COBOL language. The C/C++ language(s) would have used a struct contained in a header file.

This was fine in the days when ALL of the application code that touched the data was written in the same language. It was a huge pain when multiple applications accessed a common record-based database using different languages, such as a COBOL application that created data, and a C/C++ or Java application needed to access the same data. Someone needed to maintain structure definitions that were in-sync with each other, and it was a huge problem when changes were made to the structure.

Record-based databases included things like VSAM and ISAM. Neither of those databases used SQL, so they would certainly fit the definition of a NoSQL database. I realize the NoSQL people in this world aren’t thinking of VSAM and ISAM to be part of their orbit. Of course that’s a basic problem with the “NoSQL” term itself. Anything defined by what it’s not is very poorly defined.

Self-Describing Data

NoSQL databases typically contain self-describing data. JSON (Java Script Object Notation) is the most commonly used form of self-describing data, but there are other forms as well including XML. Rather than describing data in a struct within the application code, those structure definitions are kept within the data. The database provides a single source of truth for data structure, but also allows for much simpler modification of the data structure and code that access the data as the application and database evolve over time.

A JSON structure matching the previous example might look like this:

{
    "recordkey":"123",
    "firstname":"Chris",
    "lastname":"Craft",
    "phonenumber":"123-555-1212"
}

Of course every single record contains the field names, which consume more storage than the data in this example. Storage is cheap these days and we have much greater processing power than years ago, so there isn’t as much “data pressure” that would force us to define the structure inside the application. The reduction of “data pressure” allows us to use a method that’s less storage efficient to gain advantages in the application development side.

Data Pressure Considerations

The term “data pressure” essentially refers to the cost of storage and the cost of processing larger and larger volumes of data.  Record-based database were extremely efficient in terms of the space they required and compute power to process that data. Less data volume meant less data store and less data to process, so nobody would have considered making data self-describing.

Record-based databases were never self-describing because that simply was not practical decades ago. The cost of storage was simply too high and putting data descriptions into the data itself would have resulted in a huge amount of data duplication (duplication of the data definitions stored within each record), and there would not have been enough compute power on earth to work with that data.

Comparing NoSQL to Relational

NoSQL databases are more about the structure and representation of data than about the  absence of SQL language. Part of this is the same issue of object/relational impedance mismatch that has been around for decades. The rise of object-oriented programming resulted in application code that simply didn’t fit well with data modeled relationally.

Oracle Database supports Key/JSON

The SQL language does not work only with relational data. We can certainly use SQL to access a 2 column table containing a KEY (column #1) and JSON structure (column #2). Oracle Database includes the ability to model data in key/JSON format. Oracle databases might contain nothing but key/JSON data, some relational and some key/JSON, or JSON can be used to extend a relational model.

Oracle is a Converged Database

Oracle Database is a called a “converged” database because it supports virtually any data modeling approach and supports virtually any workload, all within a single database or deployed as separate databases.  Oracle is a multi-model database that supports relational, graph, time series, key/JSON, row-oriented, column-oriented (on Exadata), Geospatial data, normalized models, dimensional models (star schema, snowflake schema, etc.) and others.  Oracle Database supports all workload types including OLTP, DW, and IoT within a single database.

Oracle database uses the SQL language to process what some consider to be “NoSQL” format data (like JSON).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s