Does the emergence of the Cloud mean that “purpose-built” databases make more sense, or is an All-Inclusive database still the best choice?
There are multiple purpose-built databases on the market that handle specific types of data, specific workloads, or have some specialized functionality. Examples of purpose-built databases include:
- Relational OLTP Databases
- Relational Data Warehouse Databases
- Key/Value Databases
- Time Series Databases
- In-Memory Databases
- Graph Databases
Of course there are other examples, but these are the most common. The majority of OLTP databases are relational, and relational data modeling is the norm for Data Warehouse databases as well. Key/value databases have gotten more popular in OLTP applications that involve high frequency “store & retrieve” operations on relatively simple and often dynamic structures. Key/Value databases (using JSON or XML structures) have the advantage of a non-relational data model that can be quite dynamic. Time Series databases have emerged with a focus on forecasting. In-Memory databases deliver high performance, and Property Graph databases are focused on relationship attributes.
In the Cloud, organizations can easily deploy multiple divergent technologies, so does it make more sense to use purpose-built databases in the Cloud? In an on-premise environment, deploying another database means buying and deploying yet another server. Some Cloud vendors will also provide operational support or a “managed service” for each database, so why not?
Of course it’s not only a matter of deploying the database itself. Each divergent database engine will require it’s own client-side drivers to establish connectivity. Each of those clients will require a tie-in to any centralized Identity Management service as well.
In years past (and with on-premise data center environments currently), OPERATING multiple purpose-built databases was always a big disadvantage. Each database engine requires an IT organization to build expertise in that engine, deploy servers and storage to run those databases, as well as secure and maintain them all. Moving to a Cloud environment, some of the operational burden still falls on the IT organization, even if the database is operated as a “managed service” provided by the Cloud vendor.
Each separate database technology represents yet another technology that needs to be secured. Each database has a different security model, with it’s own unique capabilities and quirks. Each database can be expected to have it’s own set of security flaws and bugs that need to be tracked and managed for proper compliance. Even if the Cloud vendor is responsible for applying security patches, the IT organization needs to be aware of the flaws and making sure vulnerabilities are addressed. Trust your Cloud vendor, but VERIFY they are taking care of security!
Building Developer Skills
Each divergent database technology requires specialized developer skills. Developers need to build expertise in the data modeling capabilities, the functional capabilities, and the business theories that are involved. In the case of analytics, this often includes expertise in statistical analysis such as auto-regressive modeling for time-series forecasting.
Islands of Data
Multiple divergent database technologies results in islands of data that often needs to be integrated. Movement of data between databases takes time and induces lag. If operational reports are executed only from a Data Warehouse, those reports will be incorrect until data from the OLTP system reaches the Data Warehouse.
Islands of data become a big problem if it’s sub-divided too much, becoming a series of data islands. Deploying multiple read-replicas of data also isn’t always a solution to every situation, since those replicas need to be kept updated. Read Replicas are a great solution for data with effectivity dates, but a very poor solution for data with frequent updates.
Multiple islands of data ultimately induces delays in data updates across applications, modules, or even micro-services if each is using a separate data store. The most obvious case is the time-delay induced by ETL processing between OLTP systems and Data Warehouse systems. The time-delay is fine for long-term analytics, but unacceptable for operational reporting.
Islands of Functionality
Of course each purpose-built database includes a set of specific functionality. If a key/value database is used for the CUSTOMER ORDERS database, it’s a problem if some relational modeling capabilities are needed in that database.
Key/Value databases have been presented as if they were something new, but are decades old. Key/value databases have always been FAST for ingesting data because they are simply storing a structure in a single “write” (or insert) operation. Some of the earliest databases used a key/value modeling approach, which becomes extremely cumbersome with highly complex data relationships. What’s new in key/value is the ability to model data structures using JSON or XML, including the ability to ENFORCE the JSON or XML structural integrity.
All-Inclusive Database Approach
The All-Inclusive database approach puts all of this functionality into a single database engine. Developers can use ANY of the available functionality within ANY database instead of having to move the data to a place where the functionality resides. Developers simply use the functionality in-place.
As the following diagram illustrates, Oracle includes ALL of this functionality in a single database engine. Developers working on an OLTP database can use capabilities of Data Warehouse databases within the same database. The data doesn’t have to be moved to take advantage of such features. If data nee
One Engine, Not One Database!
Notice that this means ONE ENGINE, not one database! Just because you could put all of your data into a single database doesn’t mean you should. In fact, you could use separate Oracle databases for each type of data & functionality as follows:
Of course this approach doesn’t necessarily make sense either. Nobody does data architecture like this. Each database is created to handle specific applications or sub-components of an application. Developers then use whatever functionality makes sense to handle the business needs of that application or sub-component.
Data Architecture for an organization approaches the problem from the standpoint of business processes and data ownership. Data architecture should not include limits on data modeling capabilities or limits on how the data can be processed. Consider the following sample data architecture:
Notice that Active Data Guard is used for most databases to provide availability, as well as for Read Replicas in the case of the Product Catalog. Active Data Guard is even used for the ORDERS database, even though it’s a sharded database. While sharding reduces the failure impact (or “blast area”) of the database, each shard still represents a single source of truth for the range data it contains.
If this data architecture included limitations in functionality, the entire architecture gets much more complex. For example, the ORDERS database mainly uses key/value data modeling to increase agility into the data model, as well as improving response time. Submitting an order involves a single WRITE (or INSERT) of a single record that contains a complex JSON structure. Processing each order means a single READ (or SELECT) of a single row of data containing that JSON structure. However, if we need a bit of relational functionality within this database, that’s certainly possible. The main function of the database uses key/value, but developers aren’t limited to just that single data modeling capability.
All-Inclusive in the Cloud
The Oracle database has always been a rather LARGE chunk of code. It can be hard to install and takes up lots of space. However, once it’s up and running, Oracle has enormous depth in capability. My original assumption was that Cloud might tend to favor purpose-built databases. Of course you will hear that from vendors who lack an all-inclusive database engine of their own. However, it turns out that Cloud still does favor the All-Inclusive approach.
Islands of data & functionality is still a disadvantage, even in the Cloud.