12 Years Later, an Exadata 4X Larger than the biggest Amazon can offer

I ran across this photo from 12 years ago and got to thinking that the system it shows is still more than 4X larger than the largest system you can construct for an Oracle database on Amazon’s RDS (Relational Database Service). This is a single Exadata Version 1 system spanning 3 data center racks.

Exadata Version 1 – Triple Rack System

This was one of the first systems I had seen at a customer site and it was a triple-rack configuration.

  • 3 Data Center racks wide
  • 528 CPU cores
  • 504 disk drives
  • 226 Terabytes of storage

Of course the latest Exadata systems are much more powerful than this, but it’s stunning when you realize that Amazon cannot match even 1/4 of this machine from 12 years ago. The largest Oracle database you can run in Amazon is around 50 Terabytes. The most CPU cores you can get in Amazon is 64, but that system (db.x1.32xlarge) will cost MORE than Exadata Cloud Service!

Microservices, Sharding and Consolidation with Oracle Database

As the #1 database overall, Oracle is also the best database for microservices architectures and the best database for sharding. Microservices and Sharding are application development and data modeling decisions. Whether to consolidate those databases and how to consolidate them is a “macro-level” infrastructure management decision. In this article, we will explore these topics:

  • Microservices Design Pattern: Database Per Service
  • Sharding: Subdividing a Large Database into Shards
  • Why Isolate Databases?
  • How to Isolate Databases
  • Don’t Over-Isolate: Limit Virtual Sprawl
  • Stop Noisy Neighbors: Use Resource Manager
  • Use Resource Shapes for Simplicity
  • Reducing Blast Area Still Means a Blast!
  • HA is Still Important With Microservices & Sharding
  • Data Driven Apps: Benefits of Converged Database

Oracle is a Converged Database, providing the full range of database capabilities for modern Data Driven Applications. Oracle Database provides industry leading capabilities that exceed what point-solution databases can deliver.  You can deploy Oracle Databases as separate, dedicated databases supporting specific applications or microservices, or consolidate them to make more efficient use of system resources and administrative staff. Oracle Database can scale much larger than other databases, but can also scale-down to handle even the smallest microservices or sharding needs.

Microservices Design Pattern: Database Per Service

The Database Per Service Design Pattern says that each microservice has its own “database”, which means a collection of data and does not necessarily refer to a dedicated Oracle Database. In fact, (as outlined by Chris Richardson here) there are 3 ways for each microservice to have its own set of data as follows:

  • Private Tables per Service
  • Schema per Service
  • Oracle Database per Service

Each microservice has its own set of data, and that data is only accessed through APIs belonging to the microservice. Whether a particular microservice has a set of tables, a dedicated schema, or an entire dedicated database generally depends on the SIZE of data, volume of activity and need for isolation. Microservices with larger amounts of data will tend to have their own databases, whereas smaller amounts of data and activity would argue for a dedicated schema or even simply a number of tables within a schema. Of course an Oracle Database can be a stand-alone database, but multiple databases can also be combined together as Pluggable Databases within a Container Database. We will look at this and other isolation topics in later sections of this post.

Sharding: Sub-Dividing a Large Database into Shards

Going back to the 1990s, I have worked on multiple applications where we sub-divided a massive logical database into many physical Oracle databases. The sharding concept is really nothing new, but Oracle Database now offers Sharding as a built-in, out-of-the-box capability that makes building sharded databases and applications much simpler. Sharding features are built into the Oracle Database so that application developers don’t need to implement those features in application code.

At the end of the day, when you shard a database into multiple smaller pieces, you still have the same amount of data. If you shard a 70TB database into 10 shards of 7TB each, it’s still 70TB. If you shard that data into 100 shards, it’s still 70TB. Oracle is the KING of managing massive amounts of data, and it doesn’t matter if that’s 1 database or 100.

The point of sharding is that each database will be smaller and therefore easier to operate. You can then isolate those databases in order to improve operations.  Let’s look at WHY it’s sometimes necessary to isolate databases, then we will look at HOW to isolate them.

Why Isolate Databases?

Multiple microservices can reside within a single database, or might have dedicated databases (or Pluggable Databases within Containers), while shards are always spread across databases. Once microservice databases and shards are defined at the database layer, there are 5 factors that govern why those databases need to be isolated vs. consolidated as follows:

  • Administrative Separation
  • Security Separation
  • Maintenance (patching and upgrades)
  • Blast Radius (scope of failure impact)
  • Resource Management

Administrative Separation refers to organizations that have multiple DBA teams or other needs to separate databases from an administrative standpoint. This is also common in SaaS (Software as a Service) providers, where customers of the service provider have administrative access, or the service provider has separate teams that administer databases. In either case, having multiple administrative teams means those databases need to be isolated from each other.

Security Separation refers to use of dedicated networking infrastructure to service specific databases (quarantine LAN, etc.), and subjecting certain databases to higher security standards such as for regulatory reasons (HIPAA, PCI, PII, etc.). Databases must be isolated in cases where security requirements differ widely between them.

Maintenance considerations include patching and upgrades of servers, databases, and any supporting infrastructure. As microservices are consolidated together more closely, any maintenance must be coordinated among those microservices and testing is often required to ensure compatibility. The biggest concern with maintenance is major version upgrades at the infrastructure layer (O/S, database, etc.), where there is greater potential for functional changes that might impact how services operate.

Blast radius (or fault isolation) considerations refer to the scope of impact any failure has. An outage of any single microservice can jeopardize the function of the application as a whole. However, smaller databases can be repaired more quickly than larger databases, so limiting the scope of impact can result in faster service restoration during a failure.

Resource Management refers to ensuring each database receives the resources it requires, as well as guarding against the “noisy neighbor” problem. One approach to resource management is to isolate databases onto dedicated physical or virtual machines, but we recommend simply using Oracle Resource Manager instead.

How to Isolate Databases

In addition to the reasons why to isolate databases, there is also the question of how the Database per Service Design Pattern and sharded databases are deployed at the physical infrastructure level. With Oracle, we have the following options for how to isolate databases:

  • Dedicated Physical Servers per Database
  • Dedicated Virtual Machine per Database
  • Multiple Databases per Physical/Virtual Machine
  • Multiple Pluggable Databases Per Container
  • Oracle Resource Manager (DBRM & IORM)

We need to make sure developers have the autonomy they’re looking for, but balance that autonomy against how databases are deployed in the infrastructure. Managers of the infrastructure must determine how and when to isolate vs. consolidate databases. Excessive isolation such as the “one database per virtual machine” approach results in a high cost of operation, so we recommend a more balanced approach.

Don’t Over-Isolate: Limit Virtual Sprawl

Data Centers in the past often suffered from physical sprawl, where each application and its database(s) ran on their own physical servers. Virtual Machine technology has allowed IT organizations to stem the tide of physical sprawl, but this has often resulted in virtual sprawl. Increasing compute and storage density has also allowed more workload to be laid on top of the same physical footprint, using virtualization to maintain the same (or even greater) isolation between workloads.

You will quickly build an administrative nightmare if each microservice uses a dedicated database, and each database uses a dedicated Virtual Machine. You can easily meet the needs of your application developers by taking a more judicious approach to deploying databases: use the full range of options at your disposal and the right tool for each task, such as Oracle Resource Manager for managing resources.

Stop Noisy Neighbors: Use Resource Manager

There is no reason to isolate Oracle Databases simply for the purposes of resource management. We can easily ensure each database receives the appropriate amount of resources using Oracle Database Resource Manager (DBRM).  There are 4 primary resources that need to be managed in any system:

  • CPU
  • Memory
  • Processes
  • I/O

CPU is managed using Oracle Database Resource Manager (DBRM), using shares & limits inside of a Container Database, Instance Caging (CPU_COUNT) across containers and non-pluggable databases, and within databases using Consumer Groups.  DBRM gives us control of ALL types of databases and workloads, and it’s integrated with IORM (see below for detail).
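
As a minimal sketch (the plan name, PDB names, and values here are purely hypothetical), a CDB resource plan with shares and limits, combined with Instance Caging, might look like this when run in the CDB root:

-- Hypothetical CDB resource plan: shares and a utilization limit per PDB
BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.CREATE_CDB_PLAN(
    plan    => 'consolidation_plan',
    comment => 'Shares and limits for consolidated PDBs');
  DBMS_RESOURCE_MANAGER.CREATE_CDB_PLAN_DIRECTIVE(
    plan               => 'consolidation_plan',
    pluggable_database => 'orders_pdb',
    shares             => 3,
    utilization_limit  => 70);
  DBMS_RESOURCE_MANAGER.CREATE_CDB_PLAN_DIRECTIVE(
    plan               => 'consolidation_plan',
    pluggable_database => 'reports_pdb',
    shares             => 1,
    utilization_limit  => 40);
  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/

-- Activate the plan, and cage the instance to 8 CPUs (Instance Caging)
ALTER SYSTEM SET resource_manager_plan = 'consolidation_plan';
ALTER SYSTEM SET cpu_count = 8;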

Memory usage for Oracle databases is managed by controlling SGA (System Global Area) and PGA (Process Global Area) settings.

Processes within Oracle databases are managed by controls for sessions and parallel query servers.
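
Here is a hedged sketch of the kinds of memory and process controls involved (the values are purely illustrative, and several of these parameters can also be set per Pluggable Database):

-- Memory controls: SGA and PGA (SGA changes assume sga_max_size allows them)
ALTER SYSTEM SET sga_target           = 8G  SCOPE = BOTH;
ALTER SYSTEM SET pga_aggregate_target = 2G  SCOPE = BOTH;
ALTER SYSTEM SET pga_aggregate_limit  = 4G  SCOPE = BOTH;

-- Process controls: sessions and parallel query servers
ALTER SYSTEM SET sessions             = 400 SCOPE = SPFILE;  -- static parameter, takes effect at restart
ALTER SYSTEM SET parallel_max_servers = 16  SCOPE = BOTH;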

I/O resources are much simpler to manage on Exadata because we have IORM (I/O Resource Manager). Exadata is integrated hardware and software, spanning both compute and storage. Using IORM Objective “auto” allows IORM to inherit the same resource ratios set for CPU in DBRM, so you have one set of controls that handles everything.

See our demo series on Resource Management (starting here) that shows how you can manage resources between pluggable databases, between container databases and non-pluggable databases, and even within databases using Consumer Groups.

Use Resource Shapes for Simplicity

Of course we recommend establishing a standardized set of “resource shapes” to choose from rather than making each database a unique and special snowflake. It’s much easier to manage a large population of databases if they all follow some standards, including the resources assigned to them. Using Resource Shapes means you establish shapes with the same ratios for the following resources:

  • CPU
  • Memory (SGA & PGA)
  • Processes (Sessions & PQ Servers)
  • I/O

I covered this in detail in the MAA Best Practices white paper on Database Consolidation available here. The white paper includes different allocations for DW vs. OLTP databases, so you’ll see 2 different tables of Resource Shapes. DW systems need larger PGA memory and devote more processes to parallel processing than OLTP systems do.  You will see those differences in the resource shape tables.
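
As an illustration only (the PDB name and numbers below are hypothetical and not taken from the white paper, and exact parameter scoping at the PDB level varies by database version), applying a “medium” shape to one Pluggable Database might look like this:

-- Hypothetical "medium" resource shape applied to one PDB
ALTER SESSION SET CONTAINER = sales_pdb;
ALTER SYSTEM SET cpu_count            = 4;
ALTER SYSTEM SET sga_target           = 16G;
ALTER SYSTEM SET pga_aggregate_target = 4G;
ALTER SYSTEM SET sessions             = 300;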

Reducing Blast Area Still Means a BLAST!

Reducing the “blast area” by dividing the data by each microservice and sharding large databases doesn’t ELIMINATE the blast (meaning “failure”). What a smaller blast area gives you is pieces that are smaller and therefore easier to fix. It’s also important to realize that a piece of an application dead in the water is still dead in the water.  If that piece of the application (a microservice) is critical, the entire application might also be dead in the water.

Sharding is another design that reduces the “blast area” of a failure, but an outage of a shard is still an outage.  It’s just easier to fix because it’s smaller, but it still needs to be fixed! In the best case scenario, you have SOME of your users dead in the water because the data they need (and the microservice) isn’t available.  In the worst case scenario, you will have some microservices or portions of your application that need ALL of the shards available for it to function, and those portions of the application are dead in the water.

In short, reducing the blast area doesn’t necessarily increase availability, but might help in reducing the duration of an outage.

HA is still Important with Microservices & Sharding

High Availability (HA) systems are still important, even for microservices and sharded databases. There can still be a “blast” that takes down a database, which takes down a microservice or shard, which will impact at least some of your users if not all of them depending on how your application uses that data.

Just because you have things broken down into small pieces doesn’t mean there’s no impact of an outage on any piece. It’s still important to provide HA, which we do with these tools:

  • Core HA Features of Oracle Database
  • Oracle Real Application Clusters
  • Oracle Active Data Guard
  • Oracle Application Continuity

Core HA features of Oracle Database have become something that we almost don’t even consider these days, but are still huge differentiators compared to other databases on the market. Oracle experts often take for granted things like online index rebuilds, online table move, and other features that have been developed over the years. I am often surprised to find other databases haven’t caught up, and some of the newer database engines are still years behind.

Oracle Real Application Clusters (RAC) is still unique in the market after all of these years. We often think of RAC for scalability, but it’s a huge part of the Oracle Database availability solution as well.

Oracle Active Data Guard is the Oracle solution for providing Read Replicas  (or “reader farms”), but it’s also a critical part of the Oracle Database availability solution. Active Data Guard provides for availability when a “blast” (failure) occurs, but it’s also used to provide availability during proactive maintenance.

Oracle Application Continuity hasn’t gotten enough press over the past 5+ years, but it’s yet another completely unique capability that Oracle offers. People seem to remember TAF (Transparent Application Failover), but Application Continuity is a completely different animal.  TAF required application code changes, while AC has essentially pushed much of that complexity into the SQL driver for Oracle.

The availability capabilities of Oracle Database have long exceeded what applications could take advantage of, and Application Continuity closes that gap. Applications need to tolerate rolling outages of a RAC Cluster, and Application Continuity is what makes this happen. Application Continuity is implemented through CONFIGURATION changes, not within application code.

Data Driven Apps: Benefits of Converged Database

The Oracle Converged Database brings a number of critical advantages for data driven applications compared to point-solution databases.  See Juan Loaiza’s talk on Data Driven Apps at OOW London 2019 for a great treatment of this topic. Although point-solution databases are sometimes referred to as “best of breed”, many of them don’t live up to that claim. You would suppose a database with a narrow focus might do that job better, but that’s not necessarily the case. The Oracle Converged Database advantages include:

  • Data Modeling Flexibility
  • Simplified Data Movement
  • Portable Developer Skills
  • Increased Developer Productivity
  • Any Workload, Any Data
  • Simplified Consolidation & Isolation
  • Any Size, Any Scale

Data Modeling Flexibility comes from having a single database that provides the full range of data modeling needs. While developers might make the perfect decision and choose the exact point-solution database needed, requirements often change. A development team might choose a relational database, only to find that some data really needs a different modeling approach such as JSON Documents.

While Oracle Databases can be deployed using a SINGLE modeling technique (Relational, Document, Star Schema, Property Graph, etc.) you can also use MULTIPLE modeling approaches within the same Oracle Database. I have worked with data that didn’t fit easily into the relational model, so it’s great to be able to use JSON Documents or another technique that fits better for certain portions of your data model. It would be a huge pain (show-stopper really) if you had to switch databases or use another database for a small component like a single microservice.

Simplified Data Movement comes from using the same Oracle Database on both the sending and receiving side. Data movement between Oracle Databases is vastly easier because both databases use the same tools, same drivers, same datatypes, etc. Data movement can be completely eliminated in cases where applications share the same data in a single shared database, such as combined OLTP and Reporting in a single database (yes, this is possible and quite common with Oracle). Data Movement can also be quite costly in Cloud environments, which is another benefit of using Oracle Converged Database rather than multiple point-solution databases.

Portable Developer Skills are a key benefit of the Oracle Converged Database, allowing developers to work on ANY microservice or application that uses the same database without extensive re-training. Developers who write analytic or Data Mining code can easily move into a development team working on transactional applications or vice-versa.

Increased Developer Productivity comes from having a common set of interfaces for all databases, regardless of what features are used within each database. Any feature provided by the database represents code that DOES NOT have to be written by a developer.

Any Workload, Any Data includes traditional OLTP and Analytic applications, but also includes Machine Learning, blockchain, property graphs, Time Series analysis, Spatial, Internet of Things, and Event Processing. Oracle Database includes robust capabilities in these areas that have been proven over many deployments at customer sites worldwide.

Simplified Consolidation & Isolation comes by having databases that all run the same Converged Database. Administrators have the flexibility to isolate databases where needed, or consolidate them to ease the administrative burden as well as to improve resource efficiency. Oracle Databases can even be consolidated into a Container Database, for much greater density and lower cost of operation. Yes, this also applies to Cloud environments, where costs can easily escalate. The bottom line is you simply can’t do this with multiple divergent database engines such as one for Relational OLTP, one for Relational Star Schema DW, one for JSON Documents, etc.

Any Size, Any Scale database or workload can be handled by Oracle Database. It’s amazing to me that other databases haven’t caught up to Oracle after all these years. We still encounter customers who are faced with a database migration (to Oracle) because their chosen database engine simply can’t keep up. Oracle Database is able to easily support databases at least 10X larger than our competitors can handle, and the Exadata platform simply takes that number higher.

Microservices and Sharding increase the number of databases while reducing the maximum size of each database, but it’s great to know you can scale a database without worry if it’s running Oracle Database. You have time to adjust if you need to re-shard your databases or if your microservices databases grow unexpectedly. Having size limits that are 10X or even 100X higher than other databases gives you flexibility.

Microservices and Sharding DO NOT reduce the overall amount of data! In fact, the volume of data can increase somewhat if you’re duplicating any data across microservices or shards. In any case, the volume of data certainly isn’t less.

Summary

Oracle Converged Database gives you flexibility to deploy databases as needed, using microservices and sharding. Oracle is the #1 most widely used database overall, but also the #1 most widely used for microservices and sharding. Oracle Databases can be easily consolidated or isolated as needed to meet the needs of modern Data Driven Applications.

 

NoSQL Improved Upon Record Based Databases

NoSQL databases (also known as key/value, key/JSON, or “document” databases) provide data that is self-describing, which is a huge improvement compared to older “record-based” databases. NoSQL databases are often compared to (purely) relational databases, but this is primarily because relational databases hold the biggest market share. Record-based databases are really a non-entity these days, so there wouldn’t be much benefit in trying to “compete” against them.

The best comparison of NoSQL is against record-based databases.

Record Based Databases

Old-fashioned “record-based” databases contained the data, while the structure of that data lived within the application. The database would contain a set of data values such as the following:

123, Chris, Craft, 123-555-1212

The application would know that these data attributes were as follows:

Record Key, First Name, Last Name, Phone Number

The application would typically contain a structure definition for the language the application was written in. If you go back far enough in history, this would have been Assembler or possibly COBOL. The C/C++ language(s) would have used a struct contained in a header file.

This was fine in the days when ALL of the application code that touched the data was written in the same language. It was a huge pain when multiple applications accessed a common record-based database using different languages, such as when a COBOL application created data and a C/C++ or Java application needed to access the same data. Someone needed to maintain structure definitions that were in-sync with each other, and it was a huge problem when changes were made to the structure.

Record-based databases included things like VSAM and ISAM. Neither of those databases used SQL, so they would certainly fit the definition of a NoSQL database. I realize the NoSQL people of this world don’t think of VSAM and ISAM as part of their orbit. Of course that’s a basic problem with the “NoSQL” term itself. Anything defined by what it’s not is very poorly defined.

Self-Describing Data

NoSQL databases typically contain self-describing data. JSON (JavaScript Object Notation) is the most commonly used form of self-describing data, but there are other forms as well, including XML. Rather than describing data in a struct within the application code, those structure definitions are kept within the data. The database provides a single source of truth for data structure, but also allows for much simpler modification of the data structure and the code that accesses the data as the application and database evolve over time.

A JSON structure matching the previous example might look like this:

{
    "recordkey":"123",
    "firstname":"Chris",
    "lastname":"Craft",
    "phonenumber":"123-555-1212"
}

Of course every single record contains the field names, which consume more storage than the data in this example. Storage is cheap these days and we have much greater processing power than years ago, so there isn’t as much “data pressure” that would force us to define the structure inside the application. The reduction of “data pressure” allows us to use a method that’s less storage efficient to gain advantages in the application development side.

Data Pressure Considerations

The term “data pressure” essentially refers to the cost of storage and the cost of processing larger and larger volumes of data.  Record-based databases were extremely efficient in terms of the space they required and the compute power needed to process that data. Less data volume meant less data to store and less data to process, so nobody would have considered making data self-describing.

Record-based databases were never self-describing because that simply was not practical decades ago. The cost of storage was simply too high and putting data descriptions into the data itself would have resulted in a huge amount of data duplication (duplication of the data definitions stored within each record), and there would not have been enough compute power on earth to work with that data.

Comparing NoSQL to Relational

NoSQL databases are more about the structure and representation of data than about the absence of the SQL language. Part of this is the same issue of object/relational impedance mismatch that has been around for decades. The rise of object-oriented programming resulted in application code that simply didn’t fit well with data modeled relationally.

Oracle Database supports Key/JSON

The SQL language does not work only with relational data. We can certainly use SQL to access a 2 column table containing a KEY (column #1) and JSON structure (column #2). Oracle Database includes the ability to model data in key/JSON format. Oracle databases might contain nothing but key/JSON data, some relational and some key/JSON, or JSON can be used to extend a relational model.
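
As a minimal sketch (table and column names are illustrative), a key/JSON table in Oracle can be as simple as this, and SQL can reach inside the JSON structure:

-- A 2-column key/JSON table; the CHECK constraint ensures valid JSON
CREATE TABLE customer_docs (
  doc_key VARCHAR2(64) PRIMARY KEY,
  doc     CLOB CONSTRAINT doc_is_json CHECK (doc IS JSON)
);

INSERT INTO customer_docs VALUES ('123',
  '{"recordkey":"123","firstname":"Chris","lastname":"Craft","phonenumber":"123-555-1212"}');

-- SQL reaches inside the JSON structure
SELECT JSON_VALUE(doc, '$.lastname') AS last_name
  FROM customer_docs
 WHERE doc_key = '123';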

Oracle is a Converged Database

Oracle Database is called a “converged” database because it supports virtually any data modeling approach and supports virtually any workload, all within a single database or deployed as separate databases.  Oracle is a multi-model database that supports relational, graph, time series, key/JSON, row-oriented, column-oriented (on Exadata), Geospatial data, normalized models, dimensional models (star schema, snowflake schema, etc.) and others.  Oracle Database supports all workload types including OLTP, DW, and IoT within a single database.

Oracle database uses the SQL language to process what some consider to be “NoSQL” format data (like JSON).

All Data Has Relationships

All data has relationships to other data. You can easily see this by simply looking at any piece of data, for example:

  • Customers place Orders
  • Orders are for Products
  • Products are in Warehouses
  • Warehouses have Employees

The list of relationships goes on and on, ad infinitum. Ensuring relationships are tracked is critical to data integrity. What happens if a customer orders a product that doesn’t exist? What happens if products go missing from a warehouse?

Databases Simplify Applications

The raison d’être for databases has always been to simplify application code. Anything done by the database is something the application code doesn’t have to handle. The basic functions of a database through the history of IT have been these:

  • Preventing Data Loss
  • Controlling multiple user access to data
  • Enforcing data values are correct
  • Ensuring data relationships are valid
  • Data Security

These functions have been done by databases for decades because doing so simplifies application code.

Preventing Data Loss

Databases protect data, which is the lifeblood of an organization. Preventing loss of data has been key to survival even in the days of paper records and stone tablets. Electronic records are subject to loss from a wide variety of failures, and databases are responsible for providing protection against these failures. Data loss might mean loss of the ENTIRE database, but also means changes from one user getting destroyed by another user.

Controlling Multiple User Access to Data

Without databases, applications would face extreme complexity in controlling access to data by multiple users. Access to data includes reading data, but also includes changing that data. Applications would be horribly complex without the concurrency controls provided by databases. Users also shouldn’t have access to changes made by others until those changes are complete.

Enforcing Data Values are Correct

The correctness of data values includes enforcing that number attributes contain numeric values, date attributes contain valid dates, JSON strings contain valid JSON, GPS locations are valid locations, countries are valid countries, etc. Without this enforcement by the database, errors in application code will introduce corrupt data, which will have a cascading impact on other application code. If the database contains an order date of February 30th, that’s clearly wrong and it’s difficult to determine how that value got inserted into the database. It’s much better to catch that when it happens and prevent the change.
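
As a standalone sketch (the table and column names are hypothetical), datatypes and CHECK constraints let the database reject bad values at the moment they are written:

CREATE TABLE order_lines (
  order_id   NUMBER       NOT NULL,
  order_date DATE         NOT NULL,  -- the DATE type rejects impossible dates
  quantity   NUMBER(6)    CHECK (quantity > 0),
  status     VARCHAR2(10) CHECK (status IN ('NEW','SHIPPED','CANCELLED'))
);

-- Fails with ORA-01839 (date not valid for month specified)
INSERT INTO order_lines
VALUES (1, TO_DATE('2021-02-30','YYYY-MM-DD'), 1, 'NEW');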

Ensuring Data Relationships are Valid

Relationships represent a special class of data values that are enforced by the database. This goes to the heart of why databases were created, and why the relational database became the leader. Hierarchical databases could only enforce certain data relationships. Network databases were a little better, but also had limitations. Relational databases (invented by Edgar F. Codd) provided much greater capability to model data relationships and ensure those relationships were reflected in the data.
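
Here is a standalone sketch (hypothetical tables) of how a relational database enforces such a relationship declaratively, with no application code involved:

CREATE TABLE customers (
  customer_id NUMBER PRIMARY KEY,
  name        VARCHAR2(100)
);

CREATE TABLE orders (
  order_id    NUMBER PRIMARY KEY,
  customer_id NUMBER NOT NULL REFERENCES customers (customer_id),
  order_date  DATE   NOT NULL
);

-- Fails with ORA-02291 (parent key not found): customer 999 does not exist
INSERT INTO orders VALUES (1, 999, SYSDATE);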

Data Security

Of course databases play a central role in the security of data, including who can view data as well as who can make changes to data. This includes not simply access to the entire database, but down to individual attributes as well. There are different access controls over who can view customers, who can change customer data, and who can view or change customer addresses and credit cards.

All of this complexity can be pushed into the application code, but we often want separate data access controls enforced by the database. The principle is known as “defense in depth”. Fort Knox has the gold inside a vault, inside a building, surrounded by a fence, patrolled by guards. You certainly wouldn’t want a Fort Knox with the gold sitting in the middle of a field with guards running around trying to protect it. Defense. In. Depth.

Normalization & Denormalization

Data often begins in a denormalized form, or at least we often start thinking about data before it has been normalized, so we typically “normalize” the data when designing a relational data model. We then “de-normalize” data primarily for performance reasons. One example might be phone numbers. Customers often have multiple phone numbers such as home, work, and cellular phones.

In a normalized data model, each of a customer’s phone numbers would be stored in a separate row of a phone number table related to the customer. However, that means accessing extra rows of data when you want to show them all. To improve performance of accessing ALL of the rows, we might simply put separate home, work, and cellular phone numbers on the customer table. However, what happens when a customer has 2 cell phones?

Phone numbers are a great case for the combination of JSON within an otherwise relational database.  Rather than modeling N number of phone numbers in the customer table, we might create a PHONE_NUMBERS column as a JSON structure.  That design allows a variable number of phone numbers for each customer.
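
A minimal sketch of that design (names are illustrative) might look like this, with JSON_TABLE unnesting the phone numbers relationally when needed:

-- Hypothetical CUSTOMERS table with a JSON column holding any number of phone numbers
CREATE TABLE customers (
  customer_id   NUMBER PRIMARY KEY,
  first_name    VARCHAR2(50),
  last_name     VARCHAR2(50),
  phone_numbers CLOB CHECK (phone_numbers IS JSON)
);

INSERT INTO customers VALUES (123, 'Chris', 'Craft',
  '[{"type":"home","number":"123-555-1212"},
    {"type":"cell","number":"123-555-3434"}]');

-- Unnest the phone numbers into rows when the application needs them that way
SELECT p.phone_type, p.phone_number
  FROM customers c,
       JSON_TABLE(c.phone_numbers, '$[*]'
         COLUMNS (phone_type   VARCHAR2(10) PATH '$.type',
                  phone_number VARCHAR2(20) PATH '$.number')) p
 WHERE c.customer_id = 123;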

Database Optimization

I heard someone say recently that relational databases aren’t optimized for anything, which is a complete misrepresentation of reality. Every table in a normalized relational data model should have a primary key. That primary key will have an index. Access to a row of data through the index is “optimized” and will give good performance.

The database “engine” does not provide the logical structure of data. Database “engines” (historically) have not automatically generated indexes beyond those needed to enforce constraints such as primary keys and unique keys. Building of secondary indexes (those beyond what’s needed to enforce constraints) has always been the responsibility of the database administrator, data modeler, or application developer. So yes, in a sense the database is not “optimized” out of the box if you don’t optimize it!

Oracle Autonomous Database is the first on the market to do this sort of optimization automatically, handling what has traditionally been the responsibility of the DBA, data modeler, or application developer.

Cloud Databases: All-Inclusive vs. Purpose-Built?

Does the emergence of the Cloud mean that “purpose-built” databases make more sense, or is an All-Inclusive database still the best choice?

There are multiple purpose-built databases on the market that handle specific types of data, specific workloads, or have some specialized functionality.  Examples of purpose-built databases include:

  • Relational OLTP Databases
  • Relational Data Warehouse Databases
  • Key/Value Databases
  • Time Series Databases
  • In-Memory Databases
  • Graph Databases

Of course there are other examples, but these are the most common.  The majority of OLTP databases are relational, and relational data modeling is the norm for Data Warehouse databases as well.  Key/value databases have gotten more popular in OLTP applications that involve high frequency “store & retrieve” operations on relatively simple and often dynamic structures.  Key/Value databases (using JSON or XML structures) have the advantage of a non-relational data model that can be quite dynamic.  Time Series databases have emerged with a focus on forecasting.  In-Memory databases deliver high performance, and Property Graph databases are focused on relationship attributes.


Easy Deployment

In the Cloud, organizations can easily deploy multiple divergent technologies, so does it make more sense to use purpose-built databases in the Cloud?  In an on-premise environment, deploying another database means buying and deploying yet another server.  Some Cloud vendors will also provide operational support or a “managed service” for each database, so why not?

Of course it’s not only a matter of deploying the database itself.  Each divergent database engine will require its own client-side drivers to establish connectivity.  Each of those clients will require a tie-in to any centralized Identity Management service as well.

Operational Burden

In years past (and with on-premise data center environments currently), OPERATING multiple purpose-built databases was always a big disadvantage.  Each database engine requires an IT organization to build expertise in that engine, deploy servers and storage to run those databases, as well as secure and maintain them all.  Moving to a Cloud environment, some of the operational burden still falls on the IT organization, even if the database is operated as a “managed service” provided by the Cloud vendor.

Security Burden

Each separate database technology represents yet another technology that needs to be secured.  Each database has a different security model, with its own unique capabilities and quirks.  Each database can be expected to have its own set of security flaws and bugs that need to be tracked and managed for proper compliance.  Even if the Cloud vendor is responsible for applying security patches, the IT organization needs to be aware of the flaws and make sure vulnerabilities are addressed.  Trust your Cloud vendor, but VERIFY they are taking care of security!

Building Developer Skills

Each divergent database technology requires specialized developer skills.  Developers need to build expertise in the data modeling capabilities, the functional capabilities, and the business theories that are involved.  In the case of analytics, this often includes expertise in statistical analysis such as auto-regressive modeling for time-series forecasting.

Islands of Data

Multiple divergent database technologies result in islands of data that often need to be integrated.  Movement of data between databases takes time and induces lag.  If operational reports are executed only from a Data Warehouse, those reports will be incorrect until data from the OLTP system reaches the Data Warehouse.

Islands of data become a bigger problem if the data is sub-divided too far, becoming a series of small data islands.  Deploying multiple read-replicas of data also isn’t a solution to every situation, since those replicas need to be kept updated.  Read Replicas are a great solution for data with effectivity dates, but a very poor solution for data with frequent updates.

Multiple islands of data ultimately induce delays in data updates across applications, modules, or even micro-services if each is using a separate data store.  The most obvious case is the time-delay induced by ETL processing between OLTP systems and Data Warehouse systems.  The time-delay is fine for long-term analytics, but unacceptable for operational reporting.

Islands of Functionality

Of course each purpose-built database includes a set of specific functionality.  If a key/value database is used for the CUSTOMER ORDERS database, it’s a problem if some relational modeling capabilities are needed in that database.

Key/Value databases have been presented as if they were something new, but are decades old. Key/value databases have always been FAST for ingesting data because they are simply storing a structure in a single “write” (or insert) operation.  Some of the earliest databases used a key/value modeling approach, which becomes extremely cumbersome with highly complex data relationships.  What’s new in key/value is the ability to model data structures using JSON or XML, including the ability to ENFORCE the JSON or XML structural integrity.

All-Inclusive Database Approach

The All-Inclusive database approach puts all of this functionality into a single database engine.  Developers can use ANY of the available functionality within ANY database instead of having to move the data to a place where the functionality resides.  Developers simply use the functionality in-place.

As the following diagram illustrates, Oracle includes ALL of this functionality in a single database engine.  Developers working on an OLTP database can use capabilities of Data Warehouse databases within the same database.  The data doesn’t have to be moved to take advantage of such features.

[Diagram: all of this functionality included in a single Oracle database engine]

One Engine, Not One Database!

Notice that this means ONE ENGINE, not one database!  Just because you could put all of your data into a single database doesn’t mean you should.  In fact, you could use separate Oracle databases for each type of data & functionality as follows:

[Diagram: separate Oracle databases, one per type of data and functionality]

Of course this approach doesn’t necessarily make sense either.  Nobody does data architecture like this.  Each database is created to handle specific applications or sub-components of an application.  Developers then use whatever functionality makes sense to handle the business needs of that application or sub-component.

Data Architecture

Data Architecture for an organization approaches the problem from the standpoint of business processes and data ownership.  Data architecture should not include limits on data modeling capabilities or limits on how the data can be processed.  Consider the following sample data architecture:

[Diagram: sample data architecture]

Notice that Active Data Guard is used for most databases to provide availability, as well as for Read Replicas in the case of the Product Catalog.  Active Data Guard is even used for the ORDERS database, even though it’s a sharded database.  While sharding reduces the failure impact (or “blast area”) of the database, each shard still represents a single source of truth for the range of data it contains.

If this data architecture included limitations in functionality, the entire architecture would get much more complex.  For example, the ORDERS database mainly uses key/value data modeling to add agility to the data model, as well as to improve response time.  Submitting an order involves a single WRITE (or INSERT) of a single record that contains a complex JSON structure.  Processing each order means a single READ (or SELECT) of a single row of data containing that JSON structure.  However, if we need a bit of relational functionality within this database, that’s certainly possible.  The main function of the database uses key/value, but developers aren’t limited to just that single data modeling capability.

All-Inclusive in the Cloud

The Oracle database has always been a rather LARGE chunk of code.  It can be hard to install and takes up lots of space.  However, once it’s up and running, Oracle has enormous depth in capability.  My original assumption was that the Cloud might tend to favor purpose-built databases.  Of course you will hear that from vendors who lack an all-inclusive database engine of their own.  However, it turns out that the Cloud still favors the All-Inclusive approach.

Islands of data & functionality are still a disadvantage, even in the Cloud.

 

 

What Makes Oracle Database Number One?

After more than 30 years on the market, it’s tempting to look back at how Oracle became the number one database, but it’s really more useful to consider why it remains number one. Software engineers need to think hard about committing themselves and their code (the fruits of their labor) to a second-rate database engine.

Oracle is the #1 database in terms of market revenues, at right around DOUBLE the revenues of its nearest competitor. With the rise of open-source “free” software, some have asked whether revenue is still the best measure. Oracle has also dominated in popularity surveys and other attempts to measure market share aside from revenue comparisons. So the question remains: what makes the Oracle Database number one? What follows is a summary and short discussion of each of the topics covered in the sections below.

In short, Oracle is the only All-Inclusive database on the market that brings all of these capabilities in a single package, which allows developers to concentrate on building application functionality rather than working around limitations found in other databases.  We will look at each topic in detail below.

The All Inclusive Database

Oracle is the only all-inclusive database on the market. Oracle started as a relational database, and relational is still an extremely critical part of what it does. However, Oracle also provides the ability to model and PROCESS data using many different approaches, all within the same database.

One of the challenges for software engineers is to use Oracle’s capabilities correctly. Just because you CAN grow a single Oracle database forever doesn’t mean you should. In fact, you should provide some separation between applications at some point. Different applications often work better when the same data is structured differently. For example, OLTP applications typically work best with data modeled in 3rd Normal Form (3NF). Data Warehouse applications work best with Star Schema or Dimensional models. While you could put all of this data into a single Oracle database or even in a single schema within a single Oracle database, it often makes sense to put them into different databases.

If you are going to break apart your applications into separate databases, then why not use so-called “Best of Breed” or “single-purpose” databases instead? First, Software Engineers need to ask whether those single-purpose databases are somehow better than the all-inclusive Oracle database. Just because a database does 1 thing doesn’t make it better. All databases share many layers of common internal functions like caching, storage, transaction handling, connection management, API, security, and client libraries. Oracle has made and continues to make the investment necessary to leverage these common functions to build the multi-purpose, all-inclusive database.

We have seen single-purpose databases come and go over the years, including Object Databases, Columnar Databases, Key/Value Databases, In-Memory Databases, and Document Databases.  Oracle has always taken the approach of incorporating those concepts as features into a single all-inclusive database.  The all-inclusive approach has proven to offer much more powerful capabilities without the complexity of multiple single-purpose databases.  Inclusivity has multiple dimensions including scale, workload, type of data, data modeling techniques and other dimensions as we shall see below.

Database Fundamentals

There are a number of fundamentals that need to be addressed before we get deep into the all-inclusive nature of the Oracle database.  These fundamentals need to be clear because not all database technology on the market has these capabilities.

Structured Query Language (SQL) underlies the Oracle database, and is the fundamental programming interface to access the database.  Oracle supports ANSI standard SQL, and has extensions to the SQL language for features that aren’t included in ANSI standard.

Transactions are a fundamental capability of any database, dating back to the earliest days of computer science.  To be clear, supporting transactions means the database ensures the A.C.I.D. properties of Atomicity, Consistency, Isolation, and Durability.  Without A.C.I.D. transaction support, a database simply isn’t performing the proper role of a database.  ACID transactions also need to span the ENTIRE scope of a transaction including everything the transaction modifies, not just a single table or partition of a table.  Transactions also need to extend to all related structures including indexes, and of course Oracle meets these needs.
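
As a standalone sketch (the tables are hypothetical), a single transaction can span multiple tables and their indexes, and either all of it becomes visible or none of it does:

CREATE TABLE accounts (
  account_id NUMBER PRIMARY KEY,
  balance    NUMBER NOT NULL
);

CREATE TABLE account_log (
  account_id    NUMBER REFERENCES accounts (account_id),
  change_amount NUMBER NOT NULL,
  change_time   DATE   DEFAULT SYSDATE
);

INSERT INTO accounts VALUES (42, 1000);
COMMIT;

-- One atomic transaction: the update, the log row, and all index maintenance
-- commit together or roll back together
UPDATE accounts SET balance = balance - 100 WHERE account_id = 42;
INSERT INTO account_log (account_id, change_amount) VALUES (42, -100);
COMMIT;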

Availability is a critical attribute of any database.  Unlike application servers, the fundamental purpose of a database is to manage data.  Various techniques exist to distribute and replicate data, but availability is still critically important.  The Oracle database on Exadata is one of the few technologies in the market that has ever achieved AL4 fault tolerance capabilities using Oracle’s Maximum Availability Architecture (MAA).

Programmable database features represent another foundational capability that databases need to provide.  The Oracle database has served as the model that other vendors have followed in the area of Stored Procedures and Triggers.  Some vendors have even patterned their stored procedure language after Oracle PL/SQL.  These capabilities are critical to Software Engineers who need to ensure data integrity in their applications by extending the core capabilities of the database and the relational model.  Stored procedures also play a critical role in support of other functionality such as JSON and XML support.

Backup/Recovery capabilities of the database allow users to protect against hardware failures and logical (application) failures that impact data integrity.  Oracle’s integrated RMAN (Recovery Manager) tool provides an enormous range of database backup/recovery capabilities, including the ability to recover a database to any desired point-in-time (NOT a pre-determined point-in-time) during the past N number of days, weeks, or months.  Oracle’s Flashback Database provides a “time machine” inside the database for additional capabilities.  Oracle’s Zero Data Loss Recovery Appliance builds upon RMAN to provide an integrated solution that reduces backup time by as much as 95%.

Disaster Recovery is built into the Oracle database in Oracle Data Guard, which provides the ability to synchronize the database with one or more copies of that same database, either at the local site or at a disaster recovery site.  Oracle’s Active Data Guard also allows the DR site database to be queryable.

All Inclusive – Scale

What is really stunning is the SCALE at which Oracle can operate, especially on the Exadata platform. Many of the largest Oracle databases in the world have moved to Exadata. Software Engineers will notice that Oracle people aren’t impressed by the size of your database unless it’s a few hundred terabytes. I know several customers on Exadata who have databases in the range of 750 terabytes and larger (!) and those are even Hybrid Columnar Compressed! Those databases would be completely unmanageable on a conventional server+storage architecture. These large scale data volumes are routine with Oracle, especially on Exadata.

With extremely large databases, you also get large numbers of users. It’s surprising to think that other databases have never been able to reach the same scale as Oracle in terms of users, connections, or transactions per second. There are several layers of capability that enable this sort of scale, and (again) the vast majority of large scale databases run on Oracle Exadata.

Of course SMALL Oracle databases certainly are not a challenge for Oracle.  The Oracle database can easily store small amounts of data, but management of large numbers of small databases can pose significant administrative burden on a DBA team.  Fortunately, the Oracle Multitenant option allows large numbers of small databases to be consolidated under a single Container Database to reduce the system overhead.  Multitenant reduces use of CPU, Memory, and I/O compared to those same databases running individually rather than combined into a Container Database.  Consolidation density is greatly improved through the use of Oracle Multitenant, and benchmarks have shown consolidation density improvements of 10X or greater.  The largest gains of more than 20X come when comparing Multitenant to Virtual Machine consolidation using one database per VM.

Non-escalating row level locking in the Oracle database is a huge enabler of scale. As a software engineer, you need to consider what this means for your code. Transactions insert, update, and delete rows of data. If two transactions attempt to modify the same data, one of them will wait. Some databases (most of them aside from Oracle) will escalate locks to cover entire BLOCKS of data if your code locks too many rows, and might even escalate locks to cover entire TABLES because they can’t manage large numbers of locks. The locking model is critical, so software engineers need to be concerned.  Oracle’s non-escalating row-level locks are a critical enabler of scale.

Readers don’t block writers, and writers don’t block readers in Oracle, which also enables scale.  Most databases aside from Oracle still suffer from the problem of transactions blocking each other.  Approaches such as “dirty read” are a horrible solution because it means you’re reading BAD DATA.

In short, I am certainly NOT advising software engineers to scale your databases to the moon. In fact, you should certainly break them up if needed. However, you don’t need to be so concerned with database size if you’re using Oracle, especially on the Exadata platform. You certainly should use multiple schemas within a database. You certainly should combine multiple smaller databases into one Container Database using Oracle Multitenant. Just know that the high-end size can go much higher with Oracle than with other databases, and you won’t be stuck re-architecting applications if your data grows.

All Inclusive – Workloads

The Oracle database is inclusive on multiple dimensions, and one of these dimensions is workload type. Workloads generally break down into OLTP versus Data Warehouse, but databases might also contain a mixture of workloads. An OLTP database predominantly services OLTP transactions, and a Data Warehouse predominantly services parallel queries, but most are really a mixture.

Data Warehouse deployments should always include the Partitioning Option, which allows large tables to be broken up into multiple sections known as partitions.  Partitions can be used to simplify data maintenance such as quickly loading data through a Partition Exchange operation, or by dropping partitions to purge data.  Partitions are also used while querying because they serve like a “coarse index” into the data.  SQL queries automatically benefit from performance gained by skipping partitions that don’t contain the needed data.  For example, when a table contains data for a YEAR but is partitioned by MONTH, a user might query data for one month, and the Oracle query optimizer automatically skips the other 11 months.

Partitioning is primarily used for Data Warehousing, but is also useful for OLTP applications that store large amounts of data online.  Data can be partitioned by age, such as partitioned by month or year.  Older partitions can be easily dropped to purge data, and backups can be configured to SKIP those older partitions to make backups faster.
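
A minimal sketch of range partitioning by month (the table and partition names are hypothetical) shows both partition pruning and partition-level purging:

CREATE TABLE sales (
  sale_id   NUMBER,
  sale_date DATE NOT NULL,
  amount    NUMBER
)
PARTITION BY RANGE (sale_date) (
  PARTITION sales_2021_01 VALUES LESS THAN (DATE '2021-02-01'),
  PARTITION sales_2021_02 VALUES LESS THAN (DATE '2021-03-01'),
  PARTITION sales_2021_03 VALUES LESS THAN (DATE '2021-04-01')
);

-- The optimizer prunes this query to the January partition only
SELECT SUM(amount)
  FROM sales
 WHERE sale_date >= DATE '2021-01-01' AND sale_date < DATE '2021-02-01';

-- Purge the oldest month by dropping its partition
ALTER TABLE sales DROP PARTITION sales_2021_01;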

OLTP workloads are transactional systems with larger numbers of users including Order Entry systems, Human Resources, Manufacturing, and web sites.  The Oracle database is nearly ubiquitous in the OLTP workload market.

Mixed Workloads include both OLTP transactions as well as some amount of operational reporting or ad-hoc lookups.  While the Parallel Query capability of the Oracle database is primarily used in Data Warehousing, it can also be used to speed-up operational reports and ad-hoc queries.  This type of workload in an OLTP system is the reason why we consider most OLTP databases to be “mixed workload” databases rather than pure OLTP.
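
For example, here is a hedged sketch of an ad-hoc operational report sped up with a parallel query hint (the orders table and its columns are hypothetical):

-- Run this report with a parallel degree of 8 for faster ad-hoc results
SELECT /*+ PARALLEL(o, 8) */
       o.status,
       COUNT(*)           AS order_count,
       SUM(o.order_total) AS revenue
  FROM orders o
 WHERE o.order_date >= DATE '2021-01-01'
 GROUP BY o.status;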

All Inclusive – Data Modeling

Oracle was originally known as a Relational Database Management System (R-DBMS), but these days it is considered a Multi-Model database due to its ability to work with all major data modeling techniques, including:

  • Relational
  • Object
  • Queue
  • Key/Value and Document
  • Graph
  • Multidimensional

Software engineers are not stuck with one modeling technique in the Oracle database. A single database can contain data modeled using ALL of these techniques. All of them can be used within a single schema, or schemas can be used to separate the data. Software engineers can split data into different databases, and administrators can combine multiple databases into a single Container Database to ease the administrative burden.

Relational remains the predominant method for data modeling due to the ability to ensure data integrity and avoid repetition of data elements in the structure.  Relational is not suited to EVERY type of application, but it still remains the clear leader in the database world.

Object databases arose a number of years ago to provide a better match with object oriented languages such as C++ and Java.  The object/relational capabilities of Oracle have been much more useful, allowing the application to reference data as “objects”, while the database stores the data in relational format to take advantage of the data integrity capabilities that relational provides.

Queue data includes FIFO (First In First Out), LIFO (Last In First Out) and other structures optimized for handling data in a specific order.  Many applications involve some processing of queues, although this tends to be a relatively small component of most applications.  Software Engineers can simply create those queues within the Oracle database and have them be persisted to storage, and transactionally consistent with the rest of the database.  This approach is much simpler than deploying a dedicated single-purpose database to handle queue data.
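
As a hedged sketch (the object type, queue table, and queue names are hypothetical), Oracle Advanced Queuing lets you create such a persistent, transactional queue directly inside the database:

-- Payload type for the queue messages
CREATE TYPE order_event_t AS OBJECT (
  order_id NUMBER,
  payload  VARCHAR2(4000)
);
/

-- Create the queue table, the queue, and start it
BEGIN
  DBMS_AQADM.CREATE_QUEUE_TABLE(
    queue_table        => 'order_event_qt',
    queue_payload_type => 'ORDER_EVENT_T');
  DBMS_AQADM.CREATE_QUEUE(
    queue_name  => 'order_event_q',
    queue_table => 'order_event_qt');
  DBMS_AQADM.START_QUEUE(queue_name => 'order_event_q');
END;
/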

Key/Value and Document databases are essentially a simple TABLE structure with 2 columns, which are a key column and value column.  The VALUE in a key/value pair can contain a complex XML or JSON structure, but otherwise it’s simply a VALUE column.  The Oracle database provides a set of functions for storing and updating data within the XML or JSON structure, and those functions ensure the integrity of the XML/JSON structure.

Graph Databases are designed with an emphasis on understanding relationships among data elements rather than just the data elements themselves.  Oracle Spatial and Graph can be used to better understand the relationships between entities such as social networks, geographical spatial relationships, etc.

Multidimensional data is used in OLAP applications for deep analysis of data.  The most prominent single-purpose Multidimensional database is Oracle Hyperion, which is used in the suite of Hyperion business applications.  That same capability to store and process data in multidimensional (CUBE) format is also available in the Oracle Database Enterprise Edition.

All Inclusive – Types of Data

I’m saying “types of data” rather than data types for a reason. The term “data type” typically refers to typecasting of columns, but this topic is broader.  The types of data that can be stored in an Oracle database go beyond simple datatypes to include a broad range of data as follows:

  • Numeric
  • Character and variable character
  • Dates, times and time stamps
  • Spatial Data
  • Time Series Data
  • Images
  • Large Objects (LOBs: BLOBs and CLOBs)
  • XML
  • JSON

A single Oracle database can contain all of these types of data within a single schema or across multiple schemas within a single database.  Of course it’s possible to devote a single Oracle database to only a specific type of data as well.  Multiple smaller databases can always be combined together under a single Container Database using the Oracle Multitenant Option.

Time Series data is typically stored alongside the analytics performed on it.  The Oracle database includes Time Series analysis functions built into Oracle Data Mining.  Time Series analysis is a special form of regression known as auto-regressive modeling.  Some single-purpose databases on the market are designed solely to store time series data, whereas Oracle Data Mining includes built-in functions for performing auto-regressive modeling directly on that data.

XML and JSON are two types of data that deserve special mention.  Both are self-describing formats whose structures can contain all manner of data inside them.  The Oracle database includes functions that enable applications to store and modify data contained within XML and JSON structures, while ensuring the integrity of those structures.

All Inclusive – Sharding

Oracle Database includes sharding, including sharding of all types of data rather than just key/value or document data. Sharding allows multiple databases to remain separate, but behave as one database.  Sharding is useful for addressing “data sovereignty” regulations, where data must reside in a given geography (country), but also needs to be used globally.  There are other uses of sharding beyond data sovereignty, including the ability to sub-divide data across multiple servers to minimize the “blast radius” of failures.

Such sub-divided or sharded data can minimize the impact of failures, but still results in a single source of truth database that needs protection.  Even sharded databases still often need the protection of Oracle’s Maximum Availability Architecture to ensure accessibility of the data in sharded databases.

You certainly don’t need a specialized database simply to meet sharding requirements such as data sovereignty.  Those needs can be easily met using the Oracle database.

All Inclusive – Analytics

Oracle’s analytic capabilities began with the acquisition of Thinking Machines, whose data mining technology became Oracle Data Mining and is now combined with Oracle R to provide the broadest analytic capabilities in the industry.  The Oracle SQL dialect is powerful by itself and includes a number of analytic functions.  Oracle Data Mining extends the SQL language with complex mathematical and statistical analysis capabilities.

There aren’t many single-purpose “analytic” databases on the market.  Analytics is primarily the domain of specialized analytic languages such as the R language.  Oracle Data Mining simplifies the engineering of analytic applications through built-in functions that operate INSIDE the Oracle database, and run more efficiently and therefore FASTER inside the database.  Oracle also provides support for the R language INSIDE of the database, which also improves performance of R code.

From my experience, analytics is the domain of experts such as Data Scientists and Statisticians.  The tools inside of the Oracle database can be brought to bear in engineering solutions that provide high performance at high scale with very complex analytic processing.

All Inclusive – Cloud

The Oracle Cloud is the ultimate in inclusivity. The Oracle Cloud provides a License Included (LI) capability, as well as Bring Your Own License (BYOL). The advantage of License Included is that Software Engineers don’t need to be concerned with extra-cost database options, since all Oracle database software options are included in the Cloud.

All Inclusive – Autonomous

The Oracle Autonomous Database is Oracle Enterprise Edition with all database options, running on Exadata in the Oracle Cloud, with automation that handles all database operations. Software Engineers simply build and deploy applications on the Autonomous Database without being concerned about operational tasks such as monitoring, backup, and system tuning.  A number of the database options are implemented automatically in the Autonomous Database.

All Inclusive – Columnar Data

Relational databases traditionally store data internally in a row-oriented format. The Oracle database on Exadata and Autonomous Database also includes the ability to store data in columnar orientation using Hybrid Columnar Compression.  There are a number of purpose-built columnar databases on the market, but this capability has simply become a “feature” of the Oracle database on Exadata.  Customers don’t have to deploy a specialized database simply to have data compressed into a columnar format.  Tables in Oracle simply use the COMPRESS FOR QUERY option, and data will be stored in columnar format when it’s loaded into the table.
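As a hedged illustration (the table name is hypothetical, and this assumes storage that supports Hybrid Columnar Compression, such as Exadata), the compression attribute is simply declared on the table:

create table sales_archive
  compress for query high      -- Hybrid Columnar Compression, warehouse compression level
as select * from sales;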

All Inclusive – External Tables & Big Data SQL

Data stored in files outside the database can be accessed using standard SQL commands using the Oracle SQL dialect by using External Tables.  Data stored in Hadoop can be accessed from within the Oracle database using Oracle Big Data SQL.  Providing access to data outside the database while still using the Oracle SQL dialect opens up new possibilities that weren’t available previously.  Big Data is a great technology, but lacks the rich set of tools that are based on the SQL language.  Oracle’s External Tables and Big Data SQL provide users of SQL-based tools access to a wider set of data than was possible previously.
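A minimal External Table sketch (the directory object and file name are hypothetical, and the directory must already exist) looks like the following:

create table sales_ext (
  sale_id  number,
  region   varchar2(30),
  amount   number
)
organization external (
  type oracle_loader
  default directory ext_data_dir        -- hypothetical directory object
  access parameters (
    records delimited by newline
    fields terminated by ','
  )
  location ('sales.csv')
)
reject limit unlimited;

Once created, the external table is queried with ordinary SQL just like any other table.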

All Inclusive – In-Memory

While Oracle has always used a database buffer cache in memory, and it has long been possible to pin tables in memory, the In-Memory Option allows tables to be stored and processed in a pure columnar format. The In-Memory Option (included in the Oracle Cloud) processes data in memory using SIMD vector instructions, making SQL much faster, so being stored in memory is only part of the story. Software engineers simply write SQL and the SQL runs faster. Simply use the ALTER TABLE ... INMEMORY command to place tables in memory. Notice that INMEMORY is really “just” a tuning exercise, so it’s done automatically when needed in Autonomous Database.
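For example (table names are hypothetical), populating and removing tables from the In-Memory column store is a simple attribute change:

alter table sales inmemory priority high;   -- populate eagerly after instance startup
alter table sales_history no inmemory;      -- exclude from the column store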

All Inclusive – Developer Tools

As the #1 database, Oracle has the widest array of developer tools of any database on the market.  Oracle’s Developer Portal is a great place to start learning about the range of possibilities.  Virtually every programming language includes native connection capabilities for Oracle database including:

  • Node.js
  • Java
  • Python
  • .NET
  • SQL
  • R
  • PHP
  • PL/SQL
  • C/C++

This is just a short listing of the most prominent languages that can be used to interface with Oracle databases.  In addition, some other notable areas for exploration are as follows:

Machine Learning is used in Oracle Autonomous Database to automate the day-to-day operation of the database including proactive database health checks and reactive problem detection and resolution.  Oracle also offers Cloud tools for AI/ML to build your own applications using these and 3rd party tools with Oracle databases.

Low Code options from Oracle include Oracle Application Express (APEX), which allows developers to quickly build database oriented web-based applications.  There is a large and robust community of developers using APEX, and extensive resources available.

Security

Database security has always been a primary concern when choosing a database, and Oracle has consistently been a leader in database security.  Oracle database security is a large topic in itself, and includes the following:

  • Core Database Security (users, roles, authentication, etc.)
  • Encryption of Network Communication
  • Transparent Data Encryption (TDE)
  • Column-level Encryption
  • Oracle Key Vault
  • Database Auditing
  • Oracle Audit Vault
  • Oracle Database Firewall
  • Oracle Database Vault
  • Label Security
  • Real Application Security
  • Virtual Private Database (VPD)
  • Data Masking and Subsetting

For more information on Oracle database security, please refer to the Oracle Database Security Guide (here) as well as Oracle’s database security page (here).
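As one small, hedged example from the list above (this assumes a TDE keystore has already been created and opened; the table and column names are hypothetical), sensitive columns can be encrypted transparently:

create table customers (
  cust_id  number primary key,
  name     varchar2(255),
  ssn      varchar2(11) encrypt using 'AES256'   -- Transparent Data Encryption at column level
);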

Conclusion

Oracle is the #1 database because it’s the only All Inclusive Database available.  Other vendors and Cloud providers position single-purpose databases as if they were “best of breed” because they simply can’t match the capabilities of Oracle.  Software Engineers who choose single-purpose databases are often frustrated by the lack of capability.  Data in an OLTP system needs to be copied to a DW database simply to run operational reports.  Application functions that need the speed of in-memory processing must WAIT until the data can be copied from an OLTP or DW system into an In-Memory database.  Data in a single-purpose In-Memory database is constrained by the SIZE of the memory available on a single server.

It’s simply much more powerful and simpler to use the all-inclusive Oracle Database.

Simple Data Warehouse Indexing Strategy for Exadata

This blog post outlines a simple strategy for indexing a Data Warehouse in an Exadata environment.

Does it make sense to DROP all of your indexes when moving a Data Warehouse to Exadata?  You will generally need fewer indexes when moving to an Exadata environment, but it really doesn’t make sense to drop ALL of your indexes.  Just follow these simple steps for easy indexing…

  1. Define and Enforce Primary Key Constraints
  2. Define and Enforce Unique Key Constraints
  3. Create Foreign Key Covering Indexes
  4. Gather and Maintain Optimizer Statistics
  5. Use the SQL Access Advisor for Anything Else

Following these simple rules will result in a database schema that’s easy to maintain and delivers a good balance of performance and resource consumption.  In addition to these topics, I will also cover some additional points including Storage Indexes and use of Partitions as well as indexes on partitions.

Define and Enforce Primary Key Constraints

Most relational tables include a primary key that identifies each row in the table.  Primary keys in Oracle database (including on Exadata, and in Oracle Autonomous Database) are enforced using indexes, and the index is created automatically when the constraint is enabled.

Some database engines on the market allow you to create constraints, but don’t provide the ability to ENFORCE those constraints.  Of course that might be nice for documentary purposes, but it’s otherwise useless.

You can define the primary key as part of table creation as follows…

create table DEPT
(deptno    number not null constraint dept_pk primary key,
deptname   varchar2(255)
);

Of course, you can ALTER the table to add the primary key constraint later…

create table DEPT
(deptno number not null,
deptname varchar2(255)
);
alter table dept
add constraint dept_pk primary key (deptno);

Primary keys are referenced by foreign keys, which makes them different from their related cousin, the Unique Key.

Define and Enforce Unique Key Constraints

Unique keys are not the primary identifier of a row, but are otherwise unique within the table.  For example, we might want to include the Social Security Number for the employee, and we know that Social Security Numbers are supposed to be unique.

create table EMP
(
empno       number not null constraint emp_pk primary key,
first_name  varchar2(255),
last_name   varchar2(255),
salary      number,
ssn         number not null,
deptno      number not null,
constraint emp_ssn_uk unique (ssn)
);

Of course this can also be done using an ALTER command if you prefer.  The Oracle Database enforces Unique Constraints using an index, and that index also becomes available for use by the query optimizer.

Foreign Key Covering Indexes

Foreign keys are the fundamental mechanism of Referential Integrity.  Each foreign key references the primary key of another table in a relational database.  The fundamental purpose of a database is to ensure data integrity, and referential integrity is the foundation of relational databases.

There are some databases on the market that don’t have the ability to ENFORCE referential integrity, so you need to somehow build that logic into your application.  The Oracle Database DOES enforce referential integrity, including on Exadata and in the Autonomous Database.

Our previously created table called EMP has a column called DEPTNO that contains the department number.  We first ALTER table EMP to declare this as a foreign key.

alter table emp 
add constraint emp_deptno_fk 
foreign key (deptno) 
references dept (deptno);

Once we create that foreign key, we then add an index on the same column as follows:

create index emp_deptno_idx on emp(deptno);

The reason we want to create indexes on foreign keys is that those indexes make constraint enforcement more efficient, and you’re likely to have SQL queries that filter on those foreign key columns.  For example, if you wanted to run a report of all employees in a particular department, your SQL might benefit from an index on that column.

Will a SQL query that filters on a particular column benefit from an index on that column?  The answer to that question depends on the number and distribution of values in that column.  Exadata is able to SCAN data much more efficiently than non-Exadata platforms, but at some point it still makes sense to use an index.  Of course we don’t need to GUESS at this point because we’re going to use STATISTICS to let the Oracle SQL Optimizer figure this out for us, which brings us to the 4th rule regarding statistics.

Gather and Maintain Optimizer Statistics

It’s critical to always gather and maintain optimizer statistics.  It’s important to mention that Oracle long ago deprecated the old ANALYZE command and you should be using the DBMS_STATS package to gather your statistics.  It’s great that Oracle maintains support for older syntax and tools, but anyone still using ANALYZE really needs to update their skills.
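A minimal DBMS_STATS sketch (the schema and table names are hypothetical) looks like this:

begin
  dbms_stats.gather_table_stats(
    ownname => 'SALES_DW',     -- hypothetical schema
    tabname => 'SALES_FACT',   -- hypothetical table
    cascade => true);          -- also gather statistics on the table's indexes
end;
/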

Optimizer statistics are so important that Autonomous Database doesn’t allow you to have stale statistics.  Autonomous Database automatically gathers statistics for you and keeps the statistics updated as the data changes.  It’s important to note that Oracle19c also includes automatic statistics gathering.  It’s also important to note that Exadata is MUCH faster at gathering statistics than non-Exadata platforms, so it’s much less of a burden if you’re simply using Exadata regardless of database version.

Use the SQL Access Advisor

For any additional indexes besides those mentioned above, use the SQL Access Advisor to eliminate guesswork.  As shown in the following diagram from the manual, the SQL Access Advisor uses the workload running on a database to make recommendations for performance improvements.

[Diagram from the Oracle documentation: SQL Access Advisor taking a workload as input and producing recommendations]

In an Exadata environment, we try NOT to use Materialized Views because the Exadata system can typically execute the SQL fast enough without adding the complexity of Materialized Views.  The SQL Access Advisor is part of the Diagnostics and Tuning Packs, which are some of the most popular add-on options to Oracle Enterprise Manager.
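For a single statement, the advisor can also be invoked with the DBMS_ADVISOR.QUICK_TUNE procedure; the sketch below uses a hypothetical task name and SQL statement:

begin
  dbms_advisor.quick_tune(
    dbms_advisor.sqlaccess_advisor,
    'dept_salary_report_tune',                             -- hypothetical task name
    'select deptno, sum(salary) from emp group by deptno');
end;
/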

Automatic Indexing in ADW and Oracle19c

Oracle Autonomous Database is going beyond the “advisor” approach to make indexing fully autonomous, and this feature is also coming to Oracle19c.  The SQL Access Advisor uses a SQL Tuning Set as input and provides advice regarding indexing and partitioning.  Again, customers should avoid using Materialized Views on Exadata because they often aren’t necessary and make the database schema more complex, which increases maintenance of both the schema and the data.

The automatic indexing feature of ADW and Oracle19c goes beyond simply providing advice to automating the testing and implementation of those index changes.

Exadata (Automatic) Storage Indexes

One reason why we don’t need lots of indexes is the Storage Index feature of Exadata.  Storage Indexes on Exadata are created automatically based on WHERE clause predicates in your SQL, combined with the values in the data.  Storage indexes are created in memory inside each Exadata Storage Cell.  Storage indexes record HIGH and LOW values of columns within each region of data on disk.  The Storage Cell software then skips those regions that cannot contain the column value specified in the SQL.

It’s important to note that Storage Indexes involve a “learning algorithm” that relies upon SQL being executed against data stored in the database.  The Storage Index won’t be present on the first execution of a SQL statement, but will be created over the course of time as the database gets used.

Storage Indexes are automatic.  You don’t need to manage storage indexes, and you shouldn’t design your application and database schema to drive behavior of Storage Indexes.  Rather, you should allow Storage Indexes to work naturally according to data characteristics and the natural behavior of Storage Indexes.  Although it is possible to ORDER data to improve the impact of storage indexes, SORTING the data before loading might not be worth the time or effort.

Partitions are Effectively “Coarse Indexes”

It is important to understand that partitions are essentially equivalent to a “coarse index” rather than simply a mechanism to facilitate data maintenance.  The Oracle database will skip over partitions that aren’t needed based on WHERE clause predicates.  This applies to both partitions and sub-partitions.  The partitioning scheme should be designed more around data access than data maintenance.

  • Design partitions based on SQL access
  • Use Partitions and Sub-Partitions

Partitioning for data maintenance makes sense, but only when it doesn’t conflict with query access.
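A sketch of a range-interval partitioned table with hash sub-partitions (the column names are hypothetical) might look like this:

create table sales (
  sale_id    number,
  sale_date  date not null,
  cust_id    number,
  amount     number
)
partition by range (sale_date)
  interval (numtoyminterval(1, 'MONTH'))       -- new monthly partitions are created automatically
  subpartition by hash (cust_id) subpartitions 8
( partition p_initial values less than (date '2019-01-01') );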

Global vs. Local Indexes on Partitioned Tables

Partitioning brings some interesting choices when it comes to indexes on those partitions.  You will need to know something about how users will access the data before deciding whether to use “global” versus “local” indexes.

It’s first important to understand the difference between global and local indexes.  In short, “local” indexes are tied to each partition, whereas “global” indexes span all partitions of the table.  Local indexes make data maintenance easier, while global indexes make access to data across partitions faster.

In the case where you have massive numbers of partitions with a local index, accessing data through the local index means touching the ROOT block of each index partition.  In the case I mentioned previously, the customer had 525,600 partitions, which means they would also have 525,600 index root blocks to traverse if they used a local index without also doing some partition elimination.

So, before you settle on a partition scheme, and before you create any indexes, you need to know how users will query the data.

You also need to understand how the data will be maintained.  If you DROP partitions frequently, that would mean rebuilding a GLOBAL index if any are created on that table.
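Assuming the partitioned SALES table sketched earlier, the difference looks like this:

-- Local index: one index partition per table partition; dropping a table
-- partition drops the matching index partition automatically
create index sales_cust_lidx on sales (cust_id) local;

-- Global partitioned index: spans all table partitions; dropping a table
-- partition requires the global index to be maintained or rebuilt
create index sales_id_gidx on sales (sale_id)
  global partition by hash (sale_id) partitions 4;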

Invisible Indexes

Marking indexes “invisible” is a technique to determine whether an index is truly necessary before dropping the index. This feature is especially useful when migrating an existing database to Exadata.

  1. Test query using NO_INDEX hint
  2. Mark index invisible
  3. Drop index

Indexes can be marked invisible while you evaluate whether those indexes are truly necessary. First, run the affected queries with the NO_INDEX hint to determine the impact of dropping the index.
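Using the EMP_DEPTNO_IDX index created earlier, the sequence looks like this:

-- 1. Measure the impact of losing the index using the NO_INDEX hint
select /*+ NO_INDEX(emp emp_deptno_idx) */ count(*)
from   emp
where  deptno = 10;

-- 2. Hide the index from the optimizer while it is still maintained
alter index emp_deptno_idx invisible;

-- 3. Drop the index once you are confident it is not needed
drop index emp_deptno_idx;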

Summary

In summary, use the simple indexing strategy outlined in this blog post for any databases running on Exadata.  Over-use of indexes becomes a challenge when it comes to maintaining those indexes.  Excessive numbers of indexes also give the SQL optimizer more data access choices to evaluate.

Using Oracle Incremental Merge

This article outlines the Incremental Merge feature of the Oracle database and its intended usage.  This article also addresses how 3rd party products have been built upon this feature of Oracle, delivering database cloning capabilities (also known as copy data management) as well as backup/recovery solutions.  Finally, this article will cover how Oracle addresses such requirements using native features of the Oracle database in a Maximum Availability Architecture (MAA) configuration rather than relying on the Incremental Merge feature.

What is Incremental Merge?

The Incremental Merge capability of the Oracle database refers to the ability to create a copy of a database and periodically update that copy by merging incremental changes into it.  In short, Incremental Merge comprises the following capabilities:

  • Image Copy Backup of Oracle Database
  • Incrementally Updating the Image Copy
  • Archive Redo Log Management
  • Restore and Recovery from Image Copy
  • Duplicate Database from Image Copy
  • Switch to Copy Feature

Incremental Merge involves a single copy of the database as-of a single point in time.  The database is drawn forward on the change timeline by applying incremental backups.  Archived redo logs spanning the duration of backup execution are required to de-fuzzy the contents of the resulting image copy (see note1).  The resulting image copy can be used for restore and recovery, duplicating databases, and switching a database to the image copy.
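A minimal sketch of the RMAN commands typically run on each cycle (the tag name is hypothetical) is shown below; the first run creates the image copy, and subsequent runs roll it forward:

RMAN> BACKUP INCREMENTAL LEVEL 1
        FOR RECOVER OF COPY WITH TAG 'incr_merge'
        DATABASE;
RMAN> RECOVER COPY OF DATABASE WITH TAG 'incr_merge';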

The following diagram shows how RMAN is used to apply the changes from an incremental backup to an image copy of the database.

[Diagram: RMAN merging an incremental backup into an image copy of the database]

Note 1: For the purposes of this article, we assume customers are using HOT backup.  Customers seldom use COLD backups, where the database is shut down during the backup.  Database backups (including image copy and incrementals) should be considered “fuzzy” if the database was online and active during the backup.

Intended Usage of Incremental Merge

The Incremental Merge feature was intended to be used in a transient manner for specific tasks such as:

  • Cloning Databases to DEV/TEST
  • Upgrading database/storage hardware
  • Instantiating Data Guard Standby

The Incremental Merge feature was not developed to be an operational backup/recovery capability.  Incremental Merge maintains a single image copy as of a single point-in-time, whereas backup/recovery typically requires recoverability over a much longer window of time, such as weeks or months.  One workaround is DELAYING the apply of incremental backups to allow as much as 7-10 days of recoverability, while most enterprises require recovery windows of 30 days or longer.  Oracle recommends using a fully functional backup/recovery solution (one that meets the business-mandated recovery window) rather than relying on Incremental Merge for operational backup/recovery, and reserving Incremental Merge for the transient use-cases above.

3rd Party Vendor Solutions

Several 3rd party vendors have developed solutions based on the Incremental Merge feature including the following:

  • Actifio
  • Rubrik
  • Delphix
  • NetBackup Co-Pilot

Some of these vendors go beyond the intended use of Incremental Merge as a transient data copy capability and recommend using it for operational backup/recovery.  Storage snapshots are used to provide multiple restore-points, whereas Incremental Merge natively provides a single restore point.

Avoid Products that use Reverse Engineering

Customers should avoid using any 3rd party products that use undocumented interfaces or reverse engineer features of Oracle, such as internal Oracle data structures, including the contents of Oracle RMAN backups.  Oracle may modify those structures and product behavior without notice at any time, in any version, release, or even with a simple patch.  Customers should evaluate 3rd party products to determine whether undocumented interfaces are being used or whether the vendor has reverse engineered the Oracle database.

Custom Built Solutions

It is important to note that customers have also built custom-scripted Incremental Merge solutions using these same core Oracle database features for creating image copy backups and updating them incrementally.  Customers have used both SAN (block storage) as well as NAS storage (such as Oracle’s ZFS storage) to build these Incremental Merge solutions.

Adding Snapshots to Incremental Merge

Incremental Merge provides a single copy of the database as-of a single point in time, which fits the intended use-cases of building database for DEV/TEST, for hardware migration, or instantiating Data Guard Standby databases.  A single copy of the database is created on a different set of hardware, and the database is incrementally updated until it reaches the desired point.

The following diagram shows how snapshots are added to incremental merge to provide multiple restore points.

[Diagram: storage snapshots layered on top of Incremental Merge to provide multiple restore points]

Some 3rd party vendors add storage snapshots to the Incremental Merge solution to allow multiple restore points so it can serve as a backup/recovery solution.  Some customers have also built custom scripted solutions following this model as well.  The resulting snapshots can then be cataloged with RMAN to allow database recovery using those snapshots.

Critical Solution Design Issues

The Incremental Merge solution presents several issues where process must be executed correctly to avoid corruption.  Customers should be aware of the following issues when implementing an Incremental Merge solution:

  • Timing of Merge Process
  • Timing of Snapshot Execution
  • Timing of Redo Log Archival
  • Handling of Archived Redo Log Backups
  • Recovery point required to de-fuzzy Image Copy

Timing of the merge process and snapshot execution is critical to avoid corruption of snapshot copies.  Incremental Merge was not designed for use with snapshots, and does not include the “snapshot optimization” feature of the Oracle database itself.  Snapshots cannot overlap with the Incremental Merge process, or corruption will result.  It is also important to note that the resulting Image Copy is not consistent and needs some amount of redo applied to make it consistent.  The necessary change-vectors to de-fuzzy the backup must be externalized into the archived redo and backed-up.  The recovery point is also critical  to avoid “file needs more media recovery” errors.

Oracle Error: “File Needs More Media Recovery”

Incremental Merge produces a “fuzzy” backup that needs recovery applied before it is consistent.  Most experienced Oracle DBAs are familiar with the Oracle error “file needs more media recovery”, which indicates the restored files are still fuzzy and need additional redo applied.  This can occur in several circumstances with different Oracle error numbers, as follows:

  • ORA-01113
  • ORA-01194
  • ORA-01195
  • ORA-19901
  • ORA-01143

In all of these cases, redo logs are required to de-fuzzy the database and make it consistent before it can be opened for use.  Recovery into the middle of a fuzzy range requires restoring a PREVIOUS backup and rolling forward.  Image copy backups should be treated as fuzzy if taken while the database is up and running.  Proper redo log handling is critical to the Incremental Merge process because redo logs are required to de-fuzzy the image copy.

Managing the Archived Redo Log Stream

The Incremental Merge feature deals with handling of DATA blocks only, and does not address how the redo stream is handled.  Redo is critical to successful operation of the Incremental Merge feature.  Proper redo handling is even more critical when used in conjunction with snapshots.  Oracle recommends switching, archiving, and backing up the redo log so that change-vectors required to de-fuzzy the image copy are included in the backup.

The following diagram (from the Oracle RMAN documentation) shows how the redo log stream records database incarnations, which are used during database recovery.

[Diagram from the Oracle RMAN documentation: database incarnations recorded in the redo log stream]

Oracle also recommends NOT including the archived redo stream in a snapshot, and certainly not in the same snapshot that contains the Image Copy.  Redo log change-vectors generated AFTER the incremental backup are required to de-fuzzy the database.  Database recovery also requires access to ALL available incarnations of the redo stream to properly navigate incarnations as shown in the diagram above.

Data Loss During Recovery

Archived redo is typically 20 minutes or longer behind the current changes in the database depending on the log switch interval.  The image copy will be as much as 24 hours behind current, while redo backups will be at least 20 minutes behind current.  Therefore, customers should expect anywhere from 20 minutes to 24 hours of data loss (loss of transactions) when using Incremental Merge for backup/recovery.

Online REDO logs are re-used in a circular fashion.  Archived redo is sequential and provides a record of changes over the course of time.

[Diagram: online redo log sequences, the archived redo stream, and incarnations]

The structure of Oracle redo also includes incarnations (as discussed earlier), with each incarnation representing a different branch of the timeline.  In the diagram above, the latest transactions are contained in log sequence 110, which is not yet available in the archived redo.  Those transactions will be lost in an Incremental Merge solution.  Customers should consider Oracle Data Guard or the Zero Data Loss Recovery Appliance (ZDLRA) to eliminate loss of transactions.

Does Switch to Copy = Instant Recovery?

Oracle provides a feature known as Switch to Copy, which allows a database to be switched to an Image Copy instead of using Media Recovery.  Some 3rd party vendors have described this as “instant recovery”, which is not correct.  Switch to copy is a SWITCH operation that can be used in place of restore as shown below.

[Diagram: RMAN switch-to-copy operation used in place of a restore]

Switch to copy involves pointing Oracle at a different copy of the database, which is similar to a restore operation, whereas database recovery is the act of applying redo logs.  At the end of the switch to copy command, the image copy will still need recovery, which is not instantaneous.  As discussed earlier, the image copy will normally be as much as 24 hours behind current, and redo logs will be at least 20 minutes behind current.  Any Image Copy taken HOT will also need some amount of recovery (log apply) to make it consistent as well as to reach the desired point-in-time.
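A hedged sketch of the sequence (assuming the database is mounted and the image copy is cataloged in RMAN) looks like this:

RMAN> SWITCH DATABASE TO COPY;   -- point the control file at the image copy
RMAN> RECOVER DATABASE;          -- apply redo to de-fuzzy and roll forward
RMAN> ALTER DATABASE OPEN;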

Again, recall that Incremental Merge does not include provisions for handling the redo logs, so switch to copy involves data loss (loss of transactions).  Depending on the intended usage, this capability isn’t usable if the Image Copy resides on lower tier (slower) storage.  The concept of “instant recovery” implies that the database is usable and will provide the same level of service that users expect of the production database.

Switching to Equivalent Storage

The switch-to-copy feature should only be used with storage that meets performance expectations.  Production databases typically cannot operate on a lower tier of storage than used for production.  It is important to note that switch-to-copy using any 3rd party storage is not compatible with Exadata and is not supported.

Oracle recommends customers use Data Guard rather than switch-to-copy.  Data Guard is more widely used and avoids the data loss issues inherent in switch-to-copy as outlined above.  Data Guard Standby databases can also be placed on storage equivalent to the production database to meet end-user performance expectations.

Resource Stealing

Incremental Merge steals resources from the source databases including CPU, memory, network I/O, and disk I/O resources.  The same Oracle software version must be used to APPLY incremental changes to the image copy, and the most common method is to simply use the source database to merge incremental changes into the image copy.  Resource Stealing needs to be considered in system capacity planning and customers need to be aware of performance impact from resource stealing.  Oracle Data Guard does not rely on resource stealing, and places minimal overhead on source servers and network.

Oracle Data Guard

Oracle Data Guard provides the ability to instantiate a copy of a database and update that copy either synchronously or asynchronously via the redo log stream.  The following diagram shows the basic Data Guard configuration including the observer capability known as the Data Guard Broker.

[Diagram: basic Data Guard configuration with the Data Guard Broker observer]

In addition to keeping a standby database in close synchronization with the primary, Data Guard also provides the ability to use a TIME DELAY, which is functionally similar to Incremental Merge with a different update mechanism.  Data Guard advances a copy of a database forward on the timeline of changes by using the REDO log, whereas Incremental Merge updates a copy of a database using incremental backups.

Using Data Guard for this case assumes the use of ARCHIVELOG mode, and FORCE LOGGING is required to eliminate gaps caused by use of NOLOGGING operations.  Some legacy hardware configurations might not offer sufficient performance for this configuration, while Exadata has proven to deliver the performance necessary for fully logged databases even with high transaction volumes as well as high volume ETL processing in Data Warehouse environments.

Oracle Snapshot Standby & SPARSE Disk Groups

Oracle’s Snapshot Standby is a critical feature for creating DEV/TEST copies of databases from production.  The process is fully automated through Oracle Enterprise Manager (OEM), the Data Guard Broker, and SQL*Plus.  The Snapshot Standby is created from a Data Guard Physical Standby, and can be reverted back to a Physical Standby and re-synchronized with the production database.

Once a snapshot standby is created, Oracle’s SPARSE Disk Groups also provide the ability to create multiple thin-provisioned (SPARSE) clones from a Snapshot Standby.  Oracle RMAN also allows SPARSE backups, extending the thin-provisioning capability into the backup solution as well.

[Diagram: thin-provisioned clones created from a Snapshot Standby using SPARSE Disk Groups]

Database Recovery with Incremental Merge

The Incremental Merge process involves IMAGE COPY backups that are typically fuzzy copies needing redo to be applied to make them consistent.  There are essentially 4 database recovery scenarios that any backup/recovery solution needs to support as follows:

  1. Repairing Physical Corruption
  2. Point-in-Time Database Recovery
  3. Point-in-Time Object/Table Recovery
  4. Recovery Based Cloning

Oracle’s RMAN (Recovery Manager) tool is used to perform recovery of Oracle databases in all of the use-cases above.  Some 3rd party solutions also include interactive tools or APIs that layer on top of the functionality provided by Oracle.  In this section, we will cover this from the standpoint of the RMAN tool that most DBAs are familiar with.

Repairing Physical Corruption

The first use-case for RMAN addresses the need to repair databases if physical corruption occurs, which is also referred to as “recover to current” or recovering the database to the current (or latest) transaction.  RMAN provides 3 levels of physical database repair as follows:

  • Block Media Recovery
  • Data File Restore & Recovery
  • Database Restore & Recovery

Block Media Recovery can use a FULL or LEVEL0 backup, or a Virtual Full on Oracle’s Recovery Appliance (RA). Redo logs are applied to recover the block(s) forward after the block is restored from the FULL, LEVEL0, or RA Virtual Full.  Block recovery also works with Oracle Data Guard as shown below.

[Diagram: block media recovery using an Oracle Data Guard standby]
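As a hedged illustration (the file and block numbers are hypothetical), block media recovery is a single RMAN command:

RMAN> RECOVER DATAFILE 7 BLOCK 233;
RMAN> RECOVER CORRUPTION LIST;   -- repair all blocks listed in V$DATABASE_BLOCK_CORRUPTION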

Point-In-Time Database Recovery

While the Recovery Advisor can detect and automatically launch repair actions when physical corruption is encountered, it’s not possible to automatically detect “logical” corruption caused by factors such as application failures.  For example, an application might be defective, or a user might delete data by mistake.  Those types of failures simply cannot be detected by the database, so the database must instead be recovered to a point-in-time just before the logical corruption occurred.

Point-in-Time Object/Table Recovery

In some cases, logical corruption might impact only one or a few tables within the database.  Rather than recovering the entire database to a prior point-in-time, it might be desirable to recover only those tables affected by the wayward application or user.

Incremental Merge with snapshots was useful in previous releases to facilitate table recovery.  However, Oracle12c and above offers the ability to recover tables using backups, as well as the ability to REMAP the table into a different schema as shown below:

[Diagram: RMAN table recovery with REMAP into a different schema]

Tables are often recovered to a previous point in time due to application failures or end-user errors.  Placing the recovered table in a different schema allows a developer or user to examine the data to determine what changes (if any) should be made to the production data.  This process has become much simpler in Oracle12c due to the feature outlined above.
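A sketch of the Oracle12c RMAN syntax (the schema, table, SCN, and paths are hypothetical) looks like this:

RMAN> RECOVER TABLE hr.employees
        UNTIL SCN 2345678
        AUXILIARY DESTINATION '/u01/app/oracle/aux'
        REMAP TABLE hr.employees:employees_recovered;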

Validation Of Image Copy Backups

The Incremental Merge process effectively validates the blocks that have changed, but un-changed blocks are never validated.  Oracle recommends periodically executing the RMAN RESTORE command with the VALIDATE option to ensure integrity of Image Copy backups.  Oracle’s Zero Data Loss Recovery Appliance provides automatic validation of backups without using resources of the database server, and without manual data validation scripts.
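For example, the following RMAN command reads the image copy backups and checks them without actually restoring anything:

RMAN> RESTORE DATABASE VALIDATE;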

Conclusion

Incremental Merge is a feature of the Oracle database that was developed for transient use such as creating database clones for TEST/DEV, instantiating Data Guard standby databases, and for migrating to new hardware.  Some 3rd party vendors have used the Incremental Merge feature to replicate capabilities that are provided natively as part of the Oracle database.  This article outlined how native features of the Oracle database provide many of the capabilities that Incremental Merge solutions have provided in previous releases.  Oracle recommends customers follow the Maximum Availability Architecture (MAA) reference architectures to meet business goals.

References:

Maximum Availability Architecture: http://www.oracle.com/goto/maa

ZDLRA: https://www.oracle.com/engineered-systems/zero-data-loss-recovery-appliance/

Oracle Recovery – Regarding Range Gaps

My team and I have been digging deep into Oracle database recovery lately, and we’ve noticed that some mistakes by administrators can jeopardize recoverability.  Oracle’s Recovery Manager (RMAN) product does a great job of catching issues, but it’s still possible to induce these failures.  This blog posting should help to explain how these problems occur and what can be done to correct problems with what we call “range gaps” in the recovery stream.

Restore and Recovery Range Gaps can be induced through improper use of Oracle’s Recovery Manager (RMAN) as well as through improper management of backup data stored in a Media Manager or other backup target.  Oracle’s Zero Data Loss Recovery Appliance (ZDLRA) will prevent these problems, or will detect, report and allow them to be corrected as outlined below.  Let’s start with a bit of terminology to set the stage.

Terminology

Restore Range Gaps – Restore Range Gaps refer to gaps in recoverability of data file backups as part of a full plus incremental backup strategy using a combination of LEVEL0 and LEVEL1 backups.  Lost Differential Incremental (LEVEL1) backups will essentially invalidate subsequent differential incremental backups.

Recovery Range Gaps – Recovery Range Gaps refer to gaps in recoverability of data due to lost ranges of redo logs.

Ordering Waits in ZDLRA – Oracle’s Zero Data Loss Recovery Appliance (ZDLRA)  will place backup pieces into an Ordering Wait status when Restore Range Gaps are detected.

Fetch Archive Log – When Real Time Redo Protection is used with ZDLRA, Recovery Range Gaps can be reconciled by the appliance through the Fetch Archive Log (FAL) process.

Backup Polling – ZDLRA supports use of a “polling location” for ingesting backups.  RMAN backups written in Disk Format can be directly ingested into ZDLRA through the polling feature.

How Restore Range Gaps Occur

In a non-ZDLRA environment, gaps in the Restore Range can be induced through improper use of RMAN.  The following example shows what happens when a DBA performs a supplementary LEVEL0 backup to disk at SCN 500:

[Diagram: supplementary LEVEL0 backup taken to disk at SCN 500]

In the example above, the LEVEL0 is either un-tagged or uses the same tag as backups sent to the primary backup solution.  This means the LEVEL0 is part of the backup, even though it has been sent to a different location.

Restore Range Gaps in ZDLRA

ZDLRA is susceptible to the same mistake as shown in the previous section, but ZDLRA will detect the gap and allows the gap to be filled.  The example below shows that Virtual L0 backups stop being generated due to the Restore Range Gap.

[Diagram: Virtual L0 generation stopping when a Restore Range Gap is detected]

Gap Detection – ZDLRA will identify Restore Range Gaps, and will place subsequent backup pieces into Ordering Wait status.  The above diagram shows a LEVEL0 backup being taken to an auxiliary location such as a space on disk.  The resulting LEVEL0 backup includes changed blocks that are critical to database recovery.  Generation of Virtual L0 backups will terminate when a Restore Range gap is detected.  The LEVEL1 backups following the range gap will be placed into Ordering Wait state until the gap is resolved.

Gap Resolution – ZDLRA is able to correct this problem by simply “polling” the LEVEL0 into the Delta Store.  The LEVEL0 backup contains un-changed blocks, but also contains the changed blocks required to fill the Restore Range Gap.  ZDLRA will de-duplicate the data by simply discarding the un-changed blocks.

Intermediate Recovery Range Gaps

Recovery Range Gaps occur due to improper management of redo logs.  Recovery Range Gaps represent an unrecoverable range in the timeline, meaning the database simply cannot be recovered into that gap.  The example below shows a Recovery Range Gap from approximately SCN 425 to SCN 575.

[Diagram: Recovery Range Gap from approximately SCN 425 to SCN 575]

Existence of the associated LEVEL1 backups means that the database can be recovered to points prior to SCN 425 or after SCN 600, but cannot be recovered to points between SCN 425 and SCN 600.  Recovery Range Gaps like this occur in one of 3 ways:

  • Deletion of Redo on Source Prior to Backup
  • Failed Backup Processing
  • Deletion of Redo on Backup Target

Redo logs are stored on each database server, either in a Log Archive Destination or in a Fast Recovery Area. The Log Archive Destination (specified by the LOG_ARCHIVE_DEST_n parameter) configuration is supported for backward compatibility.  Customers should implement the Fast Recovery Area feature instead of using the older Log Archive Destination configuration.

The Fast Recovery Area simplifies management of redo logs and is designed to prevent improper deletion of redo that can jeopardize recoverability.  For more information on the Fast Recovery Area feature, please see the Oracle database documentation here.

The Fast Recovery Area allows DBAs to manage redo according to policies, and prevents deleting redo before it is backed-up.  Redo is marked as eligible for deletion only after being backed-up.  Administrators can still override this setting and delete redo even if it hasn’t been backed-up, but this will create a Recovery Range Gap.
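A hedged example of the RMAN configuration that enforces this behavior (the device type will vary by environment) is:

RMAN> CONFIGURE ARCHIVELOG DELETION POLICY
        TO BACKED UP 1 TIMES TO DEVICE TYPE SBT;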

We have seen Recovery Range Gaps generated due to failed backup processing.  Some 3rd party backup products have been known to NOT send error messages to RMAN even though the backup data was not saved.  One method to detect such problems is to execute the following RMAN command:

RMAN> CROSSCHECK BACKUP;

The CROSSCHECK command compares the contents of the RMAN catalog against the 3rd party media catalog and reports any backup pieces that are missing.

Trailing Recovery Range Gaps

Trailing Recovery Range Gaps result in the same problem as described above, but are more likely to be caused by improper deletion of redo on the backup target instead of on the source.  The following diagram illustrates this type of failure:

[Diagram: trailing Recovery Range Gap caused by deleting redo on the backup target]

The above example indicates that REDO is being deleted too quickly even though it is required for recoverability.  Media recovery cannot be performed in any range prior to SCN 400 in the above example.  If all redo prior to SCN 380 has been deleted, this seems to indicate an improper deletion policy on the backup target.

Depending on change rates, redo logs can represent significant space consumption.  Redo logs tend to be less compressible, and they simply will not de-duplicate because each redo log contains unique (non-duplicate) data by definition.

ZDLRA Resolves Recovery Range Gaps

ZDLRA is able to automatically resolve Recovery Range Gaps, and will ALERT when Recovery Range Gaps are detected.  The following diagram shows this capability:

[Diagram: ZDLRA detecting a Recovery Range Gap and filling it via Fetch Archive Log]

ZDLRA will detect Recovery Range Gaps and fill those gaps using the FAL (Fetch Archive Log) process, which reaches back into the FRA (Fast Recovery Area) of the protected database.  For databases not configured for Real Time Redo Protection, any un-transmitted redo will be sent to ZDLRA using the following backup command:

RMAN> BACKUP DATABASE ...
      PLUS ARCHIVELOG NOT BACKED UP...

The standard RMAN backup command for use with ZDLRA always includes backup of archived redo that has not been backed-up, even when Real Time Redo Protection is enabled.  This configuration provides a failsafe that ensures the redo will be swept from the FRA regardless of whether Real Time Redo Protection is functioning or not.

Conclusion

Range Gaps occur both in the “restore” stream and in the redo log or “recovery” stream of data sent to a backup target.  Any experienced database administrator knows that databases cannot be recovered to SCNs that reside within a range of redo that has been lost.  It’s important to also know that loss of incremental LEVEL1 backups will create a similar gap in the Restore Range represented by a set of LEVEL0 and LEVEL1 backups.  When such Range Gaps occur, database administrators won’t know these problems exist unless a database recovery is attempted.  Oracle’s Zero Data Loss Recovery Appliance (ZDLRA) will identify and alert when Restore & Recovery Range Gaps occur.  These gaps can also be corrected using some of the unique features of ZDLRA, preserving database recoverability and meeting business needs for data protection.

4 Oracle Backup Methods

While there is a vast array of backup products on the market that support Oracle, all of these solutions implement 1 of 4 available methods as shown below:

[Diagram: the 4 Oracle backup methods]

The 4 backup methods are categorized according to the method used to manipulate the data underlying the Oracle database.  This blog post will outline each of these 4 methods and will explain how Oracle’s Zero Data Loss Recovery Appliance (ZDLRA) is distinctly different from these 4 methods.

Legacy File Copy

As the name implies, Legacy File Copy is an older method for backing up Oracle databases that dates from the earliest days of Oracle.  I am covering this method primarily for completeness and because it’s still used today, although rarely.  This method involves backing up database files directly using O/S copy commands such as “cp” on Unix/Linux, or 3rd party tools such as MMV (Media Manager Vendor) utilities.

[Diagram: Legacy File Copy backup]

Cold Backup refers to making copies of database files while the database is in a shutdown state.  The database must be shutdown cleanly using SHUTDOWN NORMAL, SHUTDOWN TRANSACTIONAL, or SHUTDOWN IMMEDIATE prior to copying the underlying files.

Hot Backup refers to copying database files while the database is running.  Taking a hot backup using legacy file copy requires use of BEGIN/END BACKUP commands at the tablespace level or for the entire database at once.  The database generates extra redo logging while these commands are in effect, so the backup window should be kept as short as possible.

Redo log handling is absolutely critical in legacy file copy backup.  The first challenge is to NOT backup files while they are being written by the ARCH (redo log archiver) process.  The next challenge is to NOT delete redo logs unless they have been backed-up.  The final challenge is to ensure all redo associated with the backup has also been backed-up so you have a complete set of redo to “de-fuzzy” the backup.  There are critical timing issues down to the microsecond level that can make the difference between success and failure.  Improper handling of redo was a major source of backup corruption in the days prior to RMAN.

Multi-Threading is extremely difficult in a legacy file copy solution, which ultimately limits the size of database the solution can support.

The vast majority of Oracle customers abandoned the old Legacy File Copy solution many years ago, but some customers continue using these solutions.  Most customers migrated to RMAN when it was introduced, and a small number of customers use storage snapshots as discussed in the next section.

Snapshot Backups

Snapshots Backups are typically implemented in the storage layer or disk array.  Snapshots were originally introduced in the late 1990’s, and were a good alternative to legacy file copy solutions, but have some critical complexities that we will explore in this section.

[Diagram: storage snapshot backups]

Snapshots provide a virtually instantaneous copy of files on storage, which eliminates some of the complexity involved with multi-threading required for large database backups using the Legacy File Copy method.  Snapshots seem quite simple and effective on the surface, but the reality is less attractive.

Snapshot Backups are NOT Recommended

To be clear, we do NOT recommend using snapshots for backups.  Snapshots should be used for transient fallback on production systems, or for storage efficiency purposes on DEV/TEST systems.  Snapshots alone are not proper backups and should be used only in conjunction with storage replication or an auxiliary backup method.

Replicated Snapshots

Snapshots stored within the same storage as the database are not proper backups, especially if they are thin-provisioned (pointer-based) snapshots.  Loss of ALL data contained in the storage array (database and backups) is possible if everything is contained in a single disk array.  Replication is required in order to have a viable backup solution using snapshots.  The above diagram shows data and snapshots that are stored locally, as well as a replicated copy of the full storage and all snapshots.

Replicating Corruption is a danger with any bit-copy storage replication solution.  If the system suffers a ransomware attack, the encryption of data is dutifully replicated to the secondary site, resulting in encryption of BOTH sites.

Full Replication of storage including the database and all snapshots provides the best protection, but at higher cost.  In configurations where the database and all snapshots are maintained at both sites, local snapshots can be used for recovery as long as the failure doesn’t impact the entire disk array or the “base” that snapshots reference.

Snapshots at Replica Only provides good protection at lower cost because the space required for snapshots is only incurred at the replica site.  Using those snapshots for recovery on the primary site requires copying data back across the network.

ASM (Oracle’s Automatic Storage Management) needs to be used with care in conjunction with any storage replication.  ASM re-balance operations are a particular concern, since large numbers of blocks are “modified” at the storage layer, even though the affected blocks haven’t changed from the database perspective.  ASM re-balance results in massive numbers of blocks being replicated across the network.

Auxiliary Backup

Some customers implement an “auxiliary backup” in conjunction with snapshots rather than using storage replication.  One customer referred to this as a “snap & mount” solution because they used SAN (Fiber Channel) storage, and the file systems would be mounted on another system after the snapshot.  The second system would be used to run a backup of the snapshots.  Running auxiliary backups is simpler with NAS storage, since the secondary system does not need to be as closely aligned with production from the standpoint of versions, patching, etc.

[Diagram: storage snapshots with an auxiliary backup target]

The diagram above shows storage snapshots contained within a disk array, with an auxiliary backup target (either tape or disk).  The database can be reverted to any of the snapshots within the disk array, or restored from the auxiliary backup target.  Setting end-user expectations for MTTR should be based on the worst-case, which is restoring from the auxiliary backup.  See the section on MTTR for more discussion of this topic.

The remainder of this section assumes that snapshots are used in combination with replication or an auxiliary backup solution.

Crash Consistent Snapshots

The simplest form of snapshot backups is the Crash Consistent Backup.  The configuration is relatively easy to understand, and relatively easy to operate.  However, this is what I call a “guaranteed data loss” solution as shown in the following diagram.

[Diagram: crash consistent snapshots taken at three points in time]

The above diagram shows a database advancing through time, and snapshots being taken at 3 points (red, orange, and blue snapshots).  The entire disk array is a single point of failure, so some sort of replication is required.

Oracle’s Snapshot Requirements must be met for a valid Crash Consistent Snapshot.  The snapshot solution must provide the following attributes:

  • Consistent across all files or disk volumes
  • Must preserve write ordering

Snapshot technologies that do not meet these criteria will not produce a crash consistent image of the database.  Please refer to the section on inconsistent snapshots for more detail.

NOARCHIVELOG Mode should be used in a Crash Consistent Snapshot solution, and the online redo logs MUST be included in the snapshot.  The entire database, online redo logs, and control files are all reverted to the same point in time.  Oracle will automatically run through crash recovery, rolling back any in-flight transactions.   It is not possible to recover to any points between snapshots in this configuration, meaning data loss (lost transactions) will occur.

Recoverable Crash Consistent Snapshots

Crash Consistent Snapshots can be made Recoverable, allowing for recovery beyond the instant a snapshot was created, or recovering to an SCN between snapshots.  The following diagram shows how crash consistent snapshots can be made recoverable by keeping REDO (online and archived REDO) separate from snapshots of the datafiles.

[Diagram: recoverable crash consistent snapshots with redo kept separate from datafile snapshots]

In the above example, the database can be reverted to snapshot #1, then recovered forward using the redo logs.  Redo (and control files) are kept separate from the snapshot containing datafiles, which is what enables recovery.

Online Redo and Controlfiles must be kept at “current” or at a later point in time from the database to allow media recovery.  Transactional changes to the database are stored in the range of online redo as well as the stream of archived redo that precedes the online redo range.

Archived Redo Retention must reach as far back as the time of the snapshot used for recovery.  There is no reason to revert the stream of redo to an earlier point, since the redo represents a continuous stream or timeline.  The database can be recovered to any SCN along the timeline, rolling forward from the point of any snapshot.

Snapshots (or Restore Points) on Redo/FRA can be used to protect against administrator error, viruses, ransomware attacks, etc.  The Oracle database redo log is a write-ahead log, and log files are never over-written.

The High Write SCN determines the lowest point in time that can be used for recovery with any specific snapshot.  Oracle’s “snapshot optimization” feature was developed in conjunction with storage vendors.  The storage writes a timestamp into the snapshot, and that timestamp is then used in the RMAN commands using the SNAPSHOT TIMESTAMP clause.

The Database Scanner utility can be used with older versions of the Oracle database using storage that doesn’t support the Snapshot Optimization Feature.  The utility is used to scan the entire database looking for blocks with the highest written SCN in the database.  That SCN defines the point the database must be recovered to at minimum.

“File Needs More Media Recovery” is the error that occurs if the database is not recovered to a point after the high-write SCN.  An earlier snapshot will have to be used in that case to reach the desired SCN.  The choice is either to use the database scanner (a full scan of every block in the database) or to rely on trial and error to determine whether the chosen SCN is high enough.
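
For illustration, a point-in-time recovery shortly after a snapshot might look like the sketch below when the snapshot optimization feature is available.  The timestamps, and their exact format, are illustrative only; verify the syntax for your release.

    STARTUP MOUNT;
    RECOVER DATABASE UNTIL TIME '2018-02-28:09:15:00' SNAPSHOT TIME '2018-02-28:09:00:00';
    ALTER DATABASE OPEN RESETLOGS;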

Inconsistent Snapshots

Some older snapshot technologies were not able to meet the data consistency requirements of Oracle.  All data structures of the Oracle database (datafiles, controlfiles, redo logs, etc.) must be snapped at the SAME instant in time.  The order of writes also must be preserved to ensure data integrity.  Customers need to be aware that some NEW snapshot technologies cannot meet these data integrity requirements, so the old “inconsistent snapshot” method has to be used.  The following diagram shows how inconsistent snapshots can be used as an Oracle database backup method.

[Diagram: inconsistent snapshots used as an Oracle database backup method]

As noted previously, the disk array itself is a single point of failure.  Loss of the disk array means loss of the database and all snapshots.  A handful of administrative commands available on most disk arrays can destroy the database and every snapshot along with it.  Customers should be sure to use storage replication or an auxiliary backup method with snapshot technologies.

BEGIN/END BACKUP commands must be used in conjunction with the snapshot, either at database or tablespace level.  Database redo logging increases dramatically after executing BEGIN BACKUP.  The excessive redo log rates continue until END BACKUP is executed.  The BEGIN BACKUP command does not “quiesce” the Oracle database.  The command causes the Oracle database to generate additional information into the redo log to “defuzzy” the backup and resolve any “split block” conditions caused by the inconsistent snapshot.
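
The classic “hot backup mode” sequence around an inconsistent snapshot looks roughly like the sketch below; the snapshot step itself is vendor specific and shown only as a comment.

    ALTER DATABASE BEGIN BACKUP;
    -- storage> create snapshot of the datafile volumes   (hypothetical)
    ALTER DATABASE END BACKUP;
    -- Archive the current log so the redo needed to de-fuzzy the backup and
    -- resolve split blocks is captured:
    ALTER SYSTEM ARCHIVE LOG CURRENT;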

Online Redo, Archived Redo, and Controlfiles must be kept separate from the datafile snapshots as shown in the above diagram.  If those data structures are placed into a separate snapshot, that snapshot must be Crash Consistent.

Archived Redo Retention (as with Recoverable Crash Consistent Snapshots) must reach as far back as the time of the snapshot used for recovery.  There is no reason to revert the stream of redo to an earlier point, since the redo represents a continuous stream or timeline.  The database can be recovered to any SCN along the timeline, rolling forward from the point of any snapshot.

MTTR from Snapshots

One primary rule with MTTR (Mean Time To Repair) calculations is to “plan for the worst, but hope for the best”.  Recovery using a local snapshot is obviously the best case, while recovery using a remote replica or auxiliary backup represents the worst case.

One customer I was involved with used snapshots as the primary method of recovery, and set expectations with the business that recovery would be done in less than 1 hour.  This customer had a 48TB database with an auxiliary backup using EMC’s Data Domain.  Restore from Data Domain ran at 1TB/hour, which means 48 hours for the restore alone of their 48TB database (and restore is only part of the recovery process).  The IT team was faced with a business expectation of 1-hour recovery, but 48 hours of restore time.  In the end, this customer suffered a 4-day outage due to an inadequate secondary solution and a high-risk primary solution (snapshots are not proper backups).

Snapshot Recovery is Manual

All recovery from snapshots is manual and requires coordination between DBA and Storage Administrator.  Features such as Oracle’s Recovery Advisor cannot be used because all of the backups (snapshots) are done outside of Oracle’s control.  Some 3rd party vendors offer specialized tools to aid in recovery using snapshots.

RMAN Backup Sets

Solutions based on RMAN Backup Sets are the most commonly used method for backup & recovery of Oracle databases.  I would estimate that greater than 80% of Oracle databases are protected using RMAN Backup Sets.  The vast majority of Oracle customers also implement systems according to Oracle’s Maximum Availability Architecture (MAA) guidelines, and RMAN Backup Sets are a critical component of MAA.

[Diagram: RMAN Backup Sets]

RMAN and its Backup Set capability were introduced in the Oracle8 timeframe in the late 1990s, and Backup Sets have become the most widely used method for backup/recovery of Oracle databases.  There are 3 common RMAN Backup Set configurations: backup to disk, backup to a media manager, and a disk staging area that is swept to a media manager:

[Diagram: RMAN Backup Set configurations – backup to disk, backup to media manager, and stage & sweep]

RMAN backup sets have a different format when written directly to disk as compared to the SBT (System Backup to Tape) format sent to a Media Manager.  The Media Manager might be configured to use a “disk pool” to store the data, but it’s still in SBT format.
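
The difference is visible in how the channels are configured.  A minimal RMAN sketch, assuming a hypothetical media manager library path (the PARMS string is entirely vendor specific):

    # Backup sets written in RMAN's native disk format:
    CONFIGURE CHANNEL DEVICE TYPE DISK FORMAT '/backup/ora_df%t_s%s_p%p';
    BACKUP DEVICE TYPE DISK DATABASE;

    # Backup sets written in SBT format through a media manager library,
    # even if the media manager ultimately stores them on a disk pool:
    CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' PARMS 'SBT_LIBRARY=/opt/mml/libobk.so';
    BACKUP DEVICE TYPE SBT DATABASE;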

The Stage & Sweep Configuration involves RMAN writing the backup to disk; the Media Manager then “sweeps” that data to whatever storage device it is using (disk, tape or VTL).  The disk staging area should be sized large enough to contain at least 2 full backups, all associated incremental backups, plus all associated redo.  A WFDDI (Weekly Full Daily Differential Incremental) strategy gives this configuration:

[Diagram: disk staging area holding 3 weekly full backups, intervening incrementals, and archived redo]

The above diagram shows space for 3 weekly full backups, plus all intervening incrementals and archived redo.  The resulting RECOVERY WINDOW is 2 weeks (from disk) because the oldest backup will be deleted before running the next backup.  Older backups can be stored only on the Media Manager, but a double-hop will be required during recovery using any backups coming from the Media Manager.

A Double Hop Restore is required in the Stage & Sweep configuration for cases where the needed backup is not on disk in the staging area.  The Media Manager stores the data in its own proprietary format, and RMAN cannot access those backups directly.  The Media Manager must retrieve the needed backups and write them back to the staging area before RMAN can access those backups.
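
One stage & sweep cycle, sketched in RMAN with a hypothetical staging path (the sweep itself is performed by the media manager, outside RMAN's control):

    # 1. RMAN writes backup sets into the disk staging area:
    CONFIGURE CHANNEL DEVICE TYPE DISK FORMAT '/backup/stage/%d_%T_%U';
    BACKUP INCREMENTAL LEVEL 1 DATABASE PLUS ARCHIVELOG;

    # 2. The media manager sweeps /backup/stage to tape/VTL on its own schedule.

    # 3. Double-hop restore: if the required pieces are no longer in the staging
    #    area, the media manager copies them back first, and RMAN is then told
    #    where to find them before the restore:
    CATALOG START WITH '/backup/stage/restored/' NOPROMPT;
    RESTORE DATABASE;
    RECOVER DATABASE;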

Recovery from Full + Incrementals

The most common backup strategy with RMAN Backup Sets is Weekly Full Daily Differential Incremental (WFDDI).  Some customers still use Daily Full (DF) backups for smaller databases or shorter Recovery Windows.  The following calendar shows a 30-day Recovery Window, which requires 37 days of data retention:

[Calendar: a 30-day Recovery Window requiring 37 days of data retention with weekly full backups]

The “rule of thumb” for performance using WFDDI is that end-to-end recovery takes roughly 2X longer than recovery using a Daily Full (DF) strategy.
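
A WFDDI schedule with a 30-day Recovery Window can be sketched in RMAN as follows (the scheduling itself, via cron or OEM jobs, sits outside RMAN and is shown only as comments):

    CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 30 DAYS;

    # Weekly (for example, Sunday):
    BACKUP INCREMENTAL LEVEL 0 DATABASE PLUS ARCHIVELOG;

    # Daily (Monday through Saturday):
    BACKUP INCREMENTAL LEVEL 1 DATABASE PLUS ARCHIVELOG;

    # Backups become obsolete only when no longer needed to satisfy the 30-day
    # window, which is why roughly 37 days of retention ends up on hand:
    DELETE OBSOLETE;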

HA/DR and Backups

Note that Oracle’s MAA (Maximum Availability Architecture) team specifies that HA/DR (High Availability and Disaster Recovery) solutions should be used in conjunction with a backup/recovery solution.  Requirements for “instantaneous recovery” (low Recovery Time Objective) should be addressed through a DR solution such as Oracle Data Guard or Oracle GoldenGate.  Requirements for preventing system downtime should be addressed through HA technologies such as Real Application Clusters (RAC).  Backup/Recovery is distinctly different from HA/DR, and allows a system to be recovered BACKWARD in time to a prior point.

Low RPO Requirements

It’s important to note that Oracle’s Zero Data Loss Recovery Appliance (ZDLRA) can be used to deliver extremely low RPO (Recovery Point Objective) requirements in cases where RTO (Recovery Time Objective) is not as stringent.  As noted previously, a Disaster Recovery (DR) system using technologies such as Oracle Data Guard and/or Oracle GoldenGate is the recommended method for delivering low RTO.  Prior to the advent of ZDLRA, RTO and RPO were essentially linked and were addressed together.  We are now able to address low RPO requirements separately using ZDLRA, reducing the operational complexity of recovery in cases where time is not as critical.

RMAN Incremental Merge

The RMAN Incremental Merge feature was introduced in the Oracle Database 10g release. This feature allows customers to create an IMAGE COPY backup of a database, then update that image copy using incremental backups.  Another way to describe this feature is “Incrementally Updated Image Copy” backups.  The image copy is updated as of a specific point in time.  Storage snapshots can then be added to this capability to provide recoverability to multiple points in time.

[Diagram: RMAN Incremental Merge with storage snapshots of the image copy]

The RMAN catalog tracks the original image copy, along with each incremental that is executed.  Snapshots of the Image Copy are taken at various intervals, and these are done outside the control of RMAN.
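
The daily merge cycle follows the widely documented incrementally updated image copy pattern; a sketch (the tag name is illustrative):

    RUN {
      # Apply the previous incremental (if any) to the image copy:
      RECOVER COPY OF DATABASE WITH TAG 'incr_merge';
      # Take a new incremental against the production database:
      BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'incr_merge' DATABASE;
    }
    # A storage snapshot of the image copy volume is then taken, outside RMAN,
    # to preserve this point in time before the next merge overwrites it.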

Image Copy Location – Separate storage is used to hold the image copy.

CPU Stealing – The merge operation is performed using “CPU Stealing” from the production database server.  The production database server executes an incremental backup on the database, then applies those changes to the image copy.   This places additional load on the database server for the duration of the Incremental Merge process.

Incremental Merge Performance is affected by the fact that changes to the database tend to involve random I/O.  Performance of the incremental merge operation can be increased only by putting the image copy on faster storage.

The “Switch to Copy” capability of Incremental Merge is only useful if the image copy is stored on the same tier of storage as the primary database.  We typically put backup on lower tiered (less costly, less performant) storage than the primary database.  However, if that backup will potentially BECOME the primary database by doing “switch to copy”, the storage needs to be of similar performance class.
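
For reference, a minimal sketch of “switch to copy” in RMAN, which repoints the database at the image copy rather than restoring it back to primary storage (which is why the image copy needs comparable performance):

    STARTUP MOUNT;
    SWITCH DATABASE TO COPY;
    RECOVER DATABASE;
    ALTER DATABASE OPEN;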

Incremental Merge Recovery is Manual

With RMAN Incremental Merge plus snapshots, all recovery is manual.  Guided Recovery in OEM and Oracle’s Recovery Advisor aren’t used.  It’s possible to mitigate this somewhat by registering the snapshots in the RMAN catalog.  Third party tools that implement this method typically provide some tooling to assist Database Administrators with recovery, but those tools are not as widely used in the industry.

ZDLRA – the 5th Method

Oracle’s Zero Data Loss Recovery Appliance is nominally based on RMAN Backup Sets, but it’s a distinctly unique solution.

[Diagram: Zero Data Loss Recovery Appliance]

Any Oracle version and any platform (those currently supported as of this writing, meaning 10g and above) can be configured to use the Recovery Appliance.

An Initial Full (LEVEL0) Backup is used to seed the recovery appliance.  That initial full backup effectively does not exist after a while, as blocks belonging to that backup are eventually purged.

Delta Push uses the RMAN Incremental API to push changes from the database to the Recovery Appliance.  While this uses the syntax of a conventional RMAN LEVEL1 (incremental) backup, it is functionally quite different.  Each Delta Push is automatically transformed into a Virtual Full backup.

Virtual Full (LEVEL0) Backups appear in the RMAN catalog after the contents of each Delta Push are processed.  There is a Virtual Full created for the initial physical LEVEL0, as well as for each subsequent Delta Push.

Real Time Redo Protection uses the Data Guard API to protect the leading edge of the redo stream.  Redo is transmitted asynchronously to prevent the Recovery Appliance from becoming a bottleneck.  The Recovery Appliance ensures Zero Loss up to the last transmitted SCN in this configuration.  ZDLRA is supported for use with a Far Sync Server to allow synchronous capture of redo changes without imposing a bottleneck across multiple systems.
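
In practice, a protected database is pointed at the Recovery Appliance through an SBT channel, and the daily delta push uses ordinary incremental syntax.  The sketch below is heavily hedged: the library path, wallet location, and connect string are placeholders that vary by installation.

    CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' PARMS 'SBT_LIBRARY=/u01/app/oracle/lib/libra.so, ENV=(RA_WALLET=location=file:/u01/app/oracle/wallet credential_alias=zdlra-scan:1521/zdlra:dedicated)';

    # Daily delta push; the appliance converts each one into a virtual full:
    BACKUP DEVICE TYPE SBT INCREMENTAL LEVEL 1 DATABASE;

    # Real-time redo protection is enabled separately by pointing one of the
    # database's LOG_ARCHIVE_DEST_n parameters at the appliance, configured
    # just like a Data Guard redo transport destination.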

ZDLRA Benefits

The Zero Data Loss Recovery Appliance provides a number of unique benefits compared to other solutions on the market.

[Diagram: ZDLRA benefits]

Efficiency of ZDLRA begins with impact on the production database servers, then extends through the network and into the space required on ZDLRA.  Changes are pushed to ZDLRA using the Delta Push process (the most efficient possible), and those changes are automatically converted into Virtual Full Backups.  The Virtual Full Backups are then used during recovery, making the recovery process more efficient as well.

Validated Recoverability is a key benefit of ZDLRA.  Backups are validated proactively rather than during recovery.  Customers can recover with confidence knowing the backups have been validated.  The Recovery Appliance will attempt to automatically resolve validation failures (such as redo log gaps), and will report validation failures if they cannot be resolved automatically.

Automation & Simplicity is another key benefit of ZDLRA.  The Recovery Appliance uses familiar tools and the full range of automation capability (such as Guided Recovery) that is already built into the Oracle ecosystem (Database, RMAN, OEM, etc.).  Backups are dramatically simpler because each database simply does a Daily Delta Push rather than complex scheduling of weekly full backups that might contend with application processing.

Cost Effectiveness is extremely important for any backup/recovery solution, and ZDLRA is the most cost-effective solution on the market.  ZDLRA requires the least amount of storage space possible because of the change-based design that directly extracts changed blocks through the Delta Push process.

Space Usage Comparisons

Each of these solutions has its own advantages and disadvantages, but we should first compare them purely on the amount of storage space required.

While all of these solutions provide dramatic space savings over generic solutions, ZDLRA provides the greatest storage savings of all.  The most dramatic saving comes from the use of RMAN Incremental Backups (WFDDI) as opposed to Daily Full (DF) backups.  As shown in this example, ZDLRA requires approximately 1/3 less storage space than any of the other solutions.

[Table: storage space comparison for a 100TB database across the backup methods]

As shown in the table above, with a given database size of 100TB and the same redo generation rate, change rate, recovery window, etc., ZDLRA requires the least amount of space.

DF Generic means Daily Full backup to general purpose storage.  Notice that the “recovery window” is equal to “Retention Period” since full backups are taken daily.

WFDDI Generic shows the impact of implementing a WFDDI (Weekly Full Daily Differential Incremental) strategy.  This is a dramatic 4X savings in space as compared to Daily Full backups.  Notice that the “Retention Period” is 7 days longer than the “Recovery Window” because this scenario uses a Weekly Full backup.

De-Dupe takes the WFDDI strategy and adds de-duplication storage.  Daily Incrementals and REDO logs do not contain duplicates by definition (these contain all unique data).

Rep+Snap shows the use of storage replication with snapshots on the replica side only.  This method uses a full size replica of the database, then adds daily snapshots to capture changes.  The changes are space efficient, but this solution requires a full sized copy (replica) of the database.

Incr. Merge shows the storage required for an RMAN Incremental Merge solution with snapshots.  As with the replication + snapshot method, RMAN Incremental Merge uses a full sized Image Copy of the database, with the same sized disk allocation.  Changes contained in snapshots are space efficient, but the fully provisioned database size consumes the same amount of space as on production.