A Data Warehouse is the repository of an organization's electronically stored data. Data Warehousing is the process of building a data warehouse and of facilitating the reporting and analysis of information in the data warehouse.
A Change Data Capture (“CDC”) is a set of software design patterns used to determine the data that has changed in a dataset so that action can be taken using the changed data. A CDC may allow for the capture and preservation of data states across time.
An Online Transaction Processing data set (“OLTP”) refers to the data set created or accessed by transaction-oriented applications, typically for data entry and retrieval transaction processing. For the purposes of this application, “transaction” may refer to a computer or database transaction such as a database read or write. “Transaction” in this context may also refer to a business or commercial transaction such as a sale or bank withdrawal. Online transaction processing systems typically respond in real-time to user requests.
A Data Warehouse (“DW”) may be derived from one or more OLTPs. A DW may be created by periodically applying a CDC to an OLTP.
A DW may not necessarily retain the data structure and format of an OLTP from which it is derived. For instance, it is typical in the banking context for transactions within an OTLP to be recorded with a granularity approaching real time but the DW derived from that OLTP to be recorded with only daily granularity.
The DW derived by the application of a CDC to an OLTP over time will contain records and fields. A record is defined for the purposes of this application as a collection of fields. A field, for the purposes of this application, can contain a single data point, a pointer to another record, or another record.
When a CDC is applied to an OLTP, the changes recorded may concern changes at the record level, such that a change to a single field within a record will be recorded by the CDC as a change to the entire record.
Records, including records in a DW may have boundary dates, consisting of an effective date and an end date. The effective date may be the date on which the data reflected by a particular record becomes valid, and the end date may be the latest date on which the data reflected in a particular record is valid, or the date one increment before that data is changed. Before a particular record is changed, or when the record as changed remains current, it may be assigned a theoretical maximum end date, such as Dec. 31, 9999.
Boundary dates may be dates, or can be members of any totally ordered set as that term is understood mathematically, whether or not bound to a progression external to the data. Where boundary dates are not dates per se, terms such as “earlier,” “later,” and “historical” are given meanings that are analogous to those terms when used in the context of dates per se.
An element-specific view (“ES view”) is a view of a DW that reflects a CDC on only a subset of the fields or records normally analyzed when creating or updating a DW. An ES view may also maintain historical information for that subset.
Conventionally, creating an ES view requires the application of an Extract, Transform and Load (“ETL”) process to archived or original OLTP datasets, essentially creating a new DW structured to support the ES view.
Conventionally, creating an ES view from an existing DW requires parsing the DW so as to reverse-engineer an “echo” of the OLTP datasets over time, and then applying an ETL process to those datasets.
In either case, the ETL creates a series of “deltas,” which reflect the changes in the OLTP dataset or any existing or created archive of OLTP datasets, filtered for the fields.
An example of a possible relationship between fields, records, and boundary dates is provided in FIGS. 4-6, discussed below.
Conventionally, returning an ES view specific to a particular element requires either returning to the OLTP or archives thereof and running an ETL process over those datasets, tuned to extract the particular element and exclude other record changes, transforming that information for format and granularity, and then loading that information as a separate dataset, consisting of an initial state and a series of “deltas” recording the changes in that state. Alternately, the DW could be mechanically parsed, essentially creating an OLTP archive, and then running an ETL process over that recreated OLTP archive.
Typically, ETL processes may be very resource- and time-intensive. Further, the newly created element-specific DW may be a very large file. Further, the element-specific DW may create multiple duplicative data points distinct from the original DW, giving rise to data integrity issues.
Typically, users may require many different ES views, magnifying the problems presented by the conventional approach.
It is desirable, therefore, to provide systems or methods for ES views to be generated from the original DW virtually or dynamically.