1. Field of the Invention
The invention relates generally to database management. More particularly, embodiments relate to methods to transfer data by setting chase rules.
2. Background Art
In many industry applications, a large amount of data is generated, collected, and deposited to/transferred between databases. For example, in the oil and gas industry especially during oil Exploration and Production (E&P), geological formation data are collected in the exploration stage. These geological formation data are stored for the use of making decisions on where to drill. In the production stage, data on oil and gas flow rates are collected to monitor the production processes. A large investment is often needed to build a database to store the aforementioned data. It is essential that all the data are stored and managed in such an efficient way that subsequent search and use of the data are facilitated.
The market of local DBMS is dominated by client-server products called SQL Servers, developed by Sybase, Informix, and Microsoft. SQL Servers are based on the Structural Query Language (SQL). SQL Servers are popular in low-end applications in small businesses and local offices where the local DBMS run on stand-alone desktops and laptops. SQL Servers are low cost, require simple administration, and generally provide good performance, as a result of being based on the popular Windows NT® technology (Windows NT® is a registered trademark of Microsoft Corporation, USA). On the other hand, Oracle® (Oracle is a registered trademark of Oracle Corporation, USA) dominates the high-end database market (such as corporate and national repositories) because of its high scalability, reliability, and a wide assortment of features. Oracle® and SQL Servers have many differences, and it is often difficult to support applications on both Oracle® and SQL Server. Software vendors (and hence users) are often forced to choose one platform over the other. Migrating data from one platform to another and merging data may require extensive effort. Further, a customized DBMS is often required to run on different platforms.
In the following context, a database instance and a DBMS are collectively referred to as a “data repository.” A database instance can have many data stores. A data store contains a collection of tables and views. A table is an array of data. A view is a particular way of looking at a table, and the change of a view does not affect the physical organization of a table. Users are typically not granted access to the tables directly. Rather, they are granted access to views of the tables. A view in a data store provides a standard interface to that data store. Tables and views can be private or public. Conventionally, for a database entity XYZ, a private implementation of a table is represented with XYZ_, and a public interface view is represented with XYZ. The data in a data repository are usually related among themselves, and such a relation is usually defined by a logical specification in the form of a “data model.”
In Oracle® and SQL Server, as well as in other commercial DBMS, a database may include many subsets of entity types in the data model, referred to as “domains.” For example, in an oilfield service setting, the domains include Drilling, Production, Log, Reservoir, Seismic, and Well. These domains could have very different sizes and complexity. For example, the Well domain could be relatively small and simple, while the Production domain could be large and complex.
Basic operations involving a database include: (1) search operation, which allows a user to specify search conditions such that the retrieved set of data matches the search conditions; (2) insertion, by which a new set of data with given attributes are inserted into a table; (3) updating, which modifies the attributes of a selected data set from a table into new values; and (4) deletion, which delete a data set from a table.
A major part of data management is loading new data into the data repository. Some of the new data are modifications of existing entities while some data are entirely new entry to the data repository. It is important to set up rules for deciding whether the incoming data match existing entities. Part of the rules is to ensure that some of the incoming data have mandatory values and that referential integrity is satisfactory.
In the case of a violation of the rules, the DBMS should have options to completely reject the incoming data set, or to allow partial loading of a consistent subset of the incoming data. If a violation causes a complete abortion of a transaction, it is potentially dangerous because the database is in an inconsistent state unless all the previously rejected data can be reloaded in a single, subsequent transaction. On the other hand, if referential integrity is enforced while loading data, the incoming data must be in the right order without modifying the manner in which the incoming data is entered. In some cases the incoming data includes cyclic references and thus there is no way to keep insert incoming data in the right order without modifying the manner in which the incoming data must be entered into the database is to break the cycles.
In addition, a DBMS catches and reports errors one at a time. Hence, when there are many errors in a data set, it is important to see and fix systematic errors. It is also important to decide whether to repeatedly fix newly discovered errors and try again or to reject the data set because it is inconsistent.
In a general sense, data transfer refers to the exporting of data from one data store and the importing of that data into another data store. Selecting data to export from the source data store should be an efficient and accurate process. It is painstaking to specify every single entity instance within a data store to be transferred. Instead, a user prefers to specify the transfer in logical terms. For example, the user may want to transfer data about a specific high production oil well (i.e., a “gusher”) from the centralized data store to the rig site data store. Accordingly, one must formalize what it means to transfer data about a well. For example, a user may want to define that the transfer of well data includes the transfer of data about all the boreholes belonging to that well. This is an example of a chase rule, where the term “chase” refers to the process of obtaining all the data based on a simple rule. A chasing “engine” is used to compute the entity instances to be transferred based on an initial set of selected entity instances (known as seeds) and a set of chase rules, where the term “engine” refers to a code generator.