1. Field of the Invention
The invention relates to the field of computer systems and software. More specifically, the invention relates to systems and methods for predicting the relationship between information stored in databases.
2. Description of the Background
Computer databases store and organize data using “metadata.” Metadata is often defined as “data about data.” In other words, metadata describes the data contained within a metadata descriptor, and may also include information about how and when a particular set of data was collected or how the data is formatted. The study of metadata is essential to understanding the organization of a database as well as how one database relates to other databases. An example of metadata, or more specifically of “a metadata descriptor,” is a variable or field name for holding data. For instance, a database administrator could use the metadata descriptor “customer_last_name” as the name of a field or variable that contains a customer's last name. In this example, “customer_last_name” is the metadata descriptor, while the actual last name, such as “Smith,” is the data contained within the metadata descriptor in the database. A database administrator knows that “Smith” is a customer's last name only because of the metadata descriptor “customer_last_name.”
Different databases may use different metadata descriptors to describe the same type of data. When data is exchanged from one database to another, an interface must be used to translate metadata descriptors from one database to metadata descriptors in the other database. For example, one database may use the metadata descriptor “customer_last_name,” while a second database could use the metadata descriptor “cust_lt_nam” to describe identical data, i.e., a customer's last name. The metadata descriptors may also have different variable or character string lengths that would also be addressed by the interface between the databases.
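By way of illustration only, the kind of interface described above can be sketched as a lookup table that renames each metadata descriptor and adjusts the string length. The table name `A_TO_B`, the function `translate_record`, and the maximum length of 20 characters are hypothetical and are not taken from any particular database.

```python
# Hypothetical interface table: descriptor in database A ->
# (descriptor in database B, maximum string length in database B).
A_TO_B = {
    "customer_last_name": ("cust_lt_nam", 20),
}


def translate_record(record, mapping):
    """Rename each field of a record and truncate values to the target length."""
    translated = {}
    for descriptor, value in record.items():
        target_descriptor, max_length = mapping[descriptor]
        # Truncation is one simple way to reconcile differing string lengths.
        translated[target_descriptor] = value[:max_length]
    return translated


record_in_a = {"customer_last_name": "Smith"}
record_in_b = translate_record(record_in_a, A_TO_B)
```

Here the data “Smith” is unchanged; only the metadata descriptor under which it is stored differs between the two databases.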
In a large computer system, or in a large organization, there may be many databases, each with a separate naming standard. It is not uncommon for a large corporation to have thousands of databases, which may have been created by different database administrators, each using a different naming standard. If data is to be exchanged between databases, a separate interface between each pair of databases is often necessary. In a system of databases, creating interfaces between databases is a large, tedious task, particularly if there are many databases. Furthermore, if a change is made in one database, the change must be made to all interfaces or databases with which the first database interacts. For example, if a variable string length is changed to include more or fewer characters, the change must be propagated throughout all the affected interfaces. Particularly in a large computer system, it is difficult and tedious to ascertain which interfaces and databases are affected by such a change.
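The scale of the problem can be illustrated with a short sketch. It assumes, purely for illustration, that every pair of databases shares exactly one interface, so that n databases require n(n-1)/2 interfaces. The function names and the database names are hypothetical.

```python
from itertools import combinations


def pairwise_interfaces(databases):
    """List one interface per pair of databases (an illustrative assumption)."""
    return list(combinations(sorted(databases), 2))


def affected_interfaces(changed_database, interfaces):
    """Return every interface that touches the database that was changed."""
    return [pair for pair in interfaces if changed_database in pair]


interfaces = pairwise_interfaces(["db_a", "db_b", "db_c", "db_d"])
to_update = affected_interfaces("db_a", interfaces)
```

With four databases there are already six interfaces, and a change to one database touches three of them. Under the same assumption, a thousand databases would require 499,500 interfaces.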
One conventional method for facilitating communication between multiple databases involves mapping metadata descriptors to a consolidated database. The consolidated database includes a common naming standard. If each metadata descriptor within each database can be mapped to a particular “naming standard” metadata descriptor in the consolidated database, metadata descriptors from one database may be mapped through the consolidated database onto metadata descriptors in a second database. Such a consolidated database is often referred to as an “Enterprise Database” (EDB).
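As a minimal sketch of this conventional approach, each database can be given a single table that maps its metadata descriptors onto the naming standard, and a descriptor is then translated by going through the standard name. The table `TO_STANDARD`, the standard name `CUSTOMER_LAST_NAME`, and the function `map_descriptor` are hypothetical illustrations, not part of any actual EDB.

```python
# Hypothetical mappings from each database's descriptors to the
# naming standard of the consolidated database.
TO_STANDARD = {
    "db_a": {"customer_last_name": "CUSTOMER_LAST_NAME"},
    "db_b": {"cust_lt_nam": "CUSTOMER_LAST_NAME"},
}


def map_descriptor(source_db, descriptor, target_db, to_standard):
    """Map a descriptor from source_db to target_db through the naming standard."""
    standard_name = to_standard[source_db][descriptor]
    for candidate, mapped_name in to_standard[target_db].items():
        if mapped_name == standard_name:
            return candidate
    return None  # the target database has no descriptor for this standard name


result = map_descriptor("db_a", "customer_last_name", "db_b", TO_STANDARD)
```

In this arrangement each database needs only one mapping, to the naming standard, rather than one mapping per other database.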
Even if a naming standard in a consolidated database is used, the process of mapping metadata descriptors to the consolidated database is normally a very tedious task. An administrator must be familiar with the naming standard used in the EDB as well as the metadata descriptors in each individual database, because the metadata descriptors are mapped manually to the naming standard in the consolidated database. Even after the metadata descriptors from all of the different databases have been mapped onto the naming standard in the consolidated database, it is still difficult to ascertain how a change in one database will affect all of the other databases.
These and other problems are avoided and numerous other advantages are provided by the methods and systems described herein.