1. Field of the Invention
Embodiments of the invention described herein pertain to the field of computer databases. More particularly, but not by way of limitation, one or more embodiments of the invention enable a method for generating and using a reusable custom-defined nestable compound data type as database qualifiers.
2. Description of the Related Art
Early computer database models required designers to access data through low level pointer operations that linked records through physical addresses. Querying the database required an understanding of the physical structure of the tables. To create a school registration system, for example, a designer might create a large table containing a row for every student. Such a table would require columns for data about the students (such as ID number), classes registered for, dates and times of the student's classes, class locations and teacher for each class. Such a table is easy to understand, but would waste space repeating data items such as a teacher's name in row after row of students registered for the same class.
Relational database management systems (RDBMS) were developed in the 1970's to provide database designers with a method of efficiently organizing data in an abstract form that takes advantage of the natural relationship of data items to each other. E. F. Codd's idea to organize data by relationships separate from its physical storage method was revolutionary. A relational database provides a more flexible solution in part because the abstraction created a more maintainable and efficient data model. For example, a database designer could create a database design (schema) that represents a school registration system by placing data regarding students in one table, teachers in another table, classes offered in a third table and class registrations in a fourth table. Each table would have columns (attributes) representing only the kinds of information relevant to the data stored in that table. The data itself would be stored in table rows (records). In a given record, the attributes may be referred to as fields of the record. Thus, the student table would have one row for each student; the teacher table would have one row for each teacher; and so on. Each student record would have fields for a student ID number and student name.
FIG. 1 illustrates a relational database schema for a simplified school registration system. The design reduces storage and highlights the relationship between the various data items. Because multiple students will register for the same class, information about the class is not required in each registration record; instead, only a reference (pointer) to the class information is required. Computer memory need not be wasted on placeholders or repeated data. Each table in the relational model includes a small primary key field that makes each row of the table unique, even if a student, for example, unintentionally registered for the same class twice. In FIG. 1, attributes 115, 125, 135, 145, and 155 represent primary key fields.
During design or maintenance it may become apparent that a one-to-many relationship is required between data items in a record. Typically, the solution is to break up such data into separate tables. Unfortunately, this solution may cause the relationship between the records to be become obscured The RDBMS designer solves this problem by including “foreign keys” as well as primary keys in the record of the new table. While a primary key makes a row unique, a foreign key relates the data in a row of one table to data in a row of another table. In an RDBMS a table is also referred to as a relation. Thus, in Registration Table 130, RID 135 is a primary key, while COID 156 and SID 116 are foreign keys that relate each registration record to a student (a record in Student Table 110) and a class offering (a record in Class Offering Table 150). Class Offering Table 150 contains primary key COID 155, and foreign keys CID 146 (from Class Table 140) and TID 126 (Teachers table 120). This schema, when populated with data, can produce a course catalog report (using Classes Table 140 and Class Offering Table 150), a Teacher schedule (using Teachers Table 120 and Class Offering Table 150), a student schedule showing classes, class titles, credits, class location, and teacher's name (using all tables 110, 120, 130, 140 and 150), all with a minimum of duplicated data and wasted space.
A database designer generally creates a database schema, such as that illustrated by FIG. 1, manually. The designer must gather information regarding all the required reports, data available, import and export requirements and the relationships between the data items and then must synthesis the information into a database schema such as that depicted in FIG. 1. For large databases with complex relations, schema design may require a team of designers with sophisticated design tools and a significant amount of design time. Each relation and attribute will be described in a data dictionary and individually instantiated.
Traditionally, the definition of a set of related fields for a conventional database table is neither reuseable nor portable. Neither does current schema design systems support flexible or extensible user defined data constructs. When a designer creates a database schema, creates tables to support the schema, and then populates the tables with data, the design becomes fixed. If the database design or organization must later be changed, it will often require export of all the data, redesign of the database, re-creation of the tables and importation of the data into the new schema—if such an operation is even practical at all. Thus, RDBMS schemas for large database systems are complex to reuse and complex to port from one system to another. Even mining data from a large system to import into an alternative schema can be an overwhelming task. Yet today, database evolution and sharing of data between unrelated databases is an essential part of the way we work and do business.
Various approaches to managing data import for large databases into RDBMS systems have been attempted, but all suffer from an inherent drawback of the RDBMS concept—that the data relationships in an RDBMS are not exposed from the source to the destination. FIG. 1A illustrates the direction of relationships in the example of FIG. 1. A designer maintaining Class Offering Table 150 is not exposed to the fact that COID attribute 155 is relied upon by Registration Table 130 through reference 157. The same can be said for relations 117, 127, and 147. Thus, the dependency of one table on another is obscured. Further, the view presented to a designer is a “decomposed” view of all the records of one table or all the records of another table. When designs include teams of designers and hundreds of tables, this lack of visibility becomes a disadvantage.
Another disadvantage of a traditional RDBMS schema approach is that, in many instances, data relationships in the real world are nested by nature. In a contact management system, a single contact may have more than one address (business, home, mailing, billing, and shipping, for example) and phone number, email account and reminder date associated with the single entry. FIG. 2 illustrates a simple contact management system in an RDBMS. Each Customer has an identifier (ID 220), a name, street, city, state, zip and country. However, Customers Table 210 cannot support allowing Customer 220, Joe's Pizza, to have two addresses—one for delivery and one for billing, for example.
A traditional design solution would break the relation into two tables to allow the data to have a one-to-many relationship between some information and other information. FIG. 2A illustrates this traditional solution. Customers Table 210′ now has two attributes (columns), a primary key and a customer name. Address Table 230′ contains the fields removed from Customer Table 210 of FIG. 2, with the addition of the foreign key attribute, CustID. Now, a search for Joe's Pizza (CustID 260) will return two entries from Address Table 230′: Row 250 (Addr ID 901) and Row 260 (Addr ID 904), because both have a matching CustID 220 (containing 1001.) Ideally, at least one additional attribute would be added to Address Table 230′, not shown in the Figure, to indicate the purpose of the address stored in each row of Address Table 230′.
At least one disadvantage of this traditional approach is that there is no exposure for an observer of Customers Table 210′ to the relationship of customer name and customer address because of the RDBMS primary key-foreign key design approach. While this lack of visibility seems trivial in this example, in a database with hundreds of tables the lack of exposure may cause errors in the maintenance of the database. Furthermore, the designer viewing either table sees a “decomposed” view of all the records of one table or all the records of the other table.
For an RDBMS containing millions of records, the problems of expansion and maintenance are only amplified. Traditional RDBMS systems and techniques may be insufficient to produce a manageable and maintainable design for very large databases. Alternatively, for example, SAP's Master Data Management Environment (MDME) system is an integrated system for master data management that uses a SQL (Structured Query Language) DBMS, but does not require designers to use SQL for searching, sorting, and retrieving of information. Standard SQL DBMS do not support the types of advanced structures necessary for managing master databases. Master data systems may consist of a thick shell of functionality on top of a SQL-based DBMS to provide a scalable database where data is fully accessible to other SQL-based applications and tools.
Master data systems simplify maintenance and promote data integrity by simplifying the user's view of the data and its repository. For example, MDME presents the user with exactly one main table. The MDME main table consists of the primary information about each main table record. Variations on common data, such as for example a list of suppliers for various products listed in the database, may be stored as “look-up” list of acceptable values in sub-tables. By requiring main table entries to contain only references to the sub-table, the master database can maintain data integrity.
To save space and speed searches and object references, master databases may store objects, for example, images, text blocks, text HTMLs, and PDFs tables, in an object table. Some master database systems offer designers an object table as a special type of lookup sub-table, where each object table is used to store a single type of object, such as images, text blocks, HTML text blocks, or PDF files. Each object may be defined or imported into the repository only once, and then linked to a main or sub-table field as a lookup into the object table of that type. Thus, object tables eliminate redundant information because each object appears only once in the repository, even if it is linked to multiple records.
By contrast, the MDME system supports a variety of different table types that are specifically suited for the particular requirements of storing, organizing, structuring, classifying, managing, and publishing information in an MDME repository (including efficient support for category-specific attributes, which are inherently non-relational). For example, the MDME system includes qualified sub-tables that may store a set of lookup records with associated qualifiers, that is, “subfields” that apply not to the qualified table record by itself, but rather to each association of a qualified table record with a main table record. MDME, for example, supports multiple simultaneous qualified tables.
Thus, for at least the limitations described above, there is a need for a reusable custom-defined nestable data construct for use in database schema design. Ideally, such a construct would comprise a customizable compound user-define data type. A system embodying such a construct would allow a database designer to define a customizable compound record template that groups together and names a set of fields into a nestable, reusable object or type definition that describes or reflects the relation between the fields, without requiring the type to be instantiated into a table. The construct would not require the designer to redesign or rebuild existing tables when the compound data type is changed, but would instead propagate the change transparently. The construct would be nestable to increase the ability to decompose data relationships and reuse definitions. And finally, the construct would be usable for implementing qualified lookup tables in a nestable and reusable form.
For at least the limitations described above, a system and method for a reusable customdefined nestable compound data type for construction of database objects that would eliminate the need for qualified tables and support automated importation of data would be advantageous.