This invention relates to data processing systems, methods and computer program products, and more particularly to database systems, methods and computer program products.
Database systems, methods and computer program products are widely used for information management. More specifically, database systems, methods and computer program products may be used to reliably manage a large amount of data in a multi-user environment, so that many users can concurrently access the same data. Database systems, methods and computer program products generally include a database that actually stores the data, a database management system and one or more applications that interface with the database management system to provide, for example, user interfaces and other applications.
One widely available database system is the Oracle8i database system that is marketed by Oracle Corporation and that is described, for example, in publications entitled Oracle8i Concepts, Release 8.1.5, February 1999, Part No. A67781-01, 1999, and Oracle8i Administrator""s Guide, Release 8.1.5, February 1999, Part A67772-01, 1999. The disclosures of both of these publications are hereby incorporated herein by reference in their entirety. The design and operation of database systems, methods and computer program products are well known to those having skill in the art, and need not be described further herein.
At a high level, a relational database may be considered as consisting of a set of tables. Each table typically includes a set of columns (fields). Each table also typically includes a xe2x80x9cprimary key.xe2x80x9d A primary key is a column which uniquely identifies each record in a table or a set of its columns whose combined values uniquely identify each record in that table. To speed up query searching, conventionally an index is built on the primary key of each table. An index that is based on a primary key column (or columns) is often called a primary key index, or primary index for short.
A xe2x80x9csuper keyxe2x80x9d is conventionally obtained by combining a column or columns to a primary key. An index built on a super key is typically referred to as a super key index. Because a super key contains a column or proper subset of columns whose values can be used to uniquely identify each record, the values of the super key can also be used to uniquely identify each record.
If a table includes non-primary-key columns, i.e. a column or columns that by themselves collectively cannot uniquely identify records in that table, whose values are used to identify records in other tables by matching the primary key values in the other tables, these columns are typically referred to as a foreign key. Often indices are built on the foreign key columns of a table, and these indices are referred to as foreign key indices.
A table having a foreign key may be considered a xe2x80x9cchild tablexe2x80x9d of the table or tables to which the foreign key points. A table which is pointed to by a foreign key or keys from another table may be considered a xe2x80x9cparent tablexe2x80x9d of the other tables. The validity of a record in a child table with a foreign key or keys typically depends on the existence of records in all of its parent tables to whom foreign key values of a record point. The referential integrity constraints between parent and child tables are enforced by a database management system that typically attempts to enforce the constraint that the foreign key of each child record must point to a parent record.
In relational database design, real world entities are normally modeled by entity tables, i.e., each of these entity table models, or represents, a class of real world entities. Between two entity tables the relationship is often many-to-may, meaning that each record in each of the two entity tables can relate to more than one record in the other entity table. Typically, the maximum granularity of the number of such associations for each given record is not fixed in advance. There are many ways to represent a many-to-many relationship between two entity tables. One conventional way is to define another table, called a xe2x80x9crelationship table,xe2x80x9d whose primary key includes both the primary keys of the two entity tables it relates. One such example is illustrated in FIG. 1, where Table A and Table B are entity tables with primary keys a and b, respectively, and Table C is a relationship table. Table C has a primary key which includes the primary keys a and b of Tables A and B and has a primary index based on those primary keys. Also Table C has foreign keys a and b and, therefore, is considered a child table of both Table A and Table B such that foreign keys of each record in Table C point to a record in Table A and a record in Table B.
For very large database systems (VLDB), for the purpose of improving performance, table partitioning is often used to manage data with sizes in the range of, for example, multi-gigabytes to multi-terabytes. For example, in the service management systems deployed by service providers, very large quantities of quality of service (QOS) statistics may be continuously collected 24 hours a day, 7 days a week, 365 days a year. This very large quantity data may be stored in a VLDB, and managed by a database management system (DBMS). This data typically may be concentrated in very few data tables. To achieve higher performance for data manipulation and query operations, often these tables are partitioned into smaller sub-partitions.
Partitioning is described in the above-cited Oracle8i Concepts publication at Chapter 11, entitled Partitioned Tables and Indexes. Partitioning also is described in the above-cited Oracle8i Administrator""s Guide at Chapter 13, entitled Managing Partitioned Tables and Indexes. As described therein, partitioning can address the problem of supporting very large tables and indexes by allowing the tables to be decomposed into smaller and more manageable pieces called partitions. Partitioned tables or indexes can be divided into a number of pieces, called subpartitions, which have the same logical attributes. For example, all partitions (or subpartitions) in a table share the same column and constraint definitions, and all partitions (or subpartitions) in an index share the same index option. Each partition (or subpartition) is stored in a separate segment, and can have different physical attributes. Usually, a partition scheme requires the use of some partition criteria to determine which partition (or sub-partition) of the table a record belongs. The values of some columns are usually used as partition criteria, where such columns must be included in a super key index of the table. The design and implementation of database partitioning is well known to those having skill in the art and need not be described in further detail herein.
FIG. 2 is an example of a partitioning of the tables of FIG. 1. In order to partition Tables A, B and C, conventionally, another set of columns are added to the primary key and primary index of each of the tables to be used as partition criteria of a partition scheme. As is further seen in FIG. 2, these three sets of columns are illustrated as d, e and f. Following the conventional primary key and index design practice, the key and index are designed to match each other. Thus, the foreign key (a,f) of Table C should match the primary index (a,d) of Table A and the foreign key (a,f) should also match the primary index (b,e) of Table B. However, because of the value discrepancy between d in Table A and e in Table B, in general, the values of column f in Table C can only be used to as a foreign key for one or the other of entries in Table A or Table B. Thus, the column f can be used together with a as foreign key to match (a,d) in Table A or together with b as a foreign key to match (b,d) in Table B, but not both. In other words, in this case, Table C cannot be a child of both Table A and Table B at the same time. Accordingly, in order to maintain the parent-child constraints imposed by the DBMS, one of the foreign key dependency declarations must be dropped. However, for whichever foreign key is dropped, the referential constraint may not be enforced by DBMS and the potential of having phantom children records may exist. The consistency, and therefore the correctness of data, may be compromised.
Since the DBMS may not enforce one of the two referential integrity constraints, such constraints could be explicitly enforced. However, explicitly enforcing the referential integrity constraints may result in increased development cost, increased system complexity, and increased run-time CPU overhead.
Accordingly, improvements are needed in the conventional practice of designing database keys and indices to support partitions and referential integrity constraints for very large tables that may have many-to-many relationships between them.
Embodiments of the present invention provide database systems, methods and/or computer program products that include a database management system and a database having a plurality of partitioned tables. The plurality of partitioned tables include at least two entity tables and a relationship table where the relationship table has a child-to-parent relationship to each of the entity tables. A plurality of keys and indices are defined for the entity tables and the relationship table so that the database management system maintains referential integrity between the relationship table and the entity tables to thereby enforce the child-parent relationship.
In further embodiments of the present invention, the plurality of keys and indices include a primary key of the relationship table includes one or more columns of a corresponding entity table. The columns of the primary key, however, do not include columns which are used only for the purpose of partitioning the entity table. A super key includes the columns of the primary key of a corresponding entity table and one or more columns of the entity table which are used only for the purpose of partitioning the entity table. A super key index of the corresponding entity table is based on the super key.
In additional embodiments of the present invention, the primary key includes a primary key corresponding to the relationship table which includes the primary keys of the entity tables and one or more columns of the relationship table used for partitioning the relationship table. A primary key index of the relationship table is based on the primary key corresponding to the relationship table. Additionally, the entries in the entity tables may have a many-to-many relationship with each other which is defined by entries in the relationship table. Also, the relationship table may further include a plurality of foreign keys, each foreign key corresponding to a primary key of one of the entity tables.
In particular embodiments of the present invention, the entity tables comprises network QOS information.
In further embodiments of the present invention, methods, systems and computer program products are provided for maintaining referential integrity between partitioned tables of a relational database. Primary keys of at least two entity tables of the relational database are defined so as to only include columns of the entity tables other than columns used only for the purpose of partitioning the respective entity tables of the at least two entity tables. Super keys of the entity tables of the relational database are also defined so as to include the respective primary keys and at least one column used only for the purpose of partitioning the respective entity tables. Super key indices are defined for the entity tables based on their respective super keys. A primary key of a relationship table associated with the entity tables is defined based on the combined primary keys of the entity tables and at least one column of the relationship table used for the purpose of partitioning the relationship table. A primary index of the relationship table is defined based on the primary key of the relationship table.
In further embodiments of the present invention, each of the entity tables is partitioned based on its column (or columns) used for the purpose of partitioning that is (or are) include in its (or their) super key (or keys) and super key index (or indices). Additionally, the relationship table may be partitioned based on its column (or columns) used for the purpose of partitioning that is (or are) included in its primary key and primary key index. Furthermore, entries in the entity tables may have a many-to-many relationship defined by entries in the relationship table. Foreign keys of the relationship table are defined to correspond to the primary keys of the entity tables related by the relationship table. The values of the foreign keys thus defined reference to the primary keys in their corresponding entity tables. Referential integrity between the entity tables and relationship table then is enforced by the foreign keys declared in the relationship table where the foreign keys are based on the primary keys declared in the entity tables.
Systems, methods and/or computer program products according to embodiments of the present invention may execute in a mainframe environment, and/or in a client/server environment and/or in distributed database environment. Finally, systems, methods and/or computer program products according to embodiments of the invention a may be used with any other database system that provides for partitioning of tables in a database, including Oracle databases as described above, Sybase, marketed by Sybase, Inc.; Informix, marketed by Informix Software, Inc.; Ingres marketed by Computer Associates International, Inc. and DB2, marketed by IBM Corporation. Improved performance, manageability, availability, configurability, flexibility, scaleability and/or maintainability thereby may be provided.