1. Field of the Invention
This invention relates in general to a computer-implemented relational database management system, and more particularly, to the efficient processing of parent-child relationships in a relational database management system.
2. Description of Related Art
Data in a Relational Database Management System (RDBMS) is organized into one or more tables comprising rows and columns of data, wherein the rows are known as tuples and the columns are known as attributes. A database will typically have many tables, and each table will typically have multiple tuples and multiple attributes. Users formulate relational operations on the tables, rows, and attributes, either interactively, in batch files, or embedded in host languages, such as C and COBOL.
It is common, in Internet “text mining” and “business intelligence” applications, for example, for tuples or attributes to be related to one another by one or more concepts or groups. For example, an Internet search engine works efficiently when a large collection of documents (e.g., web pages) is described and queried by a set of descriptive “concepts” (stemmed keywords after filtering stopwords and/or “characteristics” obtained by various means of parsing document content). Concept relationships are a natural way to provide meta-data. In particular, concept relationships often comprise parent-child relationships.
The Internet search engine may use tuples to represent documents, and the attributes of those tuples may comprise pointers that define one or more concepts or groups among the documents. It may be necessary in such an application to access the tuples and then ascertain whether a tuple is a descendant of another tuple in a particular group. One example is the ranking of documents or pages in an Internet “text mining” application by updating aggregate counts of various “concepts” derived from the text of a search query.
In such applications, the computation of aggregate functions for each individual concept or group is a basic, often repeated, operation. Often, an aggregate grouping function is used to rank one collection versus another. For example, an Internet search engine may use an aggregate grouping function to rank and/or organize documents or pages found by the Internet search engine.
A. Klug, Access Path in the “Abe” Statistical Query Facility, Proc. ACM SIGMOD, 1982, pp. 161-172, which is incorporated by reference herein, teaches that special treatment should be given to aggregation of groups. Efficiency for the aggregate grouping operator is addressed in D. J. Haderle and E. J. Lynch, Evaluation of Column Function on Grouped Data During Data Ordering, IBM Technical Disclosure Bulletin, Mar. 10, 1990, pp. 385-386, which is incorporated by reference herein.
There remains, however, a need in the art for new techniques for representing such relationships and for processing queries relating to these relationships. The present invention provides these needed techniques.
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, apparatus, article of manufacture, and data structure for efficiently identifying parent-child relationships. The parent-child relationships are encoded into a matrix, wherein a particular member is represented by a particular row and a particular column of the matrix. A value at an intersection of a specific one of the rows and a specific one of the columns indicates whether a parent-child relationship exists between the member represented by the row and the member represented by the column. Thereafter, matrix operations may be applied to the matrix.
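By way of illustration only, the matrix encoding summarized above might be sketched as follows. The sketch assumes that the row represents the parent and the column represents the child, and that a value of 1 marks an existing relationship; the summary does not fix either convention. The function names, the member names, and the use of nested Python lists are hypothetical and are not taken from the specification.

```python
def build_relationship_matrix(members, parent_child_pairs):
    """Encode parent-child pairs into a square 0/1 matrix.

    Assumption: matrix[r][c] == 1 means the member of row r is a
    parent of the member of column c.
    """
    # Each member is assigned one index, used for both its row and its column.
    index = {member: i for i, member in enumerate(members)}
    size = len(members)
    matrix = [[0] * size for _ in range(size)]
    for parent, child in parent_child_pairs:
        matrix[index[parent]][index[child]] = 1
    return index, matrix


def is_parent_of(index, matrix, parent, child):
    """Look up the intersection of the parent's row and the child's column."""
    return matrix[index[parent]][index[child]] == 1


# Hypothetical concept hierarchy: "science" has two child concepts.
members = ["science", "physics", "biology"]
pairs = [("science", "physics"), ("science", "biology")]
index, matrix = build_relationship_matrix(members, pairs)
```

With such an encoding, testing whether a relationship exists between two members is a single lookup at the intersection of a row and a column, for example `is_parent_of(index, matrix, "science", "physics")`.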
An object of the invention is to provide an improved system for encoding parent-child relationships. Another object of the present invention is to encode such relationships in a manner that allows various operations to be performed on the encoding. Yet another object of the present invention is to identify multiple levels of ancestors/descendants of a member based on the encoding.
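As a hedged illustration of how multiple levels of ancestors/descendants could be derived from such an encoding, the sketch below uses repeated Boolean matrix multiplication. This is one plausible matrix operation chosen for the example, not necessarily the operation used by the invention. It assumes a square 0/1 matrix in which entry [r][c] is 1 when the member of row r is a parent of the member of column c; the function names are hypothetical.

```python
def boolean_matrix_product(a, b):
    """Boolean product of two square 0/1 matrices of the same size."""
    size = len(a)
    return [
        [int(any(a[i][k] and b[k][j] for k in range(size))) for j in range(size)]
        for i in range(size)
    ]


def relationships_at_level(matrix, level):
    """Return a matrix whose entry [r][c] is 1 when the member of column c
    is reachable from the member of row r in exactly `level` parent-child
    steps (level 1 = children, level 2 = grandchildren, and so on)."""
    result = matrix
    for _ in range(level - 1):
        result = boolean_matrix_product(result, matrix)
    return result


# Hypothetical hierarchy: member 0 is a parent of members 1 and 3,
# and member 1 is a parent of member 2.
parent_child = [
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
grandchildren = relationships_at_level(parent_child, 2)
```

In this example the only non-zero entry of `grandchildren` is at row 0, column 2, reflecting that member 2 is two levels below member 0. Reading a column instead of a row gives the ancestors of a member at the same level.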