1. Field of the Invention
This invention relates in general to computer-implemented database content management systems, and in particular to a method and system for highly efficient processing, storing, searching and matching of complex nested objects, for their quick and easy retrieval.
2. Description of Related Art
Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database management system (DBMS) which uses relational techniques for storing and retrieving data. RDBMS software using a Structured Query Language (SQL) interface is well known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO).
A typical database management system includes both database files and index files. The database files store data in the rows and columns of tables stored on data pages. In such a table, the rows may correspond to individual records while the columns of the table represent attributes of the records. For example, in a customer information table of a database management system, each row might represent a different customer while each column represents different attributes of the customers, such as the name of each customer, the amount owed by each customer and the cash receipts received from each customer.
Instead of providing for direct sorting and searching of the records in the tables, the database management system relies on the index files, which contain pointers to, or other information about, the location of the records in the tables stored in the database files. The index file can be searched and sorted (scanned) much more rapidly than can the database files. An index file is scanned through transactions in which criteria are stipulated for selecting records from a table. These criteria include keys, which are the attributes by which the database finds the desired record or records using the index. The actions of a transaction that cause changes to recoverable data objects are recorded in a log. In database management systems all data are stored in tables on a set of data pages that are separate from the index file. A table can have one or more indexes defined on it, each of which is an ordering of keys of the rows of the table and is used to access certain rows when the keys are known.
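The table and index arrangement described above can be illustrated with a minimal sketch. It is written in Python purely for illustration; the customer rows, the column names and the functions `build_index` and `lookup` are hypothetical and do not correspond to any particular database product. The index is modeled as an ordering of keys paired with row positions, so that a key can be located by binary search without scanning the table itself.

```python
from bisect import bisect_left

# Hypothetical customer table: each row is a record, each column an attribute.
rows = [
    {"name": "Dana", "amount_owed": 120.0, "cash_receipts": 30.0},
    {"name": "Avery", "amount_owed": 0.0, "cash_receipts": 75.5},
    {"name": "Morgan", "amount_owed": 42.5, "cash_receipts": 10.0},
]

def build_index(table, key):
    """Return a sorted list of (key value, row position) pairs."""
    return sorted((row[key], pos) for pos, row in enumerate(table))

def lookup(index, table, value):
    """Binary-search the index, then fetch the matching rows by position."""
    # (value, -1) sorts before every real entry for this key value,
    # so bisect_left lands on the first matching entry, if any.
    i = bisect_left(index, (value, -1))
    matches = []
    while i < len(index) and index[i][0] == value:
        matches.append(table[index[i][1]])
        i += 1
    return matches

# One table may carry several indexes, one per key of interest.
name_index = build_index(rows, "name")
morgan_rows = lookup(name_index, rows, "Morgan")
```

The table keeps its rows in arrival order, while the index holds the ordering of keys, which mirrors the separation of data pages and index file described above.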
Large database archives, such as the ones used in audio and video libraries of media and other communications industries and educational institutions, depend on content management systems and their media indexing applications to create accurate indexes in order to locate and manage the archived content. Content management systems typically manage information contained in a series of complex objects, where each object may be composed of references to data elements in various database tables. Thus, data objects are aggregations of database information and, since many data objects are complex nested objects, proper indexing is critical for efficient search and management of these objects in large archives or content collections. A complex object is composed of a series of nested complex objects. When a request for such an object is received, searching through database tables or traversing data structures, such as object trees, may take many iterations. A presently available conventional method sequentially scans a list of objects and performs a sequential comparison on each element of the complex object, in a pool containing a set of such complex objects, in search of a match. This sequential comparison is performed on complex nested objects level by level, i.e., subset by subset, attribute by attribute, element by element, and for each of these nesting level elements a comparison is performed with a corresponding element of the target object. Retrieving the information associated with a complex object may require numerous database accesses to populate all fields of all elements of the nested objects within this complex object. Not only is it time consuming to retrieve the data element information from a database, but comparing the retrieved object information with the desired object information can also be a very lengthy iterative process.
Because string comparisons are resource intensive, sequential searches are highly inefficient when these elements are strings, especially when searching through a large list of data and repeating comparisons on each level. Therefore, this method of finding a complex nested object in a large pool can be prohibitively time consuming.
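The conventional sequential method described above can be sketched as follows. This is an illustrative Python sketch only: the representation of a complex nested object as nested dictionaries and lists with string elements, and the names `deep_equal`, `sequential_find`, `pool` and `target`, are assumptions made for the example rather than details of any existing system.

```python
def deep_equal(a, b):
    """Compare two nested objects level by level, element by element."""
    if isinstance(a, dict) and isinstance(b, dict):
        if a.keys() != b.keys():
            return False
        return all(deep_equal(a[k], b[k]) for k in a)
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        if len(a) != len(b):
            return False
        return all(deep_equal(x, y) for x, y in zip(a, b))
    # Leaf level: for string elements this is a string comparison.
    return a == b

def sequential_find(pool, target):
    """Scan the pool in order; return the position of the first full match, or -1."""
    for pos, candidate in enumerate(pool):
        if deep_equal(candidate, target):
            return pos
    return -1

# Hypothetical pool of complex nested objects (sets of subsets of strings).
pool = [
    {"title": "clip-a", "tracks": [["audio", "en"], ["video", "hd"]]},
    {"title": "clip-b", "tracks": [["audio", "fr"], ["video", "hd"]]},
    {"title": "clip-b", "tracks": [["audio", "fr"], ["video", "sd"]]},
]
target = {"title": "clip-b", "tracks": [["audio", "fr"], ["video", "sd"]]}
position = sequential_find(pool, target)
```

In the worst case every candidate in the pool is descended into and compared element by element, so the cost grows with both the size of the pool and the depth and breadth of each object.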
Other conventional methods of searching for an object include the use of hashing methods whereby an object, such as a rule, is hashed to an integer value which determines its identification and, sometimes, its uniqueness. Next, a search routine looks for the object in a database table or a classification tree at a position determined by an index based on the integer hash value. For complex objects, such as sets, which may have many subsets and elements in each subset, the search process takes quite a long time because the integer hash value may not be unique, and thus several objects with the same index may have to be searched sequentially, element by element, for the one that is a total match, having all the same elements.
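The limitation of integer hashing described above can be sketched as follows. This Python sketch is illustrative only: the use of a CRC-32 checksum over a JSON serialization as the integer hash, the deliberately small bucket count, and the names `int_hash`, `build_table`, `find` and `rules` are all assumptions chosen for the example, not the hashing scheme of any particular conventional system.

```python
import json
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # deliberately small so that several objects share an index

def int_hash(obj):
    """Hash a nested object to an integer index (assumed scheme: CRC-32 of JSON text)."""
    text = json.dumps(obj, sort_keys=True)
    return zlib.crc32(text.encode("utf-8")) % NUM_BUCKETS

def build_table(pool):
    """Place each object at the position given by its integer hash value."""
    table = defaultdict(list)
    for obj in pool:
        table[int_hash(obj)].append(obj)
    return table

def find(table, target):
    """Return (match or None, number of full object comparisons performed)."""
    comparisons = 0
    for candidate in table.get(int_hash(target), []):
        comparisons += 1
        # Python's == on nested dicts and lists compares element by element.
        if candidate == target:
            return candidate, comparisons
    return None, comparisons

# Hypothetical pool of ten rule objects with nested subsets.
rules = [{"rule": f"r{i}", "parts": [["a", str(i)], ["b"]]} for i in range(10)]
table = build_table(rules)
largest_bucket = max(len(bucket) for bucket in table.values())
```

With ten objects and four possible index values, at least one index necessarily holds three or more objects, and each of those may need a full element-by-element comparison before the total match is found.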
Therefore, there is a need for a simple, optimized and generic method and system which can improve the manner of processing, storing, searching and matching of complex nested objects, for their quick and easy retrieval. The method should be capable of operating on any type of complex nested objects, with any level of nesting, and performing with the minimum number of string operations, thus minimizing utilization of system resources.