As set forth in commonly assigned U.S. Pat. Nos. 5,245,337, 5,293,164, and 5,592,667, a multi-dimensional approach has been developed for transforming an unstructured information system into a structured information system. This approach addresses the unique properties of multiple information source systems, including database systems, from an information point of view. In particular, this methodology attempts to unify the fields of information theory and database theory by combining encoding compression theory with database theory for general data manipulations into a general information manipulation theory with respect to a multi-dimensional information space.
Broadly, multiple information sources are described by different information variables, each corresponding to one information source or information stream. Information manipulations are primarily index manipulations, which are, in general, more efficient than the same number of non-index manipulations. The only non-index manipulations are carried out at leaf nodes, where unique data values are stored; non-index manipulations are therefore minimized. As a further aspect of this approach, a structured information system or database is built by taking into account information relations between different sets of data in the database. Relations between neighboring nodes are easily analyzed and presented on-line because they are built into the structure. Relations between nodes that are not neighbors are not explicitly built into the existing structure. On-line analysis of these relations requires efficient information manipulations in main memory.
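By way of illustration only, the separation between unique data values held at a leaf node and the index values used everywhere else might be sketched as follows. This is a minimal sketch under stated assumptions, not the structure disclosed in the referenced patents: the names `LeafNode`, `add`, and `encode`, and the choice of sequential integers as tokens, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LeafNode:
    """Hypothetical leaf node for one information variable (an assumption)."""
    values: list = field(default_factory=list)   # token -> unique data value
    lookup: dict = field(default_factory=dict)   # unique data value -> token
    counts: list = field(default_factory=list)   # token -> number of occurrences
    tokens: list = field(default_factory=list)   # one token per input record

    def add(self, value):
        """Record one occurrence of `value` and return its integer token."""
        token = self.lookup.get(value)
        if token is None:
            # First occurrence: the only place a data value is stored.
            token = len(self.values)
            self.lookup[value] = token
            self.values.append(value)
            self.counts.append(0)
        self.tokens.append(token)
        self.counts[token] += 1
        return token


def encode(stream):
    """Encode one information stream into a LeafNode (hypothetical helper)."""
    leaf = LeafNode()
    for value in stream:
        leaf.add(value)
    return leaf


if __name__ == "__main__":
    leaf = encode(["red", "blue", "red"])
    print(leaf.values, leaf.counts, leaf.tokens)
```

In this sketch, each data value is touched only when it is first encoded. Any later manipulation can operate on the integer `tokens` and `counts` alone.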
However, only a limited amount of statistical information about such relations is explicitly built into the structured database information system. For example, information about double patterns, each made up of two single-pattern leaf nodes, is easily shown for two neighboring leaf nodes that share a common double-pattern parent node in an existing tree structure. Analyzing pattern statistics between any two leaf nodes that do not have a common double-pattern parent node in the existing tree structure could be difficult because the statistics are not explicitly stored at any node in the existing structure. Such analysis would be greatly simplified if one could build, in main memory, a double-pattern node that has exactly the same properties as if it had been built from the raw data. To build this node efficiently, no data values should be involved; that is, the node should be built using only manipulations of the memory tokens.
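By way of illustration only, building a double-pattern node from tokens alone might be sketched as follows. The sketch assumes that each leaf node exposes one integer token per record, in record order; the function name `build_double_pattern`, the `DoublePatternNode` type, and its fields are hypothetical and are not taken from the referenced patents.

```python
from typing import NamedTuple


class DoublePatternNode(NamedTuple):
    """Hypothetical in-memory double-pattern node (an assumption)."""
    pairs: list    # double-pattern token -> (token_a, token_b)
    counts: list   # double-pattern token -> number of occurrences
    tokens: list   # one double-pattern token per record


def build_double_pattern(tokens_a, tokens_b):
    """Combine two leaf token streams without reading any data values."""
    if len(tokens_a) != len(tokens_b):
        raise ValueError("token streams must cover the same records")
    lookup = {}                      # (token_a, token_b) -> double-pattern token
    pairs, counts, tokens = [], [], []
    for key in zip(tokens_a, tokens_b):
        token = lookup.get(key)
        if token is None:
            # First occurrence of this pair of single-pattern tokens.
            token = len(pairs)
            lookup[key] = token
            pairs.append(key)
            counts.append(0)
        tokens.append(token)
        counts[token] += 1
    return DoublePatternNode(pairs, counts, tokens)


if __name__ == "__main__":
    node = build_double_pattern([0, 1, 0, 0], [0, 0, 1, 0])
    print(node.pairs, node.counts, node.tokens)
```

Because a leaf node assigns one token to each unique data value, counting pairs of tokens yields the same pattern statistics as counting pairs of raw values, which is the property sought above.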