The present invention relates to the field of databases and data warehousing, and in particular to a system for searching and retrieving information from catalogs.
The widespread advent of the Internet is well known, with individuals and businesses using the Internet as a new medium of exchange. Increasingly, electronic commerce, or e-commerce, is becoming widely accepted. In e-commerce applications, sellers often put descriptions of their products on a web site. Further, to better publish a product line, a seller typically arranges products into a "catalog", intending to help buyers find the products that they want. Examples of commercial products for building catalog applications are Intershop, iCat and Open Market.
Many buyers are not, however, well versed in how sellers categorize their products (e.g., the particular name associated with a category of goods, etc.). Therefore, contrary to the intended purpose of a catalog, a catalog regularly creates confusion and is difficult for ordinary buyers to use. This can result in frustration on the buyer's part and loss of revenue on the seller's part.
The difficulty arises in existing methods due to the B-tree like indexing schemes used in many ordinary databases. The B-tree indexing scheme is used to save space and time in searching and navigating a hierarchical data structure. This can be visualized by considering FIGS. 1A and 1B. FIG. 1A is a block diagram illustrating a table 100 of a relational database. The table 100 has six columns labeled Key, Country, State, City, Name and Income. Each of the keys S45 to S55 is associated with a different person and that person's data. In addition to the person's name, their income and residence (country, state, and city) are given in fields in each row. Thus, the first row having the Key S45 is for "Wu" and has entries of US, CA (California) and LA (Los Angeles) for the country, state and city fields, respectively. The last column "Income" has an entry of 1200. The fields in the three columns Country, State and City are highlighted with thicker borders to point out that the values of the fields are repetitive in terms of information content. For example, the entries "US" and "Canada" appear several times under the Country column.
FIG. 1B is a B-tree representation 150 of the entries in the columns Country, State and City and the corresponding Keys, which seeks to reduce the redundancy of the table 100. The tree 150 contains nine nodes numbered 1 to 9. The first node is for the world. The second and third nodes depending from the first node are US and CANADA (CA). This results in a significant compaction of the data contained in the Country column of table 100. Corresponding to these two nodes are Keys S54 and S52, respectively, since the State and City fields in table 100 are "Nil" for these two keys. The fourth node CA (California) depends from the second node for the US. Likewise, the fifth node BC (British Columbia) depends from the third node for Canada. The fourth and fifth nodes have corresponding Keys S46 and S51, respectively, since the corresponding City field entries are "Nil" for these keys. The sixth and seventh nodes for LA and SF (San Francisco) depend from the fourth node for CA. The sixth node has corresponding Keys S45, S49, and S53. The seventh node has Key S50. Finally, the eighth and ninth nodes VC (Vancouver) and IV depend from the fifth node BC. The eighth node has Keys S47 and S55, and the ninth node has Key S48.
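By way of illustration only, the grouping of Keys under the nodes of FIG. 1B can be sketched in Python as follows. The rows reproduce only the Key, Country, State and City values stated above; the Name and Income columns are omitted, and the names ROWS and group_keys_by_path are hypothetical and not part of the described system.

```python
from collections import defaultdict

# (key, country, state, city) as described for FIGS. 1A and 1B.
# None stands for a "Nil" field.
ROWS = [
    ("S45", "US", "CA", "LA"),
    ("S46", "US", "CA", None),
    ("S47", "Canada", "BC", "VC"),
    ("S48", "Canada", "BC", "IV"),
    ("S49", "US", "CA", "LA"),
    ("S50", "US", "CA", "SF"),
    ("S51", "Canada", "BC", None),
    ("S52", "Canada", None, None),
    ("S53", "US", "CA", "LA"),
    ("S54", "US", None, None),
    ("S55", "Canada", "BC", "VC"),
]


def group_keys_by_path(rows):
    """Map each tree path (country, state, city) to the keys stored at that node."""
    groups = defaultdict(list)
    for key, *location in rows:
        # Drop "Nil" fields so that a row with no city attaches to its state node.
        path = tuple(part for part in location if part is not None)
        groups[path].append(key)
    return dict(groups)


groups = group_keys_by_path(ROWS)
print(groups[("US", "CA", "LA")])
```

Each repeated Country, State and City value appears once as a path component, which is the compaction that the tree representation is intended to provide.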
Again, the tree representation 150 of the indices is more compact for storage than the database structure shown in the first four columns of table 100. However, conventional methods are disadvantageous in that they do not specify how indices and keys can be compacted for faster access of keys. Consequently, Keys such as S54 and S52 corresponding to nodes 2 and 3 of FIG. 1B are not stored contiguously; the same applies to other subgroups of Keys, such as the group consisting of S45, S49 and S53 in relation to Key S50. For example, if Keys S45, S49 and S53 are stored on a disk storage medium at a position that is not contiguous with that of Key S50, a read latency occurs after Key S53 is read and before Key S50 can be obtained. This can result in significant degradation of the performance of accessing Keys in the database or catalog. Thus, due to the randomness in the key/index organization, the speed of retrieval processing is substantially degraded, because both (1) each successive traversal along the tree nodes of FIG. 1B and (2) each retrieval of a group of keys incur latencies for setting up a disk scan.
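The effect of non-contiguous storage can be illustrated with a small Python sketch that counts how many separate contiguous runs must be read to fetch a group of Keys. It assumes, purely for illustration, that Keys S45 to S55 occupy consecutive storage positions 0 to 10 in Key order; the actual placement on disk is not specified here, and count_runs is a hypothetical helper.

```python
def count_runs(positions):
    """Count the contiguous runs of storage positions that a read must cover."""
    runs = 0
    previous = None
    for position in sorted(positions):
        # A new run starts whenever the position does not follow the previous one.
        if previous is None or position != previous + 1:
            runs += 1
        previous = position
    return runs


# Assumed layout: Key S45 at position 0, S46 at position 1, ..., S55 at position 10.
position_of = {f"S{number}": number - 45 for number in range(45, 56)}

# Keys of the sibling nodes LA (S45, S49, S53) and SF (S50).
group = ["S45", "S49", "S53", "S50"]

scattered = count_runs(position_of[key] for key in group)
contiguous = count_runs(range(len(group)))  # same keys stored side by side
print(scattered, contiguous)
```

Under this assumed layout the group is spread over three runs, each of which may incur a separate scan set-up, whereas a contiguous layout needs only one.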
Therefore, a need clearly exists for an improved system of organizing keys and indices to facilitate better retrieval of information from a catalog, especially a system that can use rough and ambiguous user input.
In accordance with a first aspect of the invention, there is disclosed a method of organizing indices and keys of a tree-like data structure for electronic catalog searching and retrieval. The method includes the steps of: storing indices according to categories and subcategories in an array of indices, wherein indices of a category or subcategory are stored contiguously in the array, each index having one or more means for linking the index with a subordinate intermediate index or a leaf index, so as to record the interrelationship of the indices in the tree-like data structure; storing the keys according to the categories and subcategories in an array of keys, wherein keys of a given index are stored contiguously with keys of any indices at the same corresponding category or subcategory level and keys of any subordinate subcategories within a category or subcategory are stored hierarchically in the array of keys; and linking each index of the array of indices with one or more corresponding keys of the array of keys corresponding to the category or subcategory associated with the index.
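One possible reading of the first aspect is sketched below in Python, using the tree of FIG. 1B as data. The sketch assumes that "stored contiguously" means the child indices of a node occupy adjacent slots in the array of indices, and that "stored hierarchically" means the keys are laid out in depth-first order so that all keys under a category form one slice of the array of keys. The Index fields, build_arrays and keys_under are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Index:
    name: str
    first_child: int = -1   # link to the first subordinate index (-1 for a leaf)
    child_count: int = 0
    key_start: int = 0      # link into the array of keys
    own_keys: int = 0       # keys attached directly to this index
    subtree_keys: int = 0   # keys of this index and all subordinate indices


# (name, keys, children) for the tree of FIG. 1B.
TREE = ("World", [], [
    ("US", ["S54"], [
        ("CA", ["S46"], [("LA", ["S45", "S49", "S53"], []), ("SF", ["S50"], [])]),
    ]),
    ("Canada", ["S52"], [
        ("BC", ["S51"], [("VC", ["S47", "S55"], []), ("IV", ["S48"], [])]),
    ]),
])


def build_arrays(root):
    # Pass 1: breadth-first layout, so sibling indices are adjacent in the array.
    indices, nodes = [Index(root[0])], [root]
    slot = 0
    while slot < len(nodes):
        children = nodes[slot][2]
        if children:
            indices[slot].first_child = len(indices)
            indices[slot].child_count = len(children)
            for child in children:
                indices.append(Index(child[0]))
                nodes.append(child)
        slot += 1

    # Pass 2: depth-first layout of keys, so each subtree owns one contiguous slice.
    keys = []

    def place(i):
        index = indices[i]
        index.key_start = len(keys)
        keys.extend(nodes[i][1])
        index.own_keys = len(nodes[i][1])
        for child in range(index.first_child, index.first_child + index.child_count):
            place(child)
        index.subtree_keys = len(keys) - index.key_start

    place(0)
    return indices, keys


def keys_under(indices, keys, i):
    """Return every key in the category at index i with a single slice."""
    index = indices[i]
    return keys[index.key_start:index.key_start + index.subtree_keys]


indices, keys = build_arrays(TREE)
print(keys_under(indices, keys, 3))  # index 3 is the CA (California) node
```

With this layout, retrieving all keys of a category is one sequential read of the array of keys rather than a series of scattered reads.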
Preferably, the method includes the step of naming each subcategory according to one or more indices corresponding to the category or subcategory to provide a named path. The method may also include the step of searching a category or subcategory of keys using a named path.
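A search using a named path might be sketched as follows. The flat arrays below are written out by hand for the tree of FIG. 1B under the same assumptions as a depth-first key layout; the "/" separator, the tuple layout of INDICES and the function search are assumptions of this sketch rather than features stated in the text.

```python
# (name, first_child, child_count, key_start, subtree_key_count)
INDICES = [
    ("World", 1, 2, 0, 11),
    ("US", 3, 1, 0, 6),
    ("Canada", 4, 1, 6, 5),
    ("CA", 5, 2, 1, 5),
    ("BC", 7, 2, 7, 4),
    ("LA", -1, 0, 2, 3),
    ("SF", -1, 0, 5, 1),
    ("VC", -1, 0, 8, 2),
    ("IV", -1, 0, 10, 1),
]
KEYS = ["S54", "S46", "S45", "S49", "S53", "S50", "S52", "S51", "S47", "S55", "S48"]


def search(named_path, separator="/"):
    """Resolve a named path such as "US/CA/LA" to the keys of that category."""
    position = 0  # start at the root index
    for name in (part for part in named_path.split(separator) if part):
        _, first_child, child_count, _, _ = INDICES[position]
        # Sibling indices are adjacent, so this is a scan over one small range.
        for child in range(first_child, first_child + child_count):
            if INDICES[child][0] == name:
                position = child
                break
        else:
            raise KeyError(f"no category named {name!r} in path {named_path!r}")
    _, _, _, key_start, key_count = INDICES[position]
    return KEYS[key_start:key_start + key_count]


print(search("US/CA/LA"))
print(search("Canada/BC"))
```

Because each named path resolves to a single start position and count, the keys of a category or subcategory are returned as one slice.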
Preferably, each index of the array of indices is linked with one or more corresponding keys of the array of keys corresponding to a category or subcategory associated with the index.
Optionally, the method further includes the step of adding at least one alternative view path to the tree-like data structure, including at least one additional node. The method may further include the step of generating at least one equivalent named path for an existing named path of a category or subcategory. The method may also include the step of adding one or more additional indices to the array of indices corresponding to the alternative view path. Still further, the method may include the step of adding one or more additional keys to the array of indices corresponding to the alternative view path.
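The notion of an equivalent named path can be illustrated with a simplified Python sketch that maps an alternative path onto an existing one before the search is carried out. This is a deliberate simplification: it uses a lookup table instead of adding nodes, indices or keys, and the alternative names "America" and "California", like the functions add_equivalent_path and canonical_path, are hypothetical.

```python
def add_equivalent_path(aliases, alternative, existing, separator="/"):
    """Record that the alternative named path means the same as the existing one."""
    aliases[tuple(alternative.split(separator))] = tuple(existing.split(separator))


def canonical_path(aliases, named_path, separator="/"):
    """Rewrite the longest aliased prefix of named_path to its existing form."""
    parts = tuple(named_path.split(separator))
    for cut in range(len(parts), 0, -1):
        target = aliases.get(parts[:cut])
        if target is not None:
            return separator.join(target + parts[cut:])
    return named_path  # no alias applies; the path is already in its existing form


aliases = {}
add_equivalent_path(aliases, "America/California", "US/CA")
print(canonical_path(aliases, "America/California/LA"))
```

A buyer who does not know the seller's category names could then reach the same subcategory of keys through either path.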
In accordance with a second aspect of the invention, there is disclosed an apparatus for organizing indices and keys of a tree-like data structure for electronic catalog searching and retrieval. The apparatus includes: a device for storing indices according to categories and subcategories in an array of indices, wherein indices of a category or subcategory are stored contiguously in the array, each index having one or more means for linking the index with a subordinate intermediate index or a leaf index, so as to record the interrelationship of the indices in the tree-like data structure; a device for storing the keys according to the categories and subcategories in an array of keys, wherein keys of a given index are stored contiguously with keys of any indices at the same corresponding category or subcategory level and keys of any subordinate subcategories within a category or subcategory are stored hierarchically in the array of keys; and a device for linking each index of the array of indices with one or more corresponding keys of the array of keys corresponding to the category or subcategory associated with the index.
In accordance with a third aspect of the invention, there is disclosed a computer program product having a computer readable medium having a computer program recorded therein for organizing indices and keys of a tree-like data structure for electronic catalog searching and retrieval. The computer program product includes: a module for storing indices according to categories and subcategories in an array of indices, wherein indices of a category or subcategory are stored contiguously in the array, each index having one or more means for linking the index with a subordinate intermediate index or a leaf index, so as to record the interrelationship of the indices in the tree-like data structure; a module for storing the keys according to the categories and subcategories in an array of keys, wherein keys of a given index are stored contiguously with keys of any indices at the same corresponding category or subcategory level and keys of any subordinate subcategories within a category or subcategory are stored hierarchically in the array of keys; and a module for linking each index of the array of indices with one or more corresponding keys of the array of keys corresponding to the category or subcategory associated with the index.
In accordance with a fourth aspect of the invention, there is disclosed a method of organizing indices and keys of a tree-like data structure for electronic catalog searching and retrieval. The method includes the steps of: storing indices according to sibling categories and subcategories in an array of indices, wherein indices of a sibling category or subcategory are stored contiguously in the array, each index having one or more means for linking the index with a subordinate intermediate index or a leaf index, so as to record the interrelationship of the indices in the tree-like data structure; storing the keys according to the sibling categories and subcategories in an array of keys, wherein keys of a given index are stored contiguously with keys of any indices at the same corresponding sibling category or subcategory level; and linking each index of the array of indices with one or more corresponding keys of the array of keys corresponding to the sibling category or subcategory associated with the index.
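The sibling-based organization of the fourth aspect might be sketched as a breadth-first layout in which the keys of all children of one parent are stored side by side. This is one interpretation only; the tuple representation of the tree, the function sibling_layout and the use of a path tuple to identify each parent are assumptions of this sketch.

```python
from collections import deque

# (name, keys, children) for the tree of FIG. 1B.
TREE = ("World", [], [
    ("US", ["S54"], [
        ("CA", ["S46"], [("LA", ["S45", "S49", "S53"], []), ("SF", ["S50"], [])]),
    ]),
    ("Canada", ["S52"], [
        ("BC", ["S51"], [("VC", ["S47", "S55"], []), ("IV", ["S48"], [])]),
    ]),
])


def sibling_layout(root):
    """Lay out keys so that the keys of sibling indices are adjacent.

    Returns the array of keys and, for each parent path, the (start, count)
    of the slice holding the keys of that parent's children.
    """
    keys = list(root[1])  # the root's own keys, if any, come first
    sibling_slices = {}
    queue = deque([((), root)])
    while queue:
        parent_path, (name, _own_keys, children) = queue.popleft()
        path = parent_path + (name,)
        start = len(keys)
        for child in children:
            keys.extend(child[1])        # siblings' keys are appended back to back
            queue.append((path, child))
        sibling_slices[path] = (start, len(keys) - start)
    return keys, sibling_slices


keys, sibling_slices = sibling_layout(TREE)
start, count = sibling_slices[("World", "US", "CA")]
print(keys[start:start + count])  # keys of the siblings LA and SF
```

In this layout Keys S45, S49 and S53 are immediately followed by Key S50, so the sibling group can be read in a single pass.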
Preferably, the method further includes the step of naming each subcategory according to one or more indices corresponding to the sibling category or subcategory to provide a named path.
Preferably, the method further includes the step of searching a sibling category or subcategory of keys using a named path.
Preferably, each index of the array of indices is linked with one or more corresponding keys of the array of keys corresponding to a sibling category or subcategory associated with the index.
Preferably, the method further includes the step of adding at least one alternative view path to the tree-like data structure, including at least one additional node. Further, the method may include the step of generating at least one equivalent named path for an existing named path of a sibling category or subcategory. Still further, the method may include the step of adding one or more additional indices to the array of indices corresponding to the alternative view path. The method may also include the step of adding one or more additional keys to the array of indices corresponding to the alternative view path.
In accordance with a fifth aspect of the invention, there is disclosed an apparatus for organizing indices and keys of a tree-like data structure for electronic catalog searching and retrieval. The apparatus includes: a device for storing indices according to sibling categories and subcategories in an array of indices, wherein indices of a sibling category or subcategory are stored contiguously in the array, each index having one or more means for linking the index with a subordinate intermediate index or a leaf index, so as to record the interrelationship of the indices in the tree-like data structure; a device for storing the keys according to the sibling categories and subcategories in an array of keys, wherein keys of a given index are stored contiguously with keys of any indices at the same corresponding sibling category or subcategory level; and a device for linking each index of the array of indices with one or more corresponding keys of the array of keys corresponding to the sibling category or subcategory associated with the index.
In accordance with a sixth aspect of the invention, there is disclosed a computer program product having a computer readable medium having a computer program recorded therein for organizing indices and keys of a tree-like data structure for electronic catalog searching and retrieval. The computer program product includes: a module for storing indices according to sibling categories and subcategories in an array of indices, wherein indices of a sibling category or subcategory are stored contiguously in the array, each index having one or more means for linking the index with a subordinate intermediate index or a leaf index, so as to record the interrelationship of the indices in the tree-like data structure; a module for storing the keys according to the sibling categories and subcategories in an array of keys, wherein keys of a given index are stored contiguously with keys of any indices at the same corresponding sibling category or subcategory level; and a module for linking each index of the array of indices with one or more corresponding keys of the array of keys corresponding to the sibling category or subcategory associated with the index.