The invention relates to searching and navigating databases and other information sources and, more particularly, to a system and method using category intersection pre-computation to facilitate search and navigation.
An ever-increasing amount of information is becoming available electronically, particularly through wide-area networks such as the Internet. The Internet, with its various document collections as found in USENET, the World Wide Web, and various FTP and similar sites, is perhaps the largest collection of full-text information available. Already, tens of millions of documents are available in various document databases on the Internet. Performing rapid searches for information on the Internet already requires expensive, high-performance computers with vast quantities of RAM and fast disk drives. Even worse, the Internet is rapidly growing. Some estimates claim that the amount of information available on the Internet doubles every four months. Effective computer performance doubles only every 18 to 24 months, and the cost per megabyte of storage improves even more slowly.
Based on these estimates, it's no wonder that online searching of large databases via the Internet can be costly and time consuming. Indeed, Internet users sometimes have to wait several minutes for their searches to complete, thus consuming large amounts of costly connect time. In addition, users often need to repeatedly narrow, expand, or refocus their searches, which can result in unnecessary or redundant searches through a database.
Various factors can influence the results provided by database search engines. Some of these factors include the size of the database searched, the frequency of updates of the database, search capability and design, and speed. For example, many conventional search engines use databases that organize information into broad subject category hierarchies, which makes it difficult for users to quickly narrow, expand, or refocus their search across category hierarchies. In particular, conventional search engines typically do not allow users to refocus their search from one category hierarchy to another without losing previous search and navigation results. Rather, these search engines often force users to restart the search and navigation process at the top level of the new category hierarchy to be searched, thereby losing any previous search results. Thus, these conventional systems and methods can add considerable delay to the search process.
Accordingly, there is a need for a system and method for quickly searching databases and other information sources. Such a system and method should allow users to search and navigate across category hierarchies without losing results obtained from previous searches.
The present invention is directed to a computer-implemented search and navigation system and method using category intersection pre-computation. Generally, intersection pre-computation is the pre-determination, prior to query processing, of a large number of intersections or combinations of different terms and categories, and the documents that are relevant to such intersections. These intersections (hereinafter also referred to as "report keys") are generated for each document in a database having a plurality of documents. The report keys contain information that allows a user to navigate between category hierarchies while maintaining previous search results.
More particularly, each document in the database is scanned for a plurality of index terms. The index terms are combined with predefined top level category descriptors to form report keys. Each report key further includes a pointer to the memory address of a bit-map corresponding to the lowest subcategory descriptor in a category hierarchy. The report keys generated from the documents are combined into an intersection list. The intersection list is sorted according to a predetermined sort criterion. A count of the number of identical report keys is determined from the sorted intersection list and used to update those report keys using, for example, negative hexadecimal numbers. Redundant report keys are deleted from the intersection list to produce a smaller intersection list. The smaller intersection list is re-sorted to arrange the report keys according to a predetermined order based on the updated count in each report key.
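The build steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function name `build_intersection_list`, the dictionary layout of each document, and the tuple layout of a report key are all hypothetical. The count is stored as a negated integer so that an ascending sort lists the most frequent keys first; this is one plausible reading of the "negative hexadecimal numbers" mentioned above, since the text does not spell out the encoding. The bit-map pointer carried by each report key is omitted for brevity.

```python
from collections import Counter


def build_intersection_list(documents):
    """Build a sorted, de-duplicated list of report keys.

    Each report key is a tuple (term, top_category, neg_count, subcategory).
    The layout is an assumption made for this sketch.
    """
    # One raw key per (index term, category path) pair found in each document.
    raw_keys = []
    for doc in documents:
        for term in doc["terms"]:
            for top_category, subcategory in doc["categories"]:
                raw_keys.append((term, top_category, subcategory))

    # Sort so identical keys are adjacent, then count each distinct key.
    raw_keys.sort()
    counts = Counter(raw_keys)

    # Keep one key per distinct combination, carrying the negated count.
    report_keys = [
        (term, top, -count, sub) for (term, top, sub), count in counts.items()
    ]

    # Re-sort: within a term and top level category, highest count first.
    report_keys.sort()
    return report_keys
```

Placing the negated count ahead of the subcategory in the tuple means a plain ascending sort groups keys by term and top level category and orders the subcategories within each group by descending document count.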
In one embodiment of the present invention, the user selects a target company and a top level category to define the scope of the search. In response to the user's selections, a pre-computed intersection list is traversed to identify all report keys falling within the defined scope of the search. The identified report keys are formatted and displayed to the user. Preferably, the display includes one or more subcategory descriptors, and a count of the number of documents that fall within each subcategory. The document counts or "hits" enable the user to determine which subcategories will provide the most fruitful search.
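The traversal described above might look like the following Python sketch. It assumes, hypothetically, that the pre-computed intersection list is a sorted list of tuples of the form (term, top_category, neg_count, subcategory), where neg_count is the negated document count; the function name `subcategory_hits` is likewise an illustrative choice. Because the list is sorted, a binary search can locate the first report key within the scope, and the scan stops as soon as a key falls outside it.

```python
from bisect import bisect_left


def subcategory_hits(intersection_list, company, top_category):
    """Return (subcategory, hit_count) pairs for one company and category.

    Assumes intersection_list is sorted and holds tuples of the form
    (term, top_category, neg_count, subcategory).
    """
    scope = (company, top_category)
    # A two-element tuple sorts just before every longer tuple sharing
    # that prefix, so this finds the first report key within the scope.
    start = bisect_left(intersection_list, scope)

    hits = []
    for term, top, neg_count, subcategory in intersection_list[start:]:
        if (term, top) != scope:
            break  # Sorted order: no later key can match the scope.
        hits.append((subcategory, -neg_count))
    return hits


def format_hits(hits):
    """Render one line per subcategory with its document count."""
    return "\n".join(f"{sub} ({count})" for sub, count in hits)
```

The counts come directly from the pre-computed report keys, so no document needs to be read to produce the display.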
The user selects one of the subcategories from the formatted display to further narrow the scope of the search. In response to the user's selection, the intersection list is used to determine the memory address of the bit-map linked to the selected subcategory. The bit-map is retrieved and logically ANDed with the term bit-maps corresponding to the target company and the top level category, respectively, to produce a first result bit-map. The first result bit-map is used to retrieve document information from the database.
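The narrowing step can be illustrated with a short Python sketch. It assumes, for illustration only, that each bit-map is held as a Python integer in which bit i is set when document i matches; the helper names `bitmap_from_ids`, `ids_from_bitmap`, and `narrow` are hypothetical, and looking up a bit-map by memory address is replaced here by passing the bit-maps directly.

```python
def bitmap_from_ids(doc_ids):
    """Build a bit-map with bit i set for every document id i."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits


def ids_from_bitmap(bits):
    """List the document ids whose bits are set, in ascending order."""
    ids = []
    doc_id = 0
    while bits:
        if bits & 1:
            ids.append(doc_id)
        bits >>= 1
        doc_id += 1
    return ids


def narrow(subcategory_bitmap, company_bitmap, category_bitmap):
    """AND the three bit-maps to produce the first result bit-map."""
    return subcategory_bitmap & company_bitmap & category_bitmap


company = bitmap_from_ids([0, 1, 2, 5])    # documents mentioning the company
category = bitmap_from_ids([1, 2, 3, 5])   # documents in the top level category
subcategory = bitmap_from_ids([2, 5, 7])   # documents in the chosen subcategory

first_result = narrow(subcategory, company, category)
print(ids_from_bitmap(first_result))  # prints [2, 5]
```

Only documents whose bit is set in all three bit-maps survive the AND, so the result identifies exactly the documents to retrieve from the database.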
Alternatively, the user can refocus the search by selecting a different top level category by clicking on a tool bar presented to the user as part of the formatted display. In response to the user's selection, the intersection list is again traversed and the report keys falling within the defined scope of the target company and the new top level category are identified. The bit-maps linked to these report keys are each logically ANDed with the first result bit-map to produce a second result bit-map. The second result bit-map is used to retrieve document information from the database.
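The refocusing step might be sketched in Python as follows. Bit-maps are assumed to be Python integers with bit i set when document i matches, and the function name `refocus` and the dictionary of subcategory bit-maps are hypothetical. The text above does not state how the individual AND results are combined into a single second result bit-map; taking their logical OR is an assumption made for this sketch.

```python
def refocus(first_result, new_category_bitmaps):
    """AND each report key's bit-map with the first result bit-map.

    new_category_bitmaps maps a subcategory name to the bit-map linked
    to its report key under the newly selected top level category.
    Returns the per-subcategory results and their combined bit-map.
    """
    per_subcategory = {
        name: bitmap & first_result
        for name, bitmap in new_category_bitmaps.items()
    }

    # Assumption: the second result bit-map is the OR of the per-key results.
    second_result = 0
    for bitmap in per_subcategory.values():
        second_result |= bitmap
    return per_subcategory, second_result


first_result = 0b100100  # documents 2 and 5 from the earlier search
new_bitmaps = {
    "earnings": 0b000110,  # documents 1 and 2
    "filings": 0b101000,   # documents 3 and 5
    "patents": 0b000001,   # document 0
}
per_subcategory, second_result = refocus(first_result, new_bitmaps)
```

Because every per-key result is ANDed with the first result bit-map, the refocused search can only return documents that already satisfied the earlier search, which is how previous results are preserved across category hierarchies.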
The present invention provides an advantage over conventional systems by using a pre-computed intersection list. The intersection list enables users to combine category searches with text searches during runtime. Further, the intersection list enables users to easily access related information between category hierarchies without adding considerable delay to the search by performing redundant searches via top level categories.