The present invention is a computerized system and method for searching through and retrieving information from an informational resource; more particularly, the present invention is an information management, retrieval and display system for searching through an informational resource and for displaying the results of the search in a collapsible/expandable format based upon user-selected display criteria or a user-selected display hierarchy.
An inherent drawback of many conventional search engines or search tools, such as Infoseek™, AltaVista™ and Hotbot™, is that the results of the search are typically organized according to the number of hits that the search word or phrase made in each document (Web page) being searched. This type of search result display requires the end user to go through the hits one by one in order to finally access the document he/she was looking for. Another drawback of such conventional search engines is that the results of the search do not take into account that a word may have several different meanings and may be used in many different contexts. For example, if an end user were looking for information on a cartoon mouse, because the search query would contain the word “mouse,” the list of hits would include documents on electronic cursor-control devices, documents providing biological information on mice, documents providing pet information on mice, etc. Therefore, the end user may have to go through an enormous number of these hits before finally (if ever) reaching a hit related to the cartoon mouse.
Thus, there is a need for a search engine or search tool that arranges the search results in a manner that allows the end user to effectively and quickly obtain items of interest.
The present invention is an information management, retrieval and display system for searching through an informational resource, such as a document (e.g., a treaty), a number of individual documents (e.g., Web pages resident on the Internet), or a stream of information (e.g., DNA code, source code, satellite data transmissions, etc.), and for displaying the results of the search in a collapsible/expandable format based upon user-selected display criteria or a user-selected display hierarchy. Such a display hierarchy will allow the end user to effectively and quickly obtain items of interest from the search results. The type or format of the informational resource is not critical.
The invention includes four primary modules: a break module, an index module, a search module and an un-break module. The break module is an expert system operating upon a set of expert rules that define its operation. The break module parses through the informational resource to break up the informational resource into finite elements (such as paragraphs, sections, sub-sections, segments, etc.). The break module also creates categorical tags for each of these finite elements, where the categorical tags assigned to each of the finite elements are based upon an analysis (defined by the set of expert system rules) of the contents of each of the finite elements. The categorical tag can include a standard classification such as, for example, a “Dewey Decimal-type” number. The categorical tag can also include an organizational attribute (such as one pertaining to the type or location of the finite element with respect to the rest of the informational resource), a date-stamp, a categorical word, etc. Preferably, the categorical tags are inserted into the finite element.
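The break step described above can be sketched as follows. This is only a minimal illustration, not the expert system itself: it assumes that finite elements are paragraphs separated by blank lines, and it stands in for the expert rule set with a small keyword table. The names FiniteElement, CATEGORY_RULES and break_resource are hypothetical and do not come from the specification.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FiniteElement:
    position: int                              # order of the element within the resource
    text: str                                  # contents of the element
    tag: dict = field(default_factory=dict)    # the categorical tag


# Hypothetical rule set: categorical word -> keywords that suggest it.
CATEGORY_RULES = {
    "computing": {"cursor", "usb", "click"},
    "biology": {"rodent", "species", "mammal"},
}


def break_resource(resource: str) -> list[FiniteElement]:
    """Split the resource on blank lines and tag each resulting paragraph."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", resource) if p.strip()]
    elements = []
    for position, text in enumerate(paragraphs):
        words = set(re.findall(r"[a-z]+", text.lower()))
        # A category applies when the paragraph shares a keyword with its rule.
        categories = sorted(c for c, kws in CATEGORY_RULES.items() if words & kws)
        tag = {"position": position, "categories": categories}
        elements.append(FiniteElement(position, text, tag))
    return elements
```

In this sketch the tag carries both an organizational attribute (the position) and categorical words; a real rule set could add a classification number or a date-stamp in the same dictionary.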
The index module parses through the finite elements identified/created/processed by the break module and creates a searchable database having a database record for each of the finite elements identified by the break module. The searchable database is a type of reverse index, where each record includes an address or location of the corresponding finite element (and, in turn, the categorical tag included therewith), and strings (such as words, phrases, etc.) contained in the finite element and their frequency (i.e., their weight) within the finite element.
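One possible shape for such a database is sketched below, under the assumption that each finite element is plain text keyed by its address. The stop list COMMON_WORDS and the functions build_index and lookup are hypothetical names chosen for illustration; the specification does not prescribe a storage format.

```python
import re
from collections import Counter

# Assumed stop list; a real rule set would define its own "common" strings.
COMMON_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def build_index(elements: dict[str, str]) -> dict[str, dict]:
    """Create one record per finite element: its address plus string frequencies."""
    index = {}
    for address, text in elements.items():
        words = [w for w in re.findall(r"[a-z]+", text.lower())
                 if w not in COMMON_WORDS]
        index[address] = {"address": address, "weights": Counter(words)}
    return index


def lookup(index: dict[str, dict], term: str) -> list[tuple[str, int]]:
    """Return (address, weight) pairs for records containing the term, heaviest first."""
    term = term.lower()
    hits = [(address, record["weights"][term])
            for address, record in index.items()
            if record["weights"][term] > 0]
    return sorted(hits, key=lambda hit: -hit[1])
```

The lookup function reproduces only the conventional, frequency-ordered result list; the categorical tag would be stored alongside the weights when it cannot be embedded in the element itself.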
In applications where the users of the invention do not have control of the information being searched (e.g., Web pages on the Internet), each database record may also include the categorical tag, since the break module will not be able to insert the categorical tag into the finite elements themselves. Furthermore, with the Web search application, it may not be necessary to utilize the break and un-break modules at all, since each Web page or link might be considered a finite element for the purposes of the present invention.
Once the reverse index is created, a search of the reverse index may be performed. Key strings (such as key words, phrases or symbol segments) may be supplied by an end user as a search query, and a display hierarchy or criteria may also be selected or defined by the user. The selected display criteria will instruct the search module how to manipulate the data of the search results. Specifically, the selected display criteria will define if the search results are to be displayed in an order or structure based entirely upon the information contained within the categorical tags (research-centric), if the search results are to be displayed in an order depending entirely upon the frequency of the key strings present in the finite elements (conventional), or if the search results are to be displayed in an order or structure based upon a combination of the two (document-centric).
The search module accesses the search query and searches through the reverse index for database records matching the specific search term or query. The search results are then displayed in a collapsible/expandable (tree) structure by applying the information in the categorical tags for each of the finite elements satisfying the search criteria to the selected display hierarchy. For example, if the selected hierarchy is a document-centric hierarchy, a first level of the display hierarchy may be, for example, the year in which the finite element was created; a second level of the display hierarchy may be, for example, the order in which the finite elements appear in the document; and a third level of the display hierarchy may be, for example, based upon the frequency with which the search words appear in each of the finite elements. The operation of the search module, as with the break and index modules, is based upon a set of expert rules. Therefore, if the search results are not satisfactory, the expert rules in the break, index and/or search modules are modified and the procedure is performed again.
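The arrangement of search results into a tree can be illustrated with the sketch below. It assumes each hit is a dictionary holding an address, a key-string weight and a categorical tag, and that the selected display hierarchy is given as a list of tag field names. The functions arrange and render and the tag fields year and category are hypothetical.

```python
from collections import defaultdict


def arrange(hits, levels):
    """Group hits into a nested dict, one nesting level per tag field in `levels`."""
    if not levels:
        # Leaf level: order by key-string frequency, highest first.
        return [h["address"] for h in sorted(hits, key=lambda h: -h["weight"])]
    groups = defaultdict(list)
    for hit in hits:
        groups[hit["tag"][levels[0]]].append(hit)
    return {key: arrange(groups[key], levels[1:]) for key in sorted(groups)}


def render(tree, depth=0):
    """Return the fully expanded tree as a list of indented text lines."""
    if isinstance(tree, list):
        return ["  " * depth + address for address in tree]
    lines = []
    for key, subtree in tree.items():
        lines.append("  " * depth + str(key))
        lines.extend(render(subtree, depth + 1))
    return lines
```

Passing an empty list of levels yields the conventional frequency-only ordering, while a non-empty list yields a tag-driven structure whose leaves are still ordered by frequency. A collapsible display would show only the keys of one level until the user expands a branch.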
Once one of the finite elements in the search result display is selected by the end user, the un-break module allows the end user to view a contiguous portion of the informational resource to which the selected finite element belongs. The un-break module will assemble the selected finite element with other related finite elements to reconstruct the contiguous portion of the informational resource. The un-break module refers to the categorical tag of the selected finite element for information related to the location of the finite element with respect to the entire informational resource, and will then build a portion of the informational resource from all of the finite elements belonging to that portion. For example, if the selected finite element is a paragraph of a document, the un-break module may be configured to rebuild the chapter of the document to which the paragraph belongs. As with the other modules of the present invention, the operation of the un-break module is controlled by a set of expert rules that may be modified if the results are unsatisfactory.
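The chapter-rebuilding example can be sketched as follows, assuming that each categorical tag records a chapter identifier and a position within that chapter. The function unbreak and the tag fields chapter and position are hypothetical names used only for illustration.

```python
def unbreak(elements, selected_id):
    """Rebuild the chapter to which the selected finite element belongs."""
    by_id = {element["id"]: element for element in elements}
    chapter = by_id[selected_id]["tag"]["chapter"]
    # Collect every element tagged with the same chapter.
    siblings = [e for e in elements if e["tag"]["chapter"] == chapter]
    # Restore the original order recorded in the tags.
    siblings.sort(key=lambda e: e["tag"]["position"])
    return "\n\n".join(e["text"] for e in siblings)
```

Because the location information lives in the tags, the elements can be stored in any order and still be reassembled into a contiguous portion.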
It is envisioned that the rule sets will be created and refined by an expert on the document or information being searched. For example, if the system of the present invention is to be associated with Volume 37 of the Code of Federal Regulations, an individual (or group of individuals) with intimate knowledge of the Volume would be best suited to generate and fine-tune the rule sets. The fine-tuning of the rule sets would involve the individual repeatedly performing example searches on the Volume using the rule sets, and modifying the rule sets until the search results have the desired content and format. Once the rule sets have been fine-tuned, the search module of the present invention can be packaged along with the Volume and sold or distributed as a searchable Volume. Likewise, the search module could operate on a Web site so that users can access the Web site and perform searches on the Volume. Since the rule sets have already been defined and fine-tuned by the “experts,” the users would have a fully operable search engine that performs searches and displays results in accordance with an expert's intimate knowledge of the Volume.
As mentioned above, it is also envisioned that an embodiment of the invention is designed to search through a number of individual Web pages resident on the Internet and to display the results of the search in a collapsible/expandable format based upon user-selected display criteria or a user-selected display hierarchy. In such an embodiment, a break module in the form described above may not be necessary because each Web page may already be considered to be a “finite element” and the search engine will not be able to modify the Web pages. Accordingly, in such an application, the index module will parse through each of the Web pages (finite elements) to create a searchable database having a record for each of the Web pages. Each record in the searchable database will include the Web address of the Web page, non-common words contained in the Web page along with their frequency (weight), and a categorical tag, as described above, which includes data based upon an analysis of the contents of the Web page. The index module will also review each of the Web pages to determine whether the creator of the Web page has embedded a categorical tag into the Web page itself; if such an embedded categorical tag is found, the index module may simply insert this pre-defined categorical tag into the database record rather than creating one itself. Therefore, as the present invention becomes more widely used on the Internet, Web page creators may desire to create their own categorical tags for their Web pages rather than having the search engine create one for them. With this feature, the Web page designer may be able to influence the search results, perhaps to achieve a more accurate depiction of the Web site. Of course, such a feature may also be used by Web designers in a deceptive manner, where the categorical tag will cause the Web page to be listed in search results when the searcher is looking for an entirely different type of information.
Recognizing this potential problem, the index module will include an option where it will compare the actual contents of the Web page against the embedded categorical tags, and will create a new categorical tag to be inserted into the database record if there is a significant difference. Likewise, the index module can be configured to filter out Web sites having undesirable or unsavory content as indicated by the embedded categorical tags or as determined by a review of the content of the Web page itself.
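The comparison option can be illustrated by the sketch below. The specification does not say how a "significant difference" is measured; this sketch assumes a set-overlap (Jaccard) score with an arbitrary threshold, and it derives categories from a hypothetical keyword table. The names CATEGORY_RULES, derive_categories and choose_tag are all assumptions.

```python
import re

# Hypothetical rule set: categorical word -> keywords that suggest it.
CATEGORY_RULES = {
    "pets": {"cage", "pet", "bedding"},
    "cartoons": {"animated", "cartoon"},
    "computing": {"cursor", "usb", "click"},
}


def derive_categories(text):
    """Derive categorical words from the actual contents of the page."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {c for c, kws in CATEGORY_RULES.items() if words & kws}


def choose_tag(text, embedded, threshold=0.5):
    """Keep the embedded tag unless it differs significantly from the contents."""
    derived = derive_categories(text)
    if embedded is None:
        return derived          # no embedded tag: the index module creates one
    union = embedded | derived
    # Assumed measure of agreement: shared categories over all categories.
    overlap = len(embedded & derived) / len(union) if union else 1.0
    return embedded if overlap >= threshold else derived
```

Under these assumptions, a page about pet mice that is tagged "cartoons" has no overlap with its derived categories, so the derived tag replaces the embedded one in the database record.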
In another embodiment of the invention, the dynamic expert rule sets may be configured to accept and index all manner of static and dynamic information (such as news-feeds, data transmissions, etc.) on a global scale where an end-user will be able to efficiently and quickly obtain any sort of information he/she wishes from a hierarchal search result display based upon a categorical organization scheme such as the Dewey-Decimal system.
Thus, in one aspect of the present invention, a method for retrieving information from an informational resource comprises the steps of: (a) dividing the informational resource into a plurality of finite elements; (b) assigning a categorical tag to each of the plurality of finite elements, where the categorical tag includes data pertaining to a content of the finite element; (c) generating a searchable database record for each of the plurality of finite elements, where each searchable database record includes at least one string contained within the finite element, where the string can be a word, a phrase, a symbol, a group of symbols, a data segment or the like; (d) supplying a search string; (e) searching the searchable database for searchable database records containing the search string; (f) arranging the results of the searching step in a hierarchal structure according, at least in part, to the data in the categorical tags assigned to the finite elements found in the searching step; and (g) displaying the results of the searching step in the hierarchal structure.
The informational resource may be a single document, a plurality of documents or a stream of data, and the step of identifying the finite elements may include identifying sections or sub-sections within the document(s) or data stream, or simply identifying the documents themselves. The step of dividing the informational resource into a plurality of finite elements is preferably performed by an expert system according to a rule set; and the step of assigning a categorical tag to each of the plurality of finite elements is also preferably performed by an expert system according to another rule set. If unsatisfactory results are obtained in step (g) above, one or both of the rule sets may be modified by the end user and steps (a) through (g) may be performed again.
Each database record preferably includes an address or pointer to the corresponding finite element and further preferably includes all of the non-common strings (e.g., words or phrases) contained within the corresponding finite element along with the frequency that such strings appear.
In another aspect of the present invention, a method for retrieving information from an informational resource includes the steps of: defining a first rule set for dividing the informational resource into a plurality of finite elements; utilizing the first rule set, dividing the informational resource into a plurality of finite elements; defining a second rule set for creating a categorical tag for one of the plurality of finite elements; utilizing the second rule set to create a categorical tag for each of the plurality of finite elements; generating a searchable database including a searchable database record for each of the finite elements; searching the searchable database for relevant database records; associating the relevant database records found in the search with corresponding relevant finite elements; selecting a hierarchy for displaying identifying phrases pertaining to the relevant finite elements; ordering the relevant finite elements in the hierarchy according, at least in part, to the categorical tag for each of the finite elements; and displaying the identifying phrases pertaining to the relevant finite elements according to the results of the ordering step.
In another aspect of the present invention, a data storage device (such as a CD ROM) is provided, which comprises: an informational resource divided into a plurality of finite elements, where each of the finite elements includes a categorical tag and a database record assigned thereto, where the categorical tag includes data pertaining to a content of the finite element and the database record includes at least one string contained within the finite element; and also comprises software instructions programmed to retrieve and display at least a portion of the informational resource. The software instructions are configured to perform the steps of supplying a search string, searching through the database records for relevant database records containing the search string, arranging the results of the searching step in a hierarchal structure according to the information in the categorical tags assigned to the finite elements corresponding to the relevant database records, and displaying identifying phrases for the finite elements corresponding to the relevant database records in the hierarchal structure.