Systems and methods for locating information in databases are known. An area in which such systems and methods have recently become quite common and heavily used is in searching for information on the World Wide Web (WWW) and/or on other internet sources.
Typically, an internet user will access a search engine, such as AltaVista or Yahoo, through a web page maintained for that purpose by the host of the search engine and will input search data relating to the information sought into the search engine. The search data can, for example, comprise keywords or phrases related to the information sought and boolean operators to further qualify the search. An example of such search data is "AZT and Toxicity", wherein AZT is one keyword, Toxicity is another and the "and" is a boolean operator requiring both keywords to be present in the information source for it to be considered a match.
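By way of illustration only, the boolean "and" matching described above can be sketched as follows. The function names parse_and_query and matches are hypothetical and are not taken from any particular search engine; the sketch assumes single-word keywords, case-insensitive comparison and a query containing only the "and" operator.

```python
import re

def parse_and_query(query):
    """Split a query such as "AZT and Toxicity" into lowercase keywords.

    Assumption: the only operator is the word "and", in any letter case.
    """
    parts = re.split(r"\s+and\s+", query.strip(), flags=re.IGNORECASE)
    return [p.strip().lower() for p in parts if p.strip()]

def matches(query, text):
    """Return True only if every keyword of the query occurs in the text."""
    # Tokenize the text into lowercase words so matching ignores case.
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return all(keyword in words for keyword in parse_and_query(query))
```

Under these assumptions, a text mentioning both AZT and toxicity is a match, while a text mentioning only AZT is not.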
Once search data is input, the search engine consults one or more indices it maintains of web pages or other information sources to identify those that match the search data. A listing of the information sources that match the search data, often referred to as "hits", is then displayed to the user, the number of matches usually being limited to some predefined maximum number. These matches are typically ranked, usually according to the number of occurrences of keywords or phrases in the information source. Generally, the information which is displayed to the user for each match comprises a location at which the document can be accessed (e.g., a URL for a WWW document) and some minimal additional information such as a document title, etc.
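The ranking and truncation of hits described above may be sketched, purely as an example, as follows. The function rank_hits, the dictionary keys "url", "title" and "text", and the default max_hits value are all assumptions made for illustration. For brevity the sketch scans each document directly rather than consulting a prebuilt index, and it requires every keyword to be present before a document is counted as a hit.

```python
import re
from collections import Counter

def rank_hits(keywords, documents, max_hits=10):
    """Return up to max_hits (score, url, title) tuples, highest score first.

    The score is the total number of keyword occurrences in the document.
    """
    scored = []
    for doc in documents:
        # Count every lowercase word in the document text.
        counts = Counter(re.findall(r"[a-z0-9]+", doc["text"].lower()))
        per_keyword = [counts[k.lower()] for k in keywords]
        if all(per_keyword):  # every keyword occurs at least once
            scored.append((sum(per_keyword), doc["url"], doc["title"]))
    # Rank by occurrence count, most occurrences first.
    scored.sort(key=lambda hit: hit[0], reverse=True)
    # Limit the listing to the predefined maximum number of hits.
    return scored[:max_hits]
```

Each returned tuple carries only the location and title of the document, mirroring the minimal information typically displayed for each hit.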
Generally, such search engines provide a skilled user with reasonable results from well defined and/or homogeneous databases or other information sources. For example, the APS U.S. Patent database can be efficiently searched based on the contents of well-defined information fields, such as Patent Number, Inventor Name, etc., to locate information sought.
However, while such search engines can generally provide a skilled user with reasonable results from such well defined and/or homogeneous databases, they do suffer from disadvantages. Specifically, when searching databases or information sources which are not homogeneous or well defined, such as the WWW and/or internet, even the best formed search strategy can result in a hundred or more matches, many of which are not useful to the user but which must still be reviewed by the user, to at least some extent, to determine this. Further, such search engines generally require the user to understand and be comfortable with boolean type searches and are limited to this type of search operation.
To enhance the chances that the desired information will in fact be located, a user will often perform the same search on multiple search engines, thus increasing the number of matches which must be reviewed by the user. The use of more than one search engine can also require the user to redraft his search data to accommodate different search data requirements and/or capabilities of the different search engines. For example, some search engines may only allow keyword-based searches while others may permit searching based upon phrases.
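The redrafting burden noted above can be illustrated with a minimal sketch. The function adapt_query and the supports_phrases flag are hypothetical, and the use of double quotes to mark a phrase and of the word "and" to join terms are assumed conventions, since actual query syntax differs from one search engine to another.

```python
def adapt_query(terms, supports_phrases):
    """Redraft a list of search terms for one engine's assumed capabilities.

    Multi-word terms are kept as quoted phrases when the engine supports
    phrase searching; otherwise they are split into individual keywords.
    """
    adapted = []
    for term in terms:
        words = term.split()
        if len(words) > 1 and supports_phrases:
            # Phrase-capable engine: keep the words together as one phrase.
            adapted.append('"' + " ".join(words) + '"')
        else:
            # Keyword-only engine: each word becomes a separate keyword.
            adapted.extend(words)
    return " and ".join(adapted)
```

For the terms AZT and "liver toxicity", a phrase-capable engine would receive the phrase intact, whereas a keyword-only engine would receive three separate keywords, which generally broadens the set of matches.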
These difficulties often result in the less skilled user not obtaining acceptable search results without multiple and/or recursive search attempts, which has led many users to adopt the interactive search technique commonly referred to as "surfing the web", which, while often entertaining and/or informative, can be time consuming and may still not locate the desired information.
Natural Language Query (NLQ) systems are also known and are used for a variety of purposes. Generally, an NLQ system accepts a search sentence or phrase in common everyday (natural) language and parses the input sentence or phrase in an attempt to extract meaning from it. For example, a natural language search phrase used with a company's financial database may be "Give me a list of the fourth quarter general ledger expense accounts." This sentence will be processed by the NLQ system to determine the information required by the user, which is then retrieved from the financial database as necessary. However, such NLQ systems are computationally expensive to operate, as the processing required to determine the meaning of a sentence or phrase is significant. Further, such systems are generally limited in terms of the scope of the information which they can access. For example, a different NLQ system is likely required to correctly process queries relating to a company's financial information than is required to search a medical database of obscure diseases. Also, such NLQ systems generally only produce acceptable results with well defined and/or homogeneous databases.
It is desired to have a meta-search engine which will accept natural language search data to search for information from one or more information sources which need not be homogeneous or well defined. The meta-search engine would identify portions of the matching information which it determines to be relevant to the search data and would display at least those determined portions to the user.