1. Field of the Invention
The present invention relates generally to information retrieval, and more particularly to a computer program for information retrieval (e.g., in databases or on the Web).
2. Description of the Related Art
Recently there has been a tremendous rise in Internet-related activities. Many databases are now connected to the “outside world” using Internet technology, which allows users to search databases via the Internet and/or a company intranet. More and more users rely on the Internet for educational, commercial, or personal needs. Several browsers have been developed to “surf” the Internet, and many search engines accessible through the Internet now assist users in searching databases.
Although users may search databases with these search engines, the engines have many disadvantages. Most conventional search engines are not “user friendly.” For example, they do not accept queries (search requests) in natural language form. Most search engines require users to formulate search words with Boolean operators. Thus, users unfamiliar with Boolean operators have difficulty using these search engines.
Also, most search engines provide results only if there is an exact match between the user-formulated search words and the content in the database. Most search engines do not consider synonyms and other approximations of the search words. Thus, if the user does not use the “right” word in the query, the search engine is likely to fail to find a relevant answer.
Furthermore, most search engines are not capable of processing misspelled queries or queries having syntax errors. Thus, a user who makes a spelling or syntax error in the query may not be able to find an answer.
Moreover, most search engines do not provide answers that are user specific or personalized. For example, if a butcher, a stockbroker, and a boxer each include the word “pound” in a search request, they may not be referring to the same thing. Since the word “pound” has different meanings depending on the context, most search engines will not be able to correctly process the search request for all three users. Thus, a typical search engine may provide a correct answer to the butcher but an incorrect answer to the stockbroker and the boxer.
Most search engines are also rigid, in that their knowledge databases do not evolve through use: they do not extract information from prior search sessions to upgrade their own vocabulary and knowledge databases. In addition, most search engines require an extensive dictionary to operate.
For these reasons, it has been recognized that there is a need for an interface for a search engine that is user-friendly and accepts natural language queries. There is also a need for an interface that can process misspelled queries and queries having syntax errors. Moreover, there is a need for an interface that allows a search engine to provide user-specific or personalized answers. Furthermore, there is a need for an interface that allows a search engine to extract information from prior search sessions and upgrade its own vocabulary and knowledge database.
The present invention is a system and method for searching for information in a database (structured or unstructured) using a natural language. In one embodiment, a method for searching a database (also referred to as a target database) using a natural language comprises the steps of receiving a user-formulated search request in the natural language and converting the search request into a list of search words. The list of search words includes the most restrictive search words, consisting of the relevant words from the search request, and additional search words created by various approximations of those relevant words. The search words are converted into a string of bytes, and a datasoup (a subset of the target database) is searched with the string of bytes. If a match exists in the datasoup, the results are retrieved from the target database and provided to the user.
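The search flow just described can be illustrated with a minimal Python sketch. It is not the patented implementation; several parts are assumptions made only for illustration: the stop-word list that picks out the “relevant” words, the plural-stripping rule standing in for “various approximations,” the modeling of the datasoup as an index from byte-encoded words to record ids, and the weighting that ranks exact matches above approximated ones. All function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the text does not specify how relevant words are chosen.
STOP_WORDS = {"a", "an", "the", "in", "of", "for", "to", "me", "show", "find", "is", "are", "what"}


def restrictive_words(request: str) -> list[str]:
    """Most restrictive search words: the request's tokens minus stop words."""
    tokens = re.findall(r"[a-z0-9]+", request.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def approximations(word: str) -> list[str]:
    """Stand-in approximation (an assumption): drop a trailing plural 's'."""
    return [word[:-1]] if word.endswith("s") and len(word) > 3 else []


def build_search_words(request: str) -> tuple[list[str], list[str]]:
    """Return (most restrictive words, additional approximated words)."""
    restrictive = restrictive_words(request)
    additional: list[str] = []
    for word in restrictive:
        for variant in approximations(word):
            if variant not in restrictive and variant not in additional:
                additional.append(variant)
    return restrictive, additional


def search(request: str, datasoup: dict[bytes, set[int]], target: dict[int, str]) -> list[str]:
    """Search the datasoup with byte-encoded words, then fetch hits from the target database."""
    restrictive, additional = build_search_words(request)
    scores = Counter()
    # Restrictive matches count double; approximated matches count once (assumed weighting).
    for weight, words in ((2, restrictive), (1, additional)):
        for word in words:
            key = word.encode("utf-8")  # search word converted to bytes
            for record_id in datasoup.get(key, set()):
                scores[record_id] += weight
    # Highest score first; ties broken by record id for a stable order.
    ranked = sorted(scores, key=lambda rid: (-scores[rid], rid))
    # Only records that matched in the datasoup are retrieved from the target database.
    return [target[rid] for rid in ranked]
```

For the request “Find cheap hotels in Rome,” the restrictive words are `cheap`, `hotels`, and `rome`, and the approximation adds `hotel`, so a record indexed under `hotel` is still found even though the user typed the plural.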
The method additionally comprises the steps of creating a preference file for a user and storing information about the user in the preference file. The information includes the user’s identification, the user’s own vocabulary, use of synonyms, common spelling errors, and unique writing style. The stored information is retrieved from the preference file to analyze the search request.
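A preference file of this kind might be modeled as shown below. This is only a sketch: the field names, the JSON serialization, and the order in which corrections are applied (spelling first, then personal vocabulary) are assumptions, and the class and method names are hypothetical. The sense label `pound_weight` is an invented example of how a butcher’s file could pin down the meaning of “pound.”

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PreferenceFile:
    """Hypothetical layout for a per-user preference file (field names are assumptions)."""
    user_id: str
    vocabulary: dict[str, str] = field(default_factory=dict)       # user's own term -> canonical term
    synonyms: dict[str, list[str]] = field(default_factory=dict)    # term -> synonyms this user favors
    spelling_errors: dict[str, str] = field(default_factory=dict)  # habitual misspelling -> correction
    style_notes: list[str] = field(default_factory=list)            # free-form notes on writing style

    def to_json(self) -> str:
        """Serialize to the text stored in the preference file."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "PreferenceFile":
        """Rebuild a preference file from its stored text."""
        return cls(**json.loads(text))

    def learn_spelling(self, wrong: str, right: str) -> None:
        """Record a spelling error observed in one of this user's requests."""
        self.spelling_errors[wrong.lower()] = right.lower()

    def normalize(self, request: str) -> list[str]:
        """Apply the user's spelling fixes, then personal vocabulary, to each token."""
        words = []
        for token in request.lower().split():
            token = self.spelling_errors.get(token, token)
            token = self.vocabulary.get(token, token)
            words.append(token)
        return words
```

A stockbroker’s file could map the same word to a different label (for example, `pound_sterling`), so identical requests are analyzed differently per user.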
The method additionally comprises the step of accessing a system database to analyze the search request, the system database storing global rules, one or more preference files, and, optionally, one or more dictionaries. The method further comprises the steps of identifying and extracting essential words from the search request in order to generate the most restrictive search words, and generating the additional search words from the essential words using synonyms, phonetically similar words, and spelling corrections.
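The three kinds of additional search words can be sketched as follows. The text does not say which phonetic or spelling-correction algorithms are used; this sketch swaps in American Soundex for “phonetically similar words” and `difflib.get_close_matches` from the Python standard library for “spelling corrections.” The small thesaurus and lexicon are hypothetical stand-ins for the system database, and `additional_search_words` is an illustrative name.

```python
import difflib

# American Soundex digit for each coded consonant; vowels, y, h and w get no digit.
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(word: str) -> str:
    """Four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with equal codes
        digit = SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated code after it is kept
    return (code + "000")[:4]


def additional_search_words(essential: list[str], thesaurus: dict[str, list[str]],
                            lexicon: list[str]) -> list[str]:
    """Expand essential words with synonyms, sound-alikes, and spelling corrections."""
    extra: list[str] = []

    def add(candidate: str) -> None:
        if candidate not in essential and candidate not in extra:
            extra.append(candidate)

    for word in essential:
        for synonym in thesaurus.get(word, []):  # 1. synonyms
            add(synonym)
        key = soundex(word)
        for candidate in lexicon:  # 2. phonetically similar words
            if candidate != word and soundex(candidate) == key:
                add(candidate)
        # 3. spelling corrections: close string matches from the lexicon
        for fix in difflib.get_close_matches(word, lexicon, n=3, cutoff=0.8):
            add(fix)
    return extra
```

With a lexicon containing `receipt`, the misspelled essential word `reciept` shares its Soundex code (`R213`) and is also a close string match, so `receipt` is added once as an additional search word.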
In one embodiment, a system for searching a target database using a natural language comprises a system server configured to receive a search request formulated by a user in the natural language; a core engine coupled to the system server and to the target database, the core engine processing the search request; and a system database coupled to the core engine, the core engine accessing the system database to analyze the search request. The system database stores global rules, one or more preference files, and one or more natural language dictionaries. The preference file stores information about the user, including personal information related to the user and information regarding the user’s own natural language vocabulary, use of synonyms, common spelling errors, and unique writing style.
In one embodiment, the core engine comprises a Master Engine (ME) configured to process the user-formulated search request, and a Meta Engine Transcription Automata (META) coupled to the ME and configured to post-process data in the ME during off-line operation, the META providing rules to the ME regarding the processing of search requests, the construction of a knowledge database in the system database, and searches within the knowledge database.
The system further comprises a User Language Automata (ULA) coupled to the ME and the META, the ULA configured to analyze language in the preference files, the ULA retrieving user specific information from the preference files and providing the information to the ME, wherein the information is subsequently processed by the META during post-analysis.
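The division of labor among the ME, META, and ULA can be illustrated with the toy sketch below. The specific behavior is an assumption: here the ULA supplies per-user term mappings, the ME applies them on-line and logs each substitution, and the META, run off-line, promotes any mapping shared by at least a threshold number of users into a global rule that the ME then applies to everyone. The class layouts, the `rules` dictionary standing in for the knowledge database, and the `min_users` threshold are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ULA:
    """User Language Automata: supplies user-specific term mappings from preference files."""
    preference_files: dict[str, dict[str, str]]  # user_id -> {user term: canonical term} (assumed layout)

    def user_info(self, user_id: str) -> dict[str, str]:
        return self.preference_files.get(user_id, {})


@dataclass
class ME:
    """Master Engine: processes requests on-line and logs user-specific substitutions."""
    ula: ULA
    rules: dict[str, str] = field(default_factory=dict)  # global rules supplied by META
    substitution_log: list[tuple[str, str, str]] = field(default_factory=list)

    def process(self, user_id: str, request: str) -> list[str]:
        personal = self.ula.user_info(user_id)
        words = []
        for token in request.lower().split():
            if token in personal:  # user-specific mapping from the ULA takes precedence
                self.substitution_log.append((user_id, token, personal[token]))
                token = personal[token]
            else:
                token = self.rules.get(token, token)
            words.append(token)
        return words


@dataclass
class META:
    """Off-line post-processor: promotes mappings shared by enough users to global rules."""
    min_users: int = 2  # hypothetical promotion threshold

    def post_process(self, me: ME) -> None:
        users_per_mapping = defaultdict(set)
        for user_id, original, replacement in me.substitution_log:
            users_per_mapping[(original, replacement)].add(user_id)
        for (original, replacement), users in users_per_mapping.items():
            if len(users) >= self.min_users:
                me.rules[original] = replacement
        me.substitution_log.clear()  # this batch of sessions has been analyzed
```

In this sketch, if two users’ preference files both correct `teh` to `the`, one off-line META pass turns that correction into a global rule, so a third user without that entry benefits from it in later sessions.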