The present invention relates to information retrieval, and in particular, to a method of retrieving information with similarity search capability.
The explosive growth and prevalence of computer technologies, data storage capacities and the Internet have led to an ever-increasing amount of information being stored in electronic form. As the amount of available electronic information grows, individuals and organizations have sought to put this information to productive use. For example, many companies with large information bases may seek to use electronic information to improve the way the business is managed. The information stored in a company's data storage facilities may represent years of the company's experience in a business environment, and such information may be useful in helping to solve new problems as they arise. Furthermore, a company's information base may include historical or real-time data about the company's financial performance, research and development activities, or manufacturing activities, to name just a few. A variety of systems exist for managing such information, but such systems must first be able to access useful information in a way that is meaningful.
While the availability of large volumes of information creates the potential for improved problem solving and decision-making, it is difficult to find and retrieve specific information for a particular use from the large volumes of information available. Thus, the ever-increasing volumes of information, together with the desire to make use of the information, have created a need for methods of retrieving specific pieces of information from a large information base. Information retrieval methods also supply the inputs to information management systems. Without an effective and efficient method of retrieving useful information, such systems cannot be used to their full potential.
Information retrieval methods for structured information typically require a user to specify what information is desired. This is usually done by allowing a user to specify a search request (i.e., a query). Traditional information retrieval methods have focused on finding and retrieving data that is an exact match of a query (e.g., retrieve the account information for a person with the name “John”). However, many useful pieces of information may not be recovered if an exact match is required. Thus, the notion of “similarity” searching has been attracting more attention as a method of retrieving information.
In a similarity search, information may be retrieved even though it does not exactly match the information specified in the search request (i.e., the retrieved data is either the same as or “similar” to the search criteria). Similarity searches typically perform a search across a broader range of information than traditional searches, and return information that differs from the search request in a variety of ways. To do so, similarity searching typically includes numerous processing steps for determining whether data that is not an exact match is nevertheless “similar” enough to the search criteria to warrant retrieval. However, because similarity searching typically requires processing steps other than a direct comparison of the search criteria to the data, such searches can be extremely inefficient, computationally intensive and slow. Moreover, the accuracy of the results of a similarity search may depend heavily on the methods employed to determine whether data is “similar.” If the processing steps used in the similarity determination are not accurate, the search results will be meaningless, and the search may return information that is essentially useless. Examples may include scenarios where the information returned is completely unrelated to the problem at hand, or where the search returns too few or far too many results.
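By way of illustration only, the contrast between exact-match retrieval and similarity retrieval may be sketched as follows. The sketch is not the method of the present invention. The record list, the function names, and the threshold values are hypothetical, and the similarity measure (the `SequenceMatcher` ratio from the Python standard library) is merely one of many measures that could be assumed.

```python
from difflib import SequenceMatcher

# Hypothetical records, assumed for illustration only.
RECORDS = ["John", "Jon", "Johnny", "Joan", "Mary"]


def exact_search(query, records):
    """Traditional retrieval: keep only records identical to the query."""
    return [r for r in records if r == query]


def similarity(a, b):
    """Score in [0, 1]; an assumed measure, not one named in the text."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_search(query, records, threshold):
    """Keep records whose score meets the threshold, best match first."""
    scored = [(similarity(query, r), r) for r in records]
    hits = [(s, r) for s, r in scored if s >= threshold]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in hits]


if __name__ == "__main__":
    print(exact_search("John", RECORDS))
    print(similarity_search("John", RECORDS, 0.85))
    print(similarity_search("John", RECORDS, 0.70))
    print(similarity_search("John", RECORDS, 0.0))
```

In this sketch every record must be scored against the query, which reflects the additional processing cost noted above, and the choice of threshold determines whether the search returns too few results or too many (a threshold of zero returns every record, including unrelated ones).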
Thus, there is a need for a method of retrieving data in a way that will improve the efficiency, speed and accuracy of information retrieval over existing techniques. The present invention solves these and other problems by providing an information retrieval method with efficient similarity search capability.