Searching a multiplicity of data records is of great importance, for example in what are known as online shops. A provider of a multiplicity of products records the products provided in a database using data records. A user can then use a computer to set up a connection to the online shop via a network, such as the Internet, and to retrieve the data records from the database. If the database comprises a very large stock of data, and if the individual data records have a relatively complex structure, then it is necessary for the user to be able to search the data records using a search engine. In this case, the user transmits a search query to the online shop. The online shop or a system connected thereto processes the search query and returns data records to the user as hits, ordered in a particular manner. In this context, the problem arises of determining which data records are particularly relevant to the user's search query.
Furthermore, it is known practice to search not only the database of an online shop but also data which can be retrieved via the Internet. Search engines of this kind are called Internet search engines.
For all search engines, the problem arises that the search query is frequently vague and subject to uncertainty. The search terms in the search query frequently do not correspond exactly to the terms which occur in the data records that are to be searched. Furthermore, the search terms may contain typing errors, or the user may intend them to cover grammatically different forms of the search term as well. When processing the search query, efforts are therefore made to take the vagueness and uncertainty of the search query into account.
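By way of illustration only, one common way of tolerating typing errors is to compare a search term with the terms occurring in the data records using an edit distance. The following minimal sketch assumes the Levenshtein distance and a hypothetical vocabulary; the function names `edit_distance` and `tolerant_matches` and the threshold `max_edits` are assumptions made for this example and are not taken from any of the documents discussed here.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def tolerant_matches(term: str, vocabulary: List[str], max_edits: int = 1) -> List[str]:
    """Return the vocabulary terms within max_edits edits of the search term."""
    return [word for word in vocabulary if edit_distance(term, word) <= max_edits]


# Hypothetical vocabulary extracted from the data records.
vocabulary = ["jacket", "jackets", "basket"]
matches = tolerant_matches("jaket", vocabulary)
```

With the assumed threshold of one edit, the misspelled term "jaket" matches only "jacket"; a larger threshold would admit further terms at the cost of precision.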
EP 1 095 326 B1 describes a search system for retrieving information which is stored in the form of text. To retrieve the information, the search system uses a tree structure as a data structure for the text. In addition, a measure is used to determine the degree of match between a query and the retrieved information. This measure is a combination of one distance measure for an approximate degree of match between words or symbols in the text and the query, and of another distance measure for an approximate degree of match between sequences of words or symbols in the text and a query sequence.
EP 1 208 465 B1 describes a search engine for searching a collection of documents. In this search engine, data processing units form groups of nodes which are connected in a network. The search engine is designed such that it can be scaled in respect of both the data volume and the rate of search queries.
EP 1 341 009 B1 describes a method for operating an Internet search engine. The method involves links between websites on the Internet being processed by means of an intelligent agent. The contents of the visited websites are filtered in order to determine the relevance of the content. The relevant websites ascertained in this way are indexed, and the indexed, subject-specific information is stored in a database. For filtering, the contents of a website are passed through a subject-specific, dictionary-based filter which compares the contents of the website with terminology found in the dictionary.
EP 1 459 206 B1 describes a computer-implemented method for searching a collection of items, each item in the collection having a set of properties. The method involves the receipt of a query which is formed from a first set of two or more properties. A distance function is then applied to one or more of the items in the collection, and one or more result items are identified on the basis of the distance function. In this case, the distance function determines a distance between the query and an item in the collection on the basis of the number of items in the collection which have all the properties in the intersection between the first set of properties and the set of properties for the item.
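The distance function summarized above can be illustrated by the following minimal sketch. It is based solely on the summary given here, not on the text of EP 1 459 206 B1 itself: the raw count is used directly as the distance, and the function name `distance`, the representation of properties as string sets and the sample collection are assumptions made for this example.

```python
from typing import FrozenSet, List

Properties = FrozenSet[str]


def distance(query: Properties, item: Properties, collection: List[Properties]) -> int:
    """Number of items in the collection that have every property shared
    by the query and the item. A small count means the shared properties
    are specific, so the item is treated as close to the query."""
    shared = query & item  # intersection of query and item properties
    return sum(1 for other in collection if shared <= other)


# Hypothetical collection; each item is a set of properties.
collection = [
    frozenset({"red", "shoe", "leather"}),
    frozenset({"red", "shirt"}),
    frozenset({"blue", "shoe"}),
]
query = frozenset({"red", "shoe"})

# Rank the items by ascending distance to the query.
ranked = sorted(collection, key=lambda item: distance(query, item, collection))
```

In this sketch, an item sharing no property with the query yields an empty intersection, which every item satisfies, so such an item receives the largest possible distance.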
Furthermore, EP 1 622 054 A1, WO 2008/085637 A2 and WO 2008/137395 A1 describe further search methods and search engines for searching data records.
Finally, the publication by Tuan-Quang Nguyen et al.: “Query expansion using augmented terms in an extended Boolean model”, Journal of Computing Science and Engineering, Korean Institute of Information Scientists and Engineers, South Korea, vol. 2, no. 1, March 2008 (2008-03), pages 26-43, ISSN: 1976-4677, discloses a search method in which the original search query is first expanded by terms which are selected from a thesaurus, for example. The selection of these added terms takes into account their similarity to the original search term. In addition, further terms (augmented terms) are added which take into account the joint occurrence of the search terms in the documents. The terms in the search query that has been expanded in this manner are then provided with weightings, the original search term being provided with the weighting 1 and the added terms being provided with a weighting which is dependent on the similarity to the original search term. Disadvantageously, however, the method in this document does not solve the problem, inter alia, that an incorrectly spelled word in a search query is assigned a very high level of relevance, since incorrectly spelled words occur in the documents only rarely or not at all.
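The weighting of an expanded search query, and the reason why a misspelled term can receive a high level of relevance, can be illustrated by the following minimal sketch. It is not the method of the cited publication: the co-occurrence-based augmented terms are omitted, the thesaurus and its similarity values are hypothetical, and the use of a smoothed inverse document frequency as the rarity-based relevance is an assumption made for this example.

```python
import math
from typing import Dict, List


def expand_query(term: str, thesaurus: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Expand a search term with thesaurus terms. The original term gets
    weighting 1; each added term is weighted by its similarity to it."""
    weights = {term: 1.0}
    for related, similarity in thesaurus.get(term, {}).items():
        weights[related] = similarity
    return weights


def inverse_document_frequency(term: str, documents: List[str]) -> float:
    """Smoothed inverse document frequency (assumed relevance measure):
    the fewer documents contain the term, the larger the value."""
    containing = sum(1 for doc in documents if term in doc.split())
    return math.log((1 + len(documents)) / (1 + containing))


# Hypothetical thesaurus with similarity values and hypothetical documents.
thesaurus = {"sofa": {"couch": 0.9, "settee": 0.7}}
documents = ["red sofa", "leather sofa", "oak table"]

expanded = expand_query("sofa", thesaurus)
idf_correct = inverse_document_frequency("sofa", documents)
idf_misspelled = inverse_document_frequency("sopha", documents)
```

Under these assumptions, the misspelled term "sopha" occurs in no document and therefore receives a larger rarity-based value than the correctly spelled "sofa", which mirrors the disadvantage described above.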