Extracting entries from databases is generally known, and various algorithms have been proposed in the past. The general approach is to associate a key with each entry stored in the database and to retrieve a given entry when an inputted query corresponds to the key associated with that entry. For instance, the entry may be a text description of the Norton Motorcycles company and the key may be “Norton Motorcycles”. When the user enters “Norton Motorcycles” as a query, the description stored in the database is returned.
This general approach works well when there is a one-to-one correspondence between the query and the key. For instance, in a database storing the order entries of several clients, the key could be a unique number associated with each client. When a query corresponding to the unique number of a given client is entered, the query is compared with each key until a corresponding key is identified. The entry associated with that key is then extracted from the database and returned to the user. However, this approach does not work well when the key is related to the inputted query but does not correspond to it in a one-to-one manner. The following example illustrates this.
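The one-to-one lookup just described can be sketched as follows. This is a minimal illustration only, assuming a hypothetical in-memory order database keyed by client number; the names `orders_by_client` and `extract_entry` are illustrative and do not come from the original text.

```python
# Hypothetical order database: each entry is keyed by a unique client number.
orders_by_client: dict[int, list[str]] = {
    1001: ["order A-17", "order A-42"],
    1002: ["order B-03"],
}


def extract_entry(db, query):
    """Return the entry whose key equals the query, or None if no key matches exactly."""
    # Compare the query with each key until a corresponding key is found.
    for key, entry in db.items():
        if key == query:
            return entry
    # No key corresponds one-to-one to the query.
    return None


print(extract_entry(orders_by_client, 1002))  # ['order B-03']
print(extract_entry(orders_by_client, 1003))  # None
```

The explicit loop mirrors the key-by-key comparison in the text; a practical implementation would more likely use a hashed lookup such as `db.get(query)`, which gives the same exact-match behaviour.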
A database from which entries are to be extracted may be, for instance, a dictionary, where each entry in the database is associated with a key, for instance a string, and provides a description or explanation of that key. For instance, the key “motorcycle” may be associated with the entry “A moving vehicle with two wheels generally used for transportation of people”.
With the classic one-to-one approach described above, a user who wants to retrieve the description of a “motorcycle” would have to enter exactly the keyword “motorcycle” as a query. Entering “motorbike” would not return the intended entry. In some cases this can be addressed by allowing the use of wildcards. For instance, the query “motor*”, where the character “*” stands for any number of characters, would return the entry associated with the key “motorcycle”. However, wildcards are generally not ideal, since in most cases they drastically increase the number of results, so that the user then has to select the relevant result manually.
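The contrast between exact matching and wildcard matching can be sketched with Python's standard `fnmatch` module, whose `*` likewise stands for any number of characters. This is an illustrative sketch only: apart from the “motorcycle” entry quoted above, the keys and descriptions are hypothetical, as are the helper names `exact_lookup` and `wildcard_lookup`.

```python
from fnmatch import fnmatchcase

# Hypothetical dictionary database mapping each key (a string) to its description.
# Only the "motorcycle" entry is taken from the text; the others are made up for illustration.
DICTIONARY = {
    "motorcycle": "A moving vehicle with two wheels generally used for transportation of people",
    "motorboat": "A boat propelled by an engine",
    "motorway": "A road designed for fast motor traffic",
    "bicycle": "A vehicle with two wheels propelled by pedals",
}


def exact_lookup(db, query):
    """One-to-one approach: return the description only if the query equals a key."""
    return db.get(query)


def wildcard_lookup(db, pattern):
    """Return every key matching a shell-style pattern ('*' = any number of characters)."""
    # fnmatchcase is used so that matching is case-sensitive on every platform.
    return sorted(key for key in db if fnmatchcase(key, pattern))


print(exact_lookup(DICTIONARY, "motorcycle") is not None)  # True
print(exact_lookup(DICTIONARY, "motorbike"))               # None
print(wildcard_lookup(DICTIONARY, "motor*"))               # ['motorboat', 'motorcycle', 'motorway']
```

Even in this small example, the wildcard query returns three keys instead of one, leaving the user to pick the relevant result by hand.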
A further problem arises when a user wants to know, for instance, what the term “Norton café racer” means, if the database only comprises an entry associated with the key “Norton motorcycle”. In this case, assuming the user has no knowledge of what the Norton Motorcycle Company manufactures or of what “café racer” indicates in the field of motorcycles, it may be difficult for the user to find the relevant entry in the database. For instance, no entry in the database may correspond to the key “Norton café racer”. However, the entry corresponding to “Norton motorcycle” may provide the description “A British motorcycle manufacturing company known in particular for manufacturing of sport motorcycles with retro style, known generally as café racers”, which could help the user understand what “Norton café racer” means. The user could try to find the entry “Norton motorcycle” using wildcards, for instance by searching for “Norton café*”, which would not retrieve any entry, or “Norton*”, which would. However, the user would have to try several possible combinations manually, which is not efficient. Moreover, the database may comprise the keys “Norton Motorcycle”, “Norton Edward” and “Norton Antivirus”, so that a search for “Norton*” would still not return only the specific entry the user is looking for, since three different keys match the query. Further still, in some cases it would be desirable to identify the entry associated with the key “Norton Motorcycle” by entering the term “café racer” as a query since, as described above, the term “café racer” is related to the entry associated with the key “Norton motorcycle”.
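The Norton example can be reproduced with key-based wildcard matching to make the limitations concrete. This is a minimal sketch, assuming the three keys listed above; the descriptions of “Norton Edward” and “Norton Antivirus” are placeholders, and the helper name `wildcard_keys` is hypothetical.

```python
from fnmatch import fnmatchcase

# Keys taken from the example above. The "Norton Motorcycle" description is quoted from
# the text; the other two descriptions are placeholders for illustration.
DATABASE = {
    "Norton Motorcycle": (
        "A British motorcycle manufacturing company known in particular for manufacturing "
        "of sport motorcycles with retro style, known generally as café racers"
    ),
    "Norton Edward": "(placeholder description)",
    "Norton Antivirus": "(placeholder description)",
}


def wildcard_keys(db, pattern):
    """Return the sorted keys matching a shell-style pattern; descriptions are never searched."""
    return sorted(key for key in db if fnmatchcase(key, pattern))


print(wildcard_keys(DATABASE, "Norton café*"))   # [] : no key matches
print(wildcard_keys(DATABASE, "Norton*"))        # all three keys: the query is ambiguous
print(wildcard_keys(DATABASE, "*café racer*"))   # [] : the term appears only in a description
```

Because matching is performed on keys alone, the related term “café racer”, which occurs only inside the description of “Norton Motorcycle”, cannot lead to that entry.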
The present disclosure has been developed to solve one or more of the above-described problems.