Data storage systems typically break personal names into multiple parts (i.e., parse the personal names) and store these parts in different fields, which may be labeled with terms such as “given name,” “middle name,” and “surname.” Such a parsed name may be referred to as a fielded name, and the parts of the name may be referred to as terms. Record retrieval systems then compare terms stored in the same field to each other to determine which names match a query. For example, a search for a database record with the name-related fields “GivenName=Mary” and “Surname=Smith” would compare “Mary” to terms stored in the field named “GivenName” and “Smith” to terms stored in the field named “Surname.”
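The field-by-field comparison described above can be illustrated with a minimal sketch. It assumes exact, case-insensitive term equality and represents a fielded name as a mapping from field name to a space-delimited string of terms; the names FieldedName, terms, and fielded_match are hypothetical and used only for illustration, not taken from any particular system.

```python
# A minimal sketch of field-by-field name matching, assuming exact,
# case-insensitive term equality. FieldedName, terms, and fielded_match
# are hypothetical names used only for illustration.
FieldedName = dict[str, str]


def terms(value: str) -> set[str]:
    """Split a field value into case-folded, space-delimited terms."""
    return {t.casefold() for t in value.split()}


def fielded_match(query: FieldedName, record: FieldedName) -> bool:
    """True if every query field's terms all appear in the SAME field of the record."""
    for field, value in query.items():
        # A field missing from the record contributes no terms, so it cannot match.
        if not terms(value) <= terms(record.get(field, "")):
            return False
    return True


if __name__ == "__main__":
    query = {"GivenName": "Mary", "Surname": "Smith"}
    same_fields = {"GivenName": "Mary", "Surname": "Smith"}
    swapped = {"GivenName": "Smith", "Surname": "Mary"}  # same terms, other fields
    print(fielded_match(query, same_fields))  # True
    print(fielded_match(query, swapped))      # False
```

Because terms are compared only within the same field, a record holding the same terms in different fields does not match, which is the behavior the following paragraph identifies as a source of match failures.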
Fielded names contribute to match failures in name-based searches because there is not always a strict correspondence between the terms used in a name and the fields into which the terms are parsed. This is especially true when names from various linguistic and cultural origins are stored in a system designed around one name model. For example, a typical male name in Saudi Arabia is made up of a person's given name, his father's name, his grandfather's name, and a family or tribal name. Western data storage systems may store names in the fields “given name,” “middle name,” and “surname.” In such systems, the given name portion of the Saudi Arabian name corresponds to the given name field. The other parts of the Saudi Arabian name, however, may be distributed across the remaining fields in different ways in different data storage systems. As a result, during a name search, inconsistent fielding may mean that the parts of a stored name do not fall within the same fields as the corresponding parts of the query, so no match is found.
Some search systems generate multiple possible parses of a name and then search on each of the possible parses. For example, “Islam Azam Muhammed Metwali” might be variously represented as “Metwali, Islam Azam Muhammed,” “Muhammed Metwali, Islam Azam,” and “Azam Muhammed Metwali, Islam.” While this strategy may reduce the chance that relevant names will be missed altogether, it also tends to increase the number of false positives returned by a search. For example, “Mohammedi, Islam Baahi” would be allowed by the third parse, even though it is not a variant form of “Islam Azam Muhammed Metwali.” This approach also requires multiple comparisons, one for each parse, which increases search times.
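The multiple-parse strategy can be sketched as follows, assuming a simple two-field model (given names and surname) in which every split of the token sequence into a non-empty given-name prefix and a non-empty surname suffix counts as a candidate parse. The functions candidate_parses and multi_parse_search are hypothetical, and exact equality stands in for whatever (possibly fuzzy) term comparison a real system would use, so this sketch shows the extra comparisons but not the false positives.

```python
# A minimal sketch of multi-parse name search, assuming a two-field
# (given names, surname) model and exact equality between fields.
# candidate_parses and multi_parse_search are hypothetical names.
def candidate_parses(full_name: str) -> list[tuple[str, str]]:
    """Every split of a name into (given names, surname), longest given part first."""
    tokens = full_name.split()
    return [
        (" ".join(tokens[:k]), " ".join(tokens[k:]))
        for k in range(len(tokens) - 1, 0, -1)
    ]


def multi_parse_search(
    query: str, records: list[tuple[str, str]]
) -> tuple[list[tuple[str, str]], int]:
    """Compare every candidate parse of the query with every stored record.

    Returns the matching records and the number of comparisons performed,
    which grows with the number of parses.
    """
    hits: list[tuple[str, str]] = []
    comparisons = 0
    for parse in candidate_parses(query):
        for record in records:
            comparisons += 1
            if record == parse:
                hits.append(record)
    return hits, comparisons


if __name__ == "__main__":
    for given, surname in candidate_parses("Islam Azam Muhammed Metwali"):
        print(f"{surname}, {given}")
    # Metwali, Islam Azam Muhammed
    # Muhammed Metwali, Islam Azam
    # Azam Muhammed Metwali, Islam
```

A four-token name yields three parses, so each stored record is compared three times instead of once; longer names produce proportionally more comparisons.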
Other systems match on tokens rather than names, then return the full names containing the matching tokens. A token may be described as a space-delimited sequence of characters representing a word in a name. In these other systems, returned names may be sorted for presentation based on various filtering or relevance criteria, and those criteria may be based on factors other than token similarity. For example, in one system, a search on “Fernando Gomes” with no further qualifying information returns “Fernando Jose Ferreira Gomes” and “Fernando Gomes da Gama” as the top two out of twenty matching names, ahead of the exact match “Fernando Gomes,” and then also returns “Paulo Francisco Gomes Fernandez” ahead of “Fernando Luciano Gomes de Mendezes,” even though the latter name is more similar to the query.
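For contrast with the ordering described above, the following sketch retrieves names that share at least one token with the query and then ranks them purely by token similarity. Jaccard similarity over case-folded token sets is an assumption chosen for illustration; it is not the measure used by the system described above or by any particular product, and the function names are hypothetical. The candidate names are the ones quoted in the example.

```python
# A minimal sketch of token-based retrieval ranked by token similarity.
# Jaccard similarity over case-folded token sets is an illustrative
# assumption; tokens, jaccard, and token_search are hypothetical names.
def tokens(name: str) -> set[str]:
    """Space-delimited tokens of a name, case-folded."""
    return {t.casefold() for t in name.split()}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared tokens divided by all distinct tokens in either name."""
    return len(a & b) / len(a | b)


def token_search(query: str, names: list[str]) -> list[tuple[str, float]]:
    q = tokens(query)
    # Retrieve only names sharing at least one exact token with the query.
    hits = [n for n in names if q & tokens(n)]
    # Rank by similarity, highest first; sorted() is stable, so ties keep input order.
    scored = [(n, jaccard(q, tokens(n))) for n in hits]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    names = [
        "Fernando Jose Ferreira Gomes",
        "Fernando Gomes da Gama",
        "Fernando Gomes",
        "Paulo Francisco Gomes Fernandez",
        "Fernando Luciano Gomes de Mendezes",
    ]
    for name, score in token_search("Fernando Gomes", names):
        print(f"{score:.2f}  {name}")
    # 1.00  Fernando Gomes
    # 0.50  Fernando Jose Ferreira Gomes
    # 0.50  Fernando Gomes da Gama
    # 0.40  Fernando Luciano Gomes de Mendezes
    # 0.20  Paulo Francisco Gomes Fernandez
```

Under this measure the exact match ranks first, and “Fernando Luciano Gomes de Mendezes,” which shares both query tokens, ranks ahead of “Paulo Francisco Gomes Fernandez,” which shares only one.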