1. Field of Use
The present invention relates to the location of information in a database and, more particularly, to the location of information by resolving information identifiers, such as names stored in a database directory, against purported names provided by users.
2. Prior Art
A recurring problem in databases, in particular those implemented in computer systems, is the search for and location of specific items of information stored in an entry in the database. Such searches are generally accomplished by constructing a directory of the database, wherein the directory includes an index of database entries. The index in turn will contain keys related to the items stored in the database entries and, for each key, the location in the database of the corresponding entry. A user may then use a key associated with a given item of data in an entry to search the directory's index to find the location of the entry. One of the most common forms of key used in directory indexes is a name having some identifying relationship to the information. For example, an entry comprising a word processing file containing the text for a patent application for a method for resolving names may be assigned the name "nameres". A user may then find the text file by searching the directory for "nameres".
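The directory arrangement just described, an index mapping keys to entry locations, can be sketched minimally as follows. The database contents, the integer locations, the second entry "budget" and the function name `locate` are all hypothetical illustrations, not part of any particular system; the sketch only shows that lookup succeeds when the user supplies the key exactly as stored.

```python
from typing import Dict, Optional

# Hypothetical database: each location holds the contents of one entry.
database: Dict[int, str] = {
    0: "Text of a patent application for a method for resolving names",
    1: "Quarterly budget spreadsheet",
}

# Hypothetical directory index: each key (here, a name) maps to the
# location in the database of the corresponding entry.
directory_index: Dict[str, int] = {
    "nameres": 0,
    "budget": 1,
}

def locate(name: str) -> Optional[str]:
    """Search the directory index for a name and fetch the entry it points to."""
    location = directory_index.get(name)
    if location is None:
        return None  # exact-match lookup: a near-miss key finds nothing
    return database[location]

print(locate("nameres"))   # the patent application text
print(locate("name-res"))  # None
```

The second call illustrates the difficulty taken up next: a user who does not recall the exact stored name gets no result at all.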
The use of names stored in an index to locate information in a database leads, in itself, to particular problems. A user unfamiliar with a given item of information or with the naming convention may have difficulty finding a name in the index, and such indexes are often too large to search at random. Even a user familiar with a given database entry and the naming conventions may forget the exact name of that entry. The problem is therefore one of providing a directory name search and resolution method that is "friendly" to the user.
The above problems are perhaps most graphically illustrated by databases which comprise directories of information about persons. Such databases may, for example, contain information about all of the persons connected to a large computer or communications network, together with their phone numbers, addresses, network locations, positions, and so forth. This instance of a database is perhaps the most complex, in terms of identifying or locating and resolving names in the directory's index, and is the example which will be most often used in the following discussions of the present invention, although the present invention is not limited to this particular form of database or index naming convention.
One method of naming that has often been used in the prior art is "descriptive names", wherein each name is comprised of a set of "attributes". Each attribute is a piece of information of a particular type concerning an object or entry in the database, and the specific "value" of an attribute in a given instance is determined by that particular information about the corresponding database entry. For example, the date of creation of an entry may be an attribute of the directory names, and the value of the attribute would be the date on which the entry was created. A given "name" is thereby specified by one or more attribute value assertions, that is, attribute type/attribute value pairs. In general, the order of the attribute value assertions is not significant, and the user is required to provide only as many attribute value assertions as are necessary to unambiguously identify the corresponding data entry. The problem with such an approach is that the user must not only be familiar with the particular attributes selected to form the "names", but must also supply the values of at least the minimum number of assertions exactly as they appear in the names. That is, using "Katy" as an attribute value will not yield a match if the value stored in the index is "Kate".
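A minimal sketch of descriptive-name matching, assuming a name is held as a mapping from attribute type to attribute value (so the order of assertions cannot matter) and that a query resolves only when exactly one stored name contains every asserted pair with an identical value. The directory contents, the attribute type "CommonName" used here and the function `resolve` are hypothetical.

```python
from typing import Dict, List, Optional

# A descriptive name: attribute type -> attribute value.
Name = Dict[str, str]

# Hypothetical directory of descriptive names.
directory: List[Name] = [
    {"CountryName": "US", "Organization": "HBI", "CommonName": "Kate Smith"},
    {"CountryName": "US", "Organization": "HBI", "CommonName": "John Brown"},
    {"CountryName": "UK", "Organization": "HBI", "CommonName": "Kate Smith"},
]

def resolve(assertions: Name) -> Optional[Name]:
    """Return the single stored name satisfying every attribute value
    assertion by exact equality. Returns None both when nothing matches
    and when the assertions are too few to identify one entry."""
    matches = [
        name for name in directory
        if all(name.get(attr) == value for attr, value in assertions.items())
    ]
    return matches[0] if len(matches) == 1 else None

print(resolve({"CommonName": "Kate Smith", "CountryName": "UK"}))  # third entry
print(resolve({"CommonName": "Kate Smith"}))                       # None: ambiguous
print(resolve({"CommonName": "Katy Smith", "CountryName": "UK"}))  # None: no exact match
```

The last call shows the limitation stated above: a variant spelling of a value fails even though every other assertion is correct.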
"Alias" names, wherein a name may be identified through variants of its attributes, have also been considered as providing some degree of user friendliness. For example, if the directory contains a name having the attribute value assertions "CountryName=US" and "StateOrProvinceName=Massachusetts" then the aliases "CountryName=US,StateOrProvinceName=Mass" and "CountryName=US,StateOrProvinceName=MA" will allow the user to locate the entry having that name.
This method has severe limitations, however, in that a separate alias entry must appear in the directory for every naming variant, and attribute value assertions having many possible variants will require many aliases. This will greatly enlarge both the size of the directory and the administrative overhead to maintain the directory. The problem increases with the number of attribute value assertions comprising a given name. That is, the total number of aliases needed to support a name is the product of the number of variants for each attribute value assertion in the name. In addition, a given attribute value assertion may appear in a number of different names and in association with a variable set of other attributes. For example, "StateOrProvinceName=Massachusetts" may appear with "CountryName=US" in one name and with "Organization=HBI" in another name so that, in this instance, six aliases are required to support the three variants of "StateOrProvinceName=Massachusetts".
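The multiplicative growth of alias entries can be sketched by expanding each name into every combination of accepted value spellings. The variant table and the function `name_forms` are hypothetical; following the counting used in the text, each combination, the original spelling included, occupies its own directory entry, which is how three variants of "StateOrProvinceName=Massachusetts" appearing in two different names give six entries.

```python
from itertools import product
from math import prod
from typing import Dict, List, Tuple

# Hypothetical table of accepted spellings for attribute values.
VARIANTS: Dict[str, List[str]] = {
    "Massachusetts": ["Massachusetts", "Mass", "MA"],
}

def name_forms(name: List[Tuple[str, str]]) -> List[str]:
    """Expand a name (a list of attribute type/value pairs) into every
    combination of value variants; each form needs its own directory entry."""
    choices = [
        [f"{attr}={v}" for v in VARIANTS.get(value, [value])]
        for attr, value in name
    ]
    return [",".join(combo) for combo in product(*choices)]

us_name = [("CountryName", "US"), ("StateOrProvinceName", "Massachusetts")]
hbi_name = [("Organization", "HBI"), ("StateOrProvinceName", "Massachusetts")]

for form in name_forms(us_name):
    print(form)

# The number of forms per name is the product of the per-assertion variant counts.
per_name = prod(len(VARIANTS.get(v, [v])) for _, v in us_name)
total = len(name_forms(us_name)) + len(name_forms(hbi_name))
print(per_name, total)  # 3 6
```

Because the expansion is a Cartesian product, adding a second assertion with three variants to a name would raise its count from 3 to 9.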
The problems discussed above are further aggravated when directory names are to be based upon the use of "common names" as attributes. "Common names" are frequently defined as the "name by which an object is commonly known in some defined and limited scope". Considering, for example, a directory based upon the "common names" of persons in English-speaking countries, a personal name may be comprised of a personal title, such as Mr., Mrs., Ms. or Doctor, a first name, one or more middle names, a last name, and generational designations such as Jr. or II. In some English-speaking cultures common names may further include titles, decorations and awards. It is apparent that, compared to most other types of attributes which may be used in names in directories, the common names of persons are capable of very large numbers of variants. For the directory to be "user friendly", however, it should be able to resolve requests for directory entries when some or all of the common name attributes are absent, provided only that the components supplied are sufficient to identify the entry, and it should accept alternate versions or variants of those components.
The magnitude of the problem may be illustrated by an example, the name "Mr. Robin Lachlan McLeod BSc(hons) CEng MIEE", which might be a typical English name. If the last name were sufficient to uniquely identify this person, then the directory should resolve the name with all of the title, first name, middle name and awards components omitted, or with certain combinations of these components present and others missing. The directory should resolve the name if the first and middle names are provided as either full names or initials, or when a nickname or its initial is provided in place of the first name. There are thus two variants (present or absent) for each of the titles and awards, three variants (full, initial or absent) for the middle name, and five (full, initial, nickname, nickname initial or absent) for the first name. Without allowing such variants as a middle name or initial without the first name or initial, there are 52 valid variants. If there were a generational designation in the name, such as Jr. or II, there would be 104 variants, and if the person had two or more middle names, as is common in England, the number of variants would triple for each additional name. Allowing subsets of the awards, and initials with and without periods, would further increase the allowable variants.
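The figures of 52 and 104 can be checked by brute-force enumeration of the component choices listed above: 2 × 2 × 3 × 5 = 60 combinations, less the 8 in which a middle name or initial appears without a first name or initial. The labels and the function `count_variants` are illustrative only; the last name is assumed always present, since it is the component taken to identify the person.

```python
from itertools import product

# Component choices for "Mr. Robin Lachlan McLeod BSc(hons) CEng MIEE",
# as enumerated in the text. The last name is always present.
TITLE = ["present", "absent"]
AWARDS = ["present", "absent"]
MIDDLE = ["full", "initial", "absent"]
FIRST = ["full", "initial", "nickname", "nickname initial", "absent"]
GENERATION = ["present", "absent"]

def count_variants(with_generation: bool = False) -> int:
    """Count valid combinations of name components, excluding those that
    give a middle name or initial without a first name or initial."""
    generation = GENERATION if with_generation else ["absent"]
    count = 0
    for _, _, middle, first, _ in product(TITLE, AWARDS, MIDDLE, FIRST, generation):
        if first == "absent" and middle != "absent":
            continue  # middle name or initial without a first name: not allowed
        count += 1
    return count

print(count_variants())                      # 52
print(count_variants(with_generation=True))  # 104
```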
To further illustrate the problem of common name resolution, in certain countries the surname appears as the first element of a person's name. In yet other countries, multi-word first names are relatively common, such as Mary Ann or Billy Bob in the United States, as are multi-word last names, such as De Tomaso and Conan Doyle, and two-component, hyphenated last names. In Spanish-speaking countries, a man's legal name is comprised of a first name, multiple middle names, a surname from the father, and the mother's maiden name, which is frequently dropped in common usage.
It is apparent from just this example that a directory which exhaustively catalogues all acceptable variants of a common name attribute, whether as aliases or as additional attribute values, is impracticable as regards both the size of the directory and the efforts required to maintain the directory.
A solution to the above-described problems of the prior art, and to other problems, is provided by the method of the present invention for common name resolution in a database directory.
It is therefore an object of the present system to provide an improved method for locating database entries through their names in an index in a directory and, in particular, to do so within systems wherein the names are personal names and the entries are to be located through user-provided common names that may differ significantly from their corresponding forms in the index.