Data about almost anything, such as people, products, or parts, may be stored in digital format in a data source such as a computer database. These computer databases permit the data to be accessed rapidly and may permit the data to be cross-referenced to other relevant pieces of data within the database. The databases also permit a person to query the database to find data records matching particular search criteria. A database, however, has several limitations that may hinder a person's ability to find the correct data within it. The data within the database is only as accurate as the person who entered it. Thus, a mistake in entering data into the database may cause a person searching the database to miss relevant data because, for example, a person's last name was misspelled. Another kind of mistake involves creating a new, separate record for something that already has a record within the database (e.g., duplicative records, where the data records may differ in one or more attributes). Furthermore, two or more data records may contain information about the same thing while, for example, the names or identification numbers they contain differ, so that the database may not be able to associate those data records with one another.
For a business that operates one or more databases containing a large number of data records, the ability to locate relevant information about a particular thing within and among the respective databases is very important but not easily achieved. Once again, any mistake in the entry of data (including, without limitation, the creation of more than one data record for the same thing) at any information source may cause relevant data to be missed when the database is searched for data about a particular thing. In addition, where there are multiple information sources, each may use a slightly different data syntax or format, which may further complicate the process of finding data among the databases. The health care field illustrates the need to properly identify the thing referred to in a data record and to locate all relevant data records: a number of different hospitals associated with a particular health care organization may each have one or more information sources containing information about their patients, and the health care organization may collect the information from each of the hospitals into a master database. It may then be desired to link the data records from all of the information sources that pertain to the same patient, so that information about a particular patient can be searched for across all of the hospital records.
There are several problems that limit the ability to find relevant data in such a database. Multiple data records may exist for a particular thing as a result of separate data records received from one or more information sources, which leads to a problem that can be called data fragmentation. In the case of data fragmentation, a query of the master database may not retrieve all of the relevant information about a particular thing. In addition, as described above, the query may miss relevant information due to a typographical error made during data entry, which leads to the problem of data inaccessibility. Furthermore, a large database may contain data records that appear to be identical, such as a plurality of records for people with the last name Smith and the first name Jim. A query of the database will retrieve all of these data records, and the person who made the query may simply choose one of the retrieved records at random, which may be the wrong one. The person often does not attempt to determine which of the records is appropriate. This can lead to the wrong data records being used even when the correct data records are available. These problems limit the ability to locate desired information about a particular thing within the database.
For a variety of reasons, it may also be desirable to associate various data records within these various information sources. For example, to reduce the amount of data that must be reviewed and to prevent the user from picking the wrong data record, it is desirable to identify and associate data records from the various information sources that may contain information about the same thing. There are conventional systems that locate duplicate data records within a database and delete them, but these systems only locate data records that are identical to each other. Thus, these conventional systems cannot determine whether two data records with, for example, slightly different last names nevertheless contain information about the same entity. In addition, these conventional systems do not attempt to index data records from a plurality of different information sources, locate the data records within those information sources that contain information about the same entity, and link those data records together.
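The limitation of exact-match duplicate removal described above can be illustrated with a minimal sketch. The record layout, field names, and the `exact_dedup` function are hypothetical and serve only to show that a one-letter difference in a last name is enough for two records about the same person to survive deduplication; this is not the method of any particular system.

```python
from typing import NamedTuple


class Record(NamedTuple):
    # Hypothetical record layout, for illustration only.
    source: str
    first_name: str
    last_name: str
    birth_date: str


def exact_dedup(records: list[Record]) -> list[Record]:
    """Keep the first record for each identical set of identifying attributes.

    Mirrors a conventional approach that only treats records as duplicates
    when their attribute values are exactly equal (the source is ignored).
    """
    seen = set()
    kept = []
    for r in records:
        key = (r.first_name, r.last_name, r.birth_date)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


records = [
    Record("hospital_a", "Jim", "Smith", "1970-01-02"),
    Record("hospital_b", "Jim", "Smith", "1970-01-02"),  # identical: removed
    Record("hospital_c", "Jim", "Smyth", "1970-01-02"),  # misspelled: survives
]

result = exact_dedup(records)
print([r.source for r in result])  # ['hospital_a', 'hospital_c']
```

The misspelled "Smyth" record is retained as a separate entity, so a comparison that tolerates small differences between attribute values, rather than strict equality, would be needed to associate it with the other records.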
Additionally, it may be desired to associate or group data records within various information sources where those data records pertain to a particular logical or physical thing. For example, different family members may each have distinct data records, yet it may still be desirable to associate these distinct data records with one another so that the grouping of data records represents a household. Another example is the association of various distinct data records to represent a division within a business. In other words, it may be desired to group distinct data records together according to almost any logical or physical group or thing.
Similarly, data records in an information source may relate to one another in a variety of ways, and it may be desired to associate data records within multiple information sources in a manner that expresses the relationships between the things about which the data records contain information. For example, one data record may comprise information on an employer while another data record comprises information on an employee; it may thus be desired to associate the two data records in a manner that expresses this employer/employee relationship. Similarly, one data record may comprise information on a parent corporation and another data record may comprise information on a subsidiary corporation. Here, too, it may be desired to associate the two data records in a manner that expresses the parent/subsidiary relationship between the corporations on which the data records comprise information.
Thus, as can be seen from the above discussion, in many cases it may be desired to associate data records for a variety of reasons and purposes, to allow a user to better manipulate, organize, filter, or otherwise process data in a variety of data sources. As may also be discerned, manipulating, organizing, processing, or viewing such a large amount of data may be problematic, especially once data records have been grouped to represent various things and the relationships between those things have also been represented. Thus, it is desired not only to be able to associate these data records in an arbitrarily complex manner to represent various things and the relationships among them, but also to have an interface that allows the management, manipulation, or visualization of such data records and associations.