The Information Age brings with it new terms, such as “information overload” and “data overload.” The Internet and other sources now provide an almost endless amount of text or data on virtually any subject. The problem then becomes one of data management: how to organize the data in a meaningful way. Depending upon the requirement, the data may be organized according to any number of different criteria, limited only by the number of different requirements.
Conventional methods for organizing data in the form of references usually match the references from multiple sources and then combine them. However, this approach gives rise to a data integration problem, because conventional matching techniques depend on a common referenced identifier, such as the name of a person, being present in the records being matched. Only when such an identifier exists can the various techniques be applied to determine whether two records refer to the same entity.
Generally, record linkage techniques assume the existence of common explicit identifiers, particularly names, and then focus on trying to match a named record in one database to a similarly named record in another database. However, if the different records refer to an implicit entity, that is, an entity with no explicit identifier, these techniques are not effective. For example, references citing the same publication in different records share no explicit identifier. Additionally, conventional record linkage techniques are poorly suited for matching records that are derived from conventional information extraction methods.
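By way of illustration only, the limitation described above may be sketched in Python as follows. The sketch is a simplified stand-in for identifier-based linkage in general and does not reproduce any particular conventional technique; the function names `normalize` and `link_by_identifier`, the field names, and all sample records (including the citation strings) are hypothetical and invented for this example.

```python
def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(value.lower().split())


def link_by_identifier(source_a, source_b, key="name"):
    """Pair records from two sources whose explicit identifier (assumed to be
    stored under `key`) is equal after normalization. Records lacking the
    identifier can never be paired."""
    index = {}
    for rec in source_b:
        if rec.get(key):
            index.setdefault(normalize(rec[key]), []).append(rec)

    pairs = []
    for rec in source_a:
        if rec.get(key):
            for match in index.get(normalize(rec[key]), []):
                pairs.append((rec, match))
    return pairs


# Hypothetical records that carry an explicit identifier (a person's name).
people_a = [{"name": "Jane  Doe", "city": "Springfield"}]
people_b = [{"name": "jane doe", "phone": "555-0100"}]

# Hypothetical citation strings for the same (invented) publication.
# Neither record carries an explicit identifier for that publication.
citations_a = [{"text": "Doe, J. An Example Title. Example Journal, 2001."}]
citations_b = [{"text": "J. Doe (2001) 'An example title', Ex. J."}]

named_links = link_by_identifier(people_a, people_b)
citation_links = link_by_identifier(citations_a, citations_b)
```

Under these assumptions, the two person records are paired through their shared name, whereas the two citation records are never paired, even though both refer to the same publication, because there is no common explicit identifier on which to match.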
Therefore, there exists a need for a system and method for organizing data in a reliable and effective manner.