Numerous software applications in areas such as, but not limited to, public health, e-commerce, finance, and national security link person-level or other entity-level data, also referred to as records, to draw conclusions about people or about the effectiveness of organizations or public programs. Record linkage is the process of combining two or more records to link information relating to a single unit, such as an individual, a family, or an event. Applications that use this technique often answer questions about relationships among, or the effectiveness of, the associated programs, people, or entities. For example, linking police records and court records is useful in answering questions such as which variables (such as, but not limited to, type of assault and location of a break-in) affect the severity of a prison sentence. Hospital discharge data can be linked to themselves to determine whether the length of a newborn's postnatal hospital stay is related to the newborn's future hospital readmissions. The hypothesis may be that the shorter the hospital stay, the more probable the readmission.
More effective data synthesis can lead to better decisions. As an example, a health insurance company trying to better protect the privacy of its internal data may use a predictive modeling (PM) application. A PM application is a software application used to identify individuals with chronic conditions, such as, but not limited to, diabetes or asthma. By successfully identifying chronically ill people, the insurer can enroll them in disease management programs that can improve their care and reduce their use of health services, which in turn should reduce the insurer's medical claims costs.
Unfortunately, not all people currently allow their data to be used for PM, owing to privacy fears such as, but not limited to, unknown secondary use of the data and “insider” data misuse. Also, at times, linkage variables used in PM systems contain errors, undermining the data linkage and thus the identification of chronically ill policy-holders. Current literature discusses reasons why certain linkage variables appear more useful than others. Some literature even addresses the “information content” of a linkage variable and asks how beneficial it might be for linkage. However, current literature does not explain the tradeoffs between the errors in a field and linkage outcomes. It is therefore less clear how to overcome errors in a field in other linkage projects, which may or may not have traits similar to those of projects already examined.
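By way of illustration only, the effect of an error in a linkage variable can be sketched in a few lines of Python. The record contents, the field names (`name`, `dob`, `diagnosis`, `policy`), and the helper `link_exact` are hypothetical and are not drawn from any particular PM system. The sketch uses simple exact matching, which is only one of many possible linkage techniques.

```python
# Hypothetical records; all field names and values are invented for illustration.
claims = [
    {"name": "ann lee", "dob": "1970-01-02", "diagnosis": "diabetes"},
    {"name": "bob ray", "dob": "1982-11-30", "diagnosis": "asthma"},
]
enrollment = [
    {"name": "ann lee", "dob": "1970-01-02", "policy": "P-100"},
    {"name": "bob rey", "dob": "1982-11-30", "policy": "P-200"},  # typo in name
]


def link_exact(left, right, keys):
    """Link records whose values agree exactly on every linkage variable."""
    # Index the right-hand records by their tuple of linkage-variable values.
    index = {}
    for rec in right:
        index.setdefault(tuple(rec[k] for k in keys), []).append(rec)

    # Merge each left-hand record with every right-hand record sharing its key.
    linked = []
    for rec in left:
        for match in index.get(tuple(rec[k] for k in keys), []):
            linked.append({**rec, **match})
    return linked


links = link_exact(claims, enrollment, keys=("name", "dob"))
for link in links:
    print(link)
```

Under these assumptions only one of the two policy-holders is linked. The record containing the misspelled name is silently dropped, so the corresponding chronically ill individual would not be identified.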
In addition, if linkage can be done more securely, the privacy of the linked individuals can be better protected, engendering greater trust in the software applications that depend on record linkage and mitigating harm to individuals when those applications are used.
Thus, a heretofore unaddressed need exists in the industry to address the aforementioned deficiencies and inadequacies.