Data matching compares data stored in disparate data sources within and across organizations, such as health care organizations. Matching can involve comparing a specific set of static fields in two standardized data records, where a data record represents an entity, such as an individual (e.g., health care patient) or a product (e.g., health care product), and returning, for each static field, a match weight that indicates the likelihood that the values of that field in the two standardized data records match. A match score is generated based on the individual match weights. A higher match score between two data records can indicate a greater likelihood of a match. Data matching can involve comparing different types of data, such as strings, dates, integers, etc. Some examples of data matching can involve comparisons of specialized types of data including first and last names, social security numbers, and dates in various formats. Matching can be used to manage data quality by reducing data duplication and improving data accuracy.
Data matching can be either deterministic or probabilistic. In deterministic matching, either unique identifiers for each data record can be compared to determine a match, or an exact comparison can be used between fields. Deterministic matching is generally not completely reliable since, in some cases, no single field can provide a reliable match between two data records. In probabilistic matching, several field values can be compared between two data records, and a match weight can be generated for each field, where the match weight indicates how closely the two field values match. A match score can be generated as a function of the individual match weights (such as a sum of the individual match weights), where the match score can indicate the likelihood of a match between the two data records.
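The distinction between the two approaches can be illustrated with a minimal sketch. It is not taken from any particular matching system: the field names, the maximum weight assigned to each field, and the use of a generic string similarity ratio as the comparison function are all assumptions made for illustration. The match score is computed as a sum of the individual match weights, which is one of the functions mentioned above.

```python
from difflib import SequenceMatcher

# Hypothetical fields and maximum per-field weights, chosen only for illustration.
FIELD_WEIGHTS = {"first_name": 2.0, "last_name": 3.0, "birth_date": 4.0, "ssn": 6.0}


def deterministic_match(a, b, key="ssn"):
    """Deterministic matching: exact comparison on a single identifying field."""
    return a.get(key) is not None and a.get(key) == b.get(key)


def field_weight(field, x, y):
    """Match weight for one field: a similarity in [0, 1] scaled by the field's maximum weight."""
    if x is None or y is None:
        return 0.0  # assumption: a missing value contributes no evidence
    similarity = SequenceMatcher(None, str(x).lower(), str(y).lower()).ratio()
    return FIELD_WEIGHTS[field] * similarity


def match_score(a, b):
    """Probabilistic matching: return the summed score and the per-field match weights."""
    weights = {f: field_weight(f, a.get(f), b.get(f)) for f in FIELD_WEIGHTS}
    return sum(weights.values()), weights


# Two hypothetical standardized records for the same patient; one lacks an identifier.
rec_a = {"first_name": "Jon", "last_name": "Smith", "birth_date": "1980-02-14", "ssn": "123-45-6789"}
rec_b = {"first_name": "John", "last_name": "Smith", "birth_date": "1980-02-14", "ssn": None}

score, weights = match_score(rec_a, rec_b)
print(deterministic_match(rec_a, rec_b), round(score, 2), weights)
```

In this sketch the deterministic comparison fails because one record has no identifier, while the probabilistic comparison still accumulates weight from the name and date fields. A practical system would typically compare the resulting score against a threshold, which is not shown here.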