Associative memories, also referred to as content addressable memories, are widely used in the fields of pattern matching and identification, expert systems, artificial intelligence and analogy detection. As used herein, analogy detection is an associative memory function that finds things similar to a given thing. Analogy detection may be useful for data cleansing, alias detection, and other applications. Analogy-based reasoning also can use analogy detection across a set of related things, and may be used for many different applications, including hypothesis generation.
For analogy detection, similarity is generally used as a defining criterion. However, similarity metrics by themselves may not provide decisions. For example, everything may be considered similar to everything else, even if the only element in common is being a “thing”. Thus, analogy detection should decide how much similarity is needed to consider two things as effectively the same, given the task at hand. Conventional analogy detection may determine similarity by setting a threshold that is based on the tradeoffs among hits, missed opportunities, false alarms and correct rejections. Other conventional analogy detection may include decision theories such as hyperplane separation models, which may try to fit the data on one side or the other of a separating plane.
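The threshold tradeoff described above can be illustrated with a minimal sketch. It assumes that each candidate pair already has a similarity score and a known ground-truth label; the function name `confusion_counts` and the sample scores are hypothetical and are used for illustration only.

```python
def confusion_counts(scored_pairs, threshold):
    """Tally the four decision outcomes for a given similarity threshold.

    scored_pairs: iterable of (similarity, is_same) tuples, where is_same is
    the (assumed known) ground truth for the pair.
    """
    counts = {"hits": 0, "misses": 0, "false_alarms": 0, "correct_rejections": 0}
    for similarity, is_same in scored_pairs:
        declared_same = similarity >= threshold
        if declared_same and is_same:
            counts["hits"] += 1
        elif is_same:
            counts["misses"] += 1              # a missed opportunity
        elif declared_same:
            counts["false_alarms"] += 1
        else:
            counts["correct_rejections"] += 1
    return counts


# Hypothetical scored pairs: (similarity, truly the same entity?)
pairs = [(0.95, True), (0.80, True), (0.60, False),
         (0.55, True), (0.30, False), (0.10, False)]

low = confusion_counts(pairs, threshold=0.5)
high = confusion_counts(pairs, threshold=0.7)
```

With these sample values, raising the threshold from 0.5 to 0.7 removes the one false alarm but turns one hit into a miss, which is the kind of tradeoff a fixed threshold has to balance.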
It is also known to measure the similarity of two objects, each described as a vector of attributes, in various ways. For example, the cosine of the angle between two vectors is known as a measure of document similarity. Jaccard similarity, the proportion of overlapping attributes, is also known in building biological taxonomies. Edit distance is yet another measure that may be used for analogy detection, such as when comparing text strings of letters or protein sequences of amino acids.
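The three measures named above may be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, Jaccard similarity is taken as the size of the intersection divided by the size of the union, and edit distance is taken as the Levenshtein distance with unit cost for insertion, deletion and substitution.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for a zero vector
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Proportion of overlapping attributes: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets are identical
    return len(a & b) / len(a | b)


def edit_distance(s, t):
    """Levenshtein distance between two sequences, computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]
```

For example, `jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"})` shares two of four distinct attributes, and `edit_distance("kitten", "sitting")` requires three single-character edits. None of these functions decides by itself whether two objects are effectively the same; each only supplies a score.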
Similarity measures also may be an underlying basis for clustering, such as in methods for market segmentation or hierarchical classification. K-means clustering can be used to place an item in one group or another, wherein each group is best defined by its average center. Bootstrapping techniques also may be used to look for similarities from a graph perspective, by traversing links in search of other nodes that share the same connections. Finally, mutual neighbor techniques, also called shared nearest neighbor techniques, look to confirm each node-node value in a similarity matrix by also asking how well the similar nodes share the same set of nearest neighbors.
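The shared nearest neighbor idea may be illustrated with a small sketch. It shows one common variant, assumed here for illustration only: a pair of nodes is confirmed only if each node appears in the other's list of k nearest neighbors, and the score is the number of neighbors the two lists have in common. The function names, the value of k and the sample matrix are hypothetical.

```python
def k_nearest(sim, i, k):
    """Indices of the k nodes most similar to node i, excluding i itself."""
    others = [j for j in range(len(sim)) if j != i]
    others.sort(key=lambda j: sim[i][j], reverse=True)
    return set(others[:k])


def shared_neighbor_score(sim, i, j, k):
    """Number of nearest neighbors shared by nodes i and j.

    Returns 0 unless i and j are in each other's k-nearest lists
    (the mutual-neighbor requirement assumed in this sketch).
    """
    near_i = k_nearest(sim, i, k)
    near_j = k_nearest(sim, j, k)
    if j not in near_i or i not in near_j:
        return 0
    return len(near_i & near_j)


# Hypothetical symmetric similarity matrix for four nodes.
sim = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]

score_01 = shared_neighbor_score(sim, 0, 1, k=2)
score_03 = shared_neighbor_score(sim, 0, 3, k=2)
```

In this sample, nodes 0 and 1 are mutual neighbors and share node 2, so their high pairwise similarity is confirmed. Node 3 is not among the nearest neighbors of node 0, so that pair receives a score of zero.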
Notwithstanding these and/or other techniques, there continues to be a desire to provide analogy detection methods, systems and computer program products that can provide more accurate analogy detection among large numbers of entities, for alias detection, data cleansing and/or other applications.