In machine learning, cluster analysis is an unsupervised technique that is often used to detect anomalies. It groups data objects based on the characteristics that describe the objects and the relations among them. A set of objects is divided into groups such that similar objects are placed together, while objects in different groups have dissimilar characteristics. A good clustering is generally characterized by high similarity within each group and large differences between groups.
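The notion of a good clustering described above can be made concrete with a small sketch. The example below is illustrative only: it assumes two-dimensional points, Euclidean distance as the dissimilarity measure, and a hypothetical helper named `cohesion_and_separation`, none of which are prescribed by the text. It compares the mean distance between objects in the same group with the mean distance between objects in different groups.

```python
from itertools import combinations
from math import dist


def mean(values):
    """Arithmetic mean; returns 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def cohesion_and_separation(points, labels):
    """Return (mean within-group distance, mean between-group distance).

    Hypothetical helper: Euclidean distance is an assumed measure of
    dissimilarity, not one fixed by the text.
    """
    within, between = [], []
    # Visit every unordered pair of objects once.
    for (p, label_p), (q, label_q) in combinations(zip(points, labels), 2):
        if label_p == label_q:
            within.append(dist(p, q))
        else:
            between.append(dist(p, q))
    return mean(within), mean(between)


# Two visually separate groups of three points each (made-up data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

good_within, good_between = cohesion_and_separation(points, good_labels)
bad_within, bad_between = cohesion_and_separation(points, bad_labels)
```

Under these assumptions, the labeling that matches the visible grouping yields a smaller within-group distance and a larger between-group distance than the mixed labeling, which is the property the paragraph attributes to a good clustering.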
A dataset may contain objects whose characteristics differ significantly from those of the other objects in the dataset. Such objects are known as outliers or anomalies. Outlier identification finds small groups of data objects that are considerably different from the rest of the data, and outlier mining identifies patterns that do not conform to the rest of the data. Outlier mining is used in fields such as telecommunications, financial fraud detection, rare gene identification, and data cleaning.
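One possible way to find such small, distant groups is sketched below. It is a minimal illustration rather than a specific method from the text: it assumes Euclidean distance, groups points by chaining neighbours that lie within a chosen `threshold`, and treats any group with fewer than `min_size` members as outliers. The function names and both parameters are hypothetical, and their values would need tuning for real data.

```python
from math import dist


def cluster_by_distance(points, threshold):
    """Assign a group label to each point.

    Two points share a group when a chain of neighbours, each within
    `threshold` of the next, links them (connected components).
    """
    labels = [None] * len(points)
    next_label = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        labels[start] = next_label
        stack = [start]
        # Grow the group by repeatedly adding unlabeled close neighbours.
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] is None and dist(points[i], points[j]) <= threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def small_cluster_outliers(points, threshold, min_size):
    """Return the points that fall in groups smaller than `min_size`."""
    labels = cluster_by_distance(points, threshold)
    sizes = {}
    for label in labels:
        sizes[label] = sizes.get(label, 0) + 1
    return [p for p, label in zip(points, labels) if sizes[label] < min_size]


# Two dense groups of four points and one isolated point (made-up data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
          (50.0, 50.0)]

outliers = small_cluster_outliers(points, threshold=1.5, min_size=3)
```

With these made-up points, the two dense groups are kept and only the isolated point at (50.0, 50.0) is reported. The pairwise scan is quadratic in the number of points, so this form is suited to small illustrations only.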