Entities, such as businesses, may utilize backend systems to store large volumes of data. Such data may be used, in some instances, to analyze and/or report on the performance of the entity. In some instances, an analysis of such data may be performed to determine outliers in a data set. Outlier detection may be adapted for a specific application, such as eliminating outliers in sales data, in census data (e.g., human census data, wildlife census data, etc.), or in other applications where outlier detection may be useful.
A K-Nearest Neighbor algorithm (“KNN”) is a classification algorithm that groups a point or value under consideration with its k nearest neighbors, where nearness is measured by the Euclidean distance between the point or value and each neighbor, and where k determines the number of nearest neighbors considered in the calculation.
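By way of illustration only, the KNN description above may be sketched in Python as follows. The function names (`euclidean`, `knn_classify`), the sample points, and the labels are hypothetical and are not drawn from any particular implementation. The sketch also assumes that the group of a point is chosen by majority vote among its k nearest neighbors, which is one common reading of the description.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, points, labels, k=3):
    """Assign `query` the most common label among its k nearest points.

    Assumption: ties in the vote are resolved in favor of the label
    that appears first among the nearest neighbors.
    """
    # Order the indices of the known points by distance to the query.
    order = sorted(range(len(points)), key=lambda i: euclidean(query, points[i]))
    # Keep the labels of the k closest points and take a majority vote.
    nearest_labels = [labels[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical sample data: two well-separated groups in two dimensions.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels = ["low", "low", "low", "high", "high", "high"]

result = knn_classify((2.0, 2.0), points, labels, k=3)
```

In this sketch, a larger k smooths the decision over more neighbors, while a smaller k makes the result more sensitive to individual nearby points.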