An anomaly, also referred to as an outlier, deviation, exception, intrusion, contamination, or the like, is most aptly defined as an event, pattern, point, or data element that does not conform to expectations or to what those versed in the given art consider to be “normal” behavior. Traditionally, an anomaly occurs infrequently relative to the normal events in a given data stream or pattern. Because anomalies are in most cases rare by definition, their identification becomes exceedingly important in fields where detection of the abnormal or uncommon is paramount. Examples include, inter alia, healthcare and medicine, where an atypical lesion detected in a standard magnetic resonance image might indicate a serious disease such as cancer; homeland security and border control, where an aberrant vessel nearing the coastline might suggest illegal activity; finance and investment, where unusual transaction behavior could signify fraud in some form; and computer and network security, where seemingly harmless changes in network traffic might actually indicate the presence of a worm or other intruder.
As numerous enhancements to modern-day security, surveillance, and other monitoring systems inundate analysts in varying research fields with immense volumes of high-dimensional data, the need to automatically and intelligently detect the presence of anomalous events, while reducing the occurrence of false positive identifications, only intensifies. Where once manual human inspection was preferred, the science has grown to include, inter alia, graphical approaches such as box, scatter, and spin plots, chosen according to data dimensionality; statistical approaches such as parametric modeling for describing the data distribution and delineating between majority and anomalous data; distance based approaches such as nearest neighbor and clustering techniques, whereby outliers are defined based upon their “fit” in the neighborhood of surrounding data elements; and model based approaches such as data element classification, whereby “normal” and/or “abnormal” exemplars are used as models to differentiate the data elements of a given data stream. Nonetheless, each of the aforementioned techniques is marred by inherent complications, including high occurrences of false positive identifications when previously unseen but legitimate data elements are marked anomalous; difficulties in estimating data distributions due to dimensionality issues; computational inefficiencies; data confusion when the original data stream is plagued by noise; and the need for registration between and among data streams when attempting to analyze data elements relative to those of a neighborhood.
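By way of illustration only, the distance based approach described above may be sketched as follows. The sketch scores each data element by its mean Euclidean distance to its k nearest neighbors and flags elements whose score greatly exceeds the median score. The function names, the default value of k, and the median-multiple threshold are assumptions made for this example and are not drawn from any particular prior art technique.

```python
import math
import statistics


def knn_outlier_scores(points, k=2):
    """Score each point by its mean Euclidean distance to its k nearest neighbors.

    A larger score means the point "fits" its neighborhood less well.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_outliers(points, k=2, threshold=3.0):
    """Flag points whose score exceeds `threshold` times the median score.

    The median-multiple rule is an illustrative assumption; a real system
    would choose its decision rule to suit the data.
    """
    scores = knn_outlier_scores(points, k)
    cutoff = threshold * statistics.median(scores)
    return [score > cutoff for score in scores]


# Four tightly grouped elements and one distant element.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(flag_outliers(data))  # [False, False, False, False, True]
```

The sketch compares every pair of data elements, so its cost grows quadratically with the number of elements, and a legitimate but previously unseen element far from the group would be flagged just as readily as a true anomaly. Both behaviors reflect the computational inefficiency and false positive complications noted above.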