As the costs of data storage have declined over the years, and as the ability to interconnect various elements of the computing infrastructure has improved, more and more data pertaining to a wide variety of applications can potentially be collected and analyzed. The analysis of data collected from sensors embedded at various locations within airplane engines, automobiles, health monitoring devices or complex machinery may be used for numerous purposes such as preventive maintenance, proactive health-related alerts, improving efficiency and lowering costs. Streaming data collected from an online retailer's websites can be used to make more intelligent decisions regarding the quantities of different products which should be stored at different warehouse locations, and so on. Data collected about machine servers may also be analyzed to prevent server failures.
The increase in volumes of streaming data has been accompanied by (and in some cases made possible by) the increasing use of commodity hardware. The advent of virtualization technologies for commodity hardware has provided benefits with respect to managing large-scale computing resources for many types of applications, allowing various computing resources to be efficiently and securely shared by multiple customers. For example, virtualization technologies may allow a single physical computing machine to be shared among multiple users by providing each user with one or more virtual machines hosted by the single physical computing machine. Each virtual machine can be thought of as a software simulation acting as a distinct logical computing system that provides users with the illusion that they are the sole operators and administrators of a given hardware computing resource, while also providing application isolation and security among the various virtual machines. In addition to computing platforms, some large organizations also provide various types of storage services built using virtualization technologies. Using such storage services, large amounts of streaming data can be stored with desired availability and durability levels in a cost-effective manner, with a given physical storage device potentially being used to hold data records of many different service customers.
As the volumes at which streaming data can be produced increase, the need for analysis tools that work on streaming data has also increased. For example, for some security-related applications or health-related applications, the ability to identify data outliers (i.e., unusual or anomalous data records or data patterns) fairly quickly may be critical. Unfortunately, many of the machine learning and statistical algorithms which have been developed over the years for such tasks were either designed primarily with static data sets in mind, or are difficult to scale for streaming data. As a result, identifying anomalous data efficiently within high-volume streams remains a challenging problem
While embodiments are described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that embodiments are not limited to the embodiments or drawings described. It should be understood, that the drawings and detailed description thereto are not intended to limit embodiments to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description or the claims. As used throughout this application, the word “may” is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words “include,” “including,” and “includes” mean including, but not limited to. When used in the claims, the term “or” is used as an inclusive or and not as an exclusive or. For example, the phrase “at least one of x, y, or z” means any one of x, y, and z, as well as any combination thereof.