Voluminous amounts of data may be produced by server event logging in an enterprise-wide data processing system. Such data is sometimes referred to as a form of "Big Data."
It has been proposed to extract anomaly indications in real time from real-time Big Data streams and to present the same to human administrators as alerts. However, the volume of data is generally too large to provide meaningful and actionable information to human users, and the data is often encoded in varying manners, which makes extraction of any meaningful/actionable information difficult. More specifically, a typical event data stream (e.g., an Apache™ log file) may have thousands, if not hundreds of thousands, of records, with each record containing large numbers of numerical and categorical fields/features. Semantic names of fields can vary among data sources, and thus there is little consistency. New data sources having unknown coding formats, with which system administrators have no previous experience, can be introduced. Formats of associated numeric and/or qualitative items inside the data streams can vary between different data sources (e.g., logging systems of different servers). Many of the event records contain information that is not indicative of any relevant anomaly whatsoever. Among those records that do contain data indicative of an anomaly, many can be duplicative of one another and thus provide no useful information beyond that already provided by the first of these records. Also, within each anomaly-indicating record there can be many fields whose numeric and/or qualitative (e.g., categorizing) information has no causal relevance to the one or more indicated anomalies. Thus it is difficult to extract meaningful and useful information from such voluminous amounts of real-time streamed data for use in forming alerts that have practical utility (e.g., comprehensibility and actionability) for human administrators.
It is to be understood that this Background section is intended to provide useful introductory information for understanding the novel technology disclosed herein. As such, the Background section may include ideas, concepts or recognitions that were not part of what was known or appreciated by those skilled in the pertinent art prior to corresponding invention dates of subject matter disclosed herein.