1. Field of the Invention
This invention relates generally to techniques for analyzing symbolic time series. More particularly, the invention relates to methods for characterizing and identifying dynamical features of time series data that is either initially given as or transformed into a sequence of ordered symbolic pairs, such as communications traffic, financial transactions, or logistical, genetic, or other time series data.
2. Description of the Related Art
Proliferating analytic approaches suffer collectively from various problems such as overspecialization, poor scalability, and opacity of the methods used. Detection operating characteristics often leave considerable room for improvement, and the architectures required for these approaches often do not readily admit effective post-analysis techniques. The detection operating characteristic shortcomings are typically addressed by narrowing the analytic scope of a system into one of two basic frameworks. Generic analytical techniques such as neural networks or hidden Markov models tend to be more mathematical in nature and often suffer from under-fitting and inadequate characterization of inputs but benefit from comparative rigor and versatility, whereas specialized techniques tailored to a particular problem domain tend to be more heuristic and often suffer from over-fitting and over-reliance on, or uncertainty about, the details of inputs. A general technique combining rigor and flexibility in a high-performance analytic framework is therefore desirable. Additional desiderata of such a technique include the ability to address static or dynamical data scalably and efficiently, as well as to enable post-processing and data interactivity. For the sake of concreteness, a particular application of the present invention to monitoring computer network traffic is detailed.
While worldwide spending on network security is estimated to be over $30 billion per year and growing, the information infrastructure is becoming less secure. Security incidents reported to the CERT Coordination Center (CERT/CC), the first computer emergency readiness team, rose 2,099 percent from 1998 through 2002, an average annual compounded rate of 116 percent. As evidenced by increasing cyber crime, existing security systems, most of which use signature- and heuristic-based analysis for detection, are ineffective against new attacks, variations of known attacks, or attacks masked as normal network behavior.
Embodiments of the present invention include a scalable, real-time solution to complement existing security systems and detect unusual activity. Embodiments of the present invention may exploit the complex nature of the information infrastructure, where millions of packets are exchanged between thousands of component parts. The nonlinear dynamics of the system exhibit complex global behavior and time evolution, leading to tipping point phenomena in which anomalous or malicious behaviors are hardly noticeable until the system suddenly transitions from operational to non-operational. Embodiments of the present invention leverage the scale and complexity of networks and use the principles of statistical physics and thermodynamics to define thermal properties such as entropy, temperature, and energy for network states, and to track changes in these properties as packets move through the network. Fluctuations in state properties reveal unusual network activity that is not detectable by signature- and heuristic-based systems and lead to detection of anomalous or malicious behaviors before a tipping point is reached. The key to realizing the overall vision is real-time network sensing, packet processing, and an interface that intuitively displays alerts highlighting changes in network behavior, together with an autonomous operational capability.
The methods of information theory play a unique role in this context, in that they are rigorous and generic but often find considerable success even in narrow problem domains. By the same token, the increasing application of the methods of stochastic processes and statistical physics to nontraditional areas and the many connections between these fields augur the need for, and the importance of, a unifying analytic architecture leveraging their techniques. Background references include: U.S. Pat. No. 6,470,297, which describes a system known as Therminator; J. C. McEachen et al., “Real-time representation of network traffic behavior for enhanced security,” 2005, Proc. 3rd Int. Conf. on Info. Tech. and Appl.; and J. C. McEachen et al., “An analysis of distributed sensor data aggregation for network intrusion detection,” 2007, Microprocessors and Microsystems vol. 31, pp. 263-272. The work described in these background references was developed by the U.S. Department of Defense and used the spirit of information theory and statistical physics in an early step towards this end. Therminator maps network traffic onto a sequence of ordered symbolic pairs and subsequently onto a sample trajectory of a multi-urn Ehrenfest model. The average state and the distribution of states in this model are supplied for visualization along with thermodynamical quantities including an effective temperature that mimics some coarse features of the essentially unique temperature function consistent with equilibrium statistical physics.
Independent academic work highlighting the essential tractability of computer network traffic using the idiom of thermal field theory (M. Burgess, “Thermal, nonequilibrium phase space for networked computers,” 2000, Phys. Rev. E, vol. 62, pp. 1738-1742) also contributed in this direction. This approach used the fact that rescaling a fluctuating time series in units of its standard deviation is a conformal transformation. By applying such a transformation to pseudo-periodic time series, the methods of thermal field theory can be used to describe the underlying data. In this work it is noted that computer network traffic is pseudo-periodic due to diurnal, weekly, monthly, and yearly patterns, and this fact is used in order to describe information transactions along the lines above.
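By way of illustration only, the rescaling step mentioned above can be sketched in a few lines of Python. The function name `rescale_by_std`, the sample packet counts, and the choice to subtract the mean before dividing by the standard deviation are assumptions made for this sketch and are not taken from the cited work.

```python
import statistics

def rescale_by_std(series):
    """Express fluctuations about the mean in units of the standard deviation.

    Assumption: the mean is subtracted first, so the result has zero mean
    and unit (population) standard deviation.
    """
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        raise ValueError("series has no fluctuation to rescale")
    return [(x - mean) / std for x in series]

# Hypothetical hourly packet counts, for illustration only.
counts = [120, 80, 100, 140, 60, 100]
scaled = rescale_by_std(counts)
```

After rescaling, series recorded at very different traffic volumes can be compared on a common, dimensionless scale.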
However, both these methods face several problems, such as a dependence on initial and boundary conditions in their internal data representation or the underlying data itself; a lack of any mathematically rigorous method for automatically identifying dynamical features in data; and an insufficiently realized capability for post-processing or interactivity.
Thus, there is a need for new and improved systems and methods for information assurance. Such systems and methods should receive an anomalous event parameter; receive a plurality of network events from a network; associate each of the network events with a timestamp; classify each of the network events into at least one of a plurality of cycles, based, at least in part, on the timestamp; form a dynamical state probability distribution corresponding to the plurality of network events; compute a discrete martingale for the plurality of network events; computationally determine whether or not a network event of the plurality is anomalous, based, at least in part, on the anomalous event parameter; and store the determination to a computer readable medium or display the determination.
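The sequence of steps recited above may be pictured with the following minimal Python sketch. It is not the claimed method. Every concrete choice here is a hypothetical stand-in, since this section does not define them: the hour-of-day cycle, the empirical symbol frequencies used as the state probability distribution, the centered-indicator partial sums used as the discrete martingale, and the use of the anomalous event parameter as a simple threshold.

```python
from collections import Counter
from datetime import datetime, timezone

def cycle_of(timestamp):
    """Classify an event into a diurnal cycle bucket (hour of day, UTC)."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).hour

def state_distribution(symbols):
    """Empirical probability of each symbol (an assumed stand-in)."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}

def martingale_path(symbols, target, p):
    """Partial sums of (indicator - p).

    This is a martingale when each event equals `target` independently
    with probability p, so a large excursion suggests a departure from
    the baseline.
    """
    path, m = [], 0.0
    for s in symbols:
        m += (1.0 if s == target else 0.0) - p
        path.append(m)
    return path

def flag_anomalies(events, baseline_symbols, target, threshold):
    """Return one record per event; `threshold` plays the role of the
    anomalous event parameter (an assumption of this sketch)."""
    p = state_distribution(baseline_symbols).get(target, 0.0)
    path = martingale_path([sym for _, sym in events], target, p)
    return [
        {"timestamp": ts, "cycle": cycle_of(ts), "anomalous": abs(m) > threshold}
        for (ts, _), m in zip(events, path)
    ]

# Hypothetical data: symbols are ordered (source, destination) pairs.
baseline = [("a", "b"), ("a", "b"), ("a", "b"), ("a", "c")]
events = [(0, ("a", "c")), (3600, ("a", "c")), (7200, ("a", "c")), (10800, ("a", "c"))]
results = flag_anomalies(events, baseline, target=("a", "c"), threshold=2.0)
```

In this toy run the pair ("a", "c") has baseline probability 0.25, so four consecutive occurrences drive the partial sums to 0.75, 1.5, 2.25, and 3.0, and the last two events exceed the threshold of 2.0. The resulting records could then be stored or displayed.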