Known network behavior anomaly detection is configured to: (A) provide an approach to detecting network security threats; (B) complement systems that detect security threats based on packet signatures (associated with data packets conveyed over a network); (C) provide continuous monitoring of a network for unusual events or trends; and (D) form an integral part of network behavior analysis, which offers security in addition to that provided by known anti-threat applications (such as firewalls, intrusion detection systems, antivirus software, spyware-detection software, etc.).
Known security monitoring systems (for networks) are configured to use a signature-based approach to detect threats associated with a network. They are configured to: (A) monitor the data packets that are conveyed over the network; and (B) look for patterns in the data packets that match the contents of a database of signatures representing previously identified security threats. Systems based on network behavior anomaly detection are configured to detect security threat vectors in cases where signature-based systems cannot; examples of such cases include: (A) new zero-day attacks (attacks on the first day a new threat appears, before any signature for it exists); and (B) threat traffic (data flow) that is encrypted, such as the command-and-control channel of certain Botnets (a Botnet is a collection of Internet-connected programs communicating with other similar programs in order to perform tasks).
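To make this limitation concrete, the following minimal Python sketch matches payloads against a signature database by plain substring search. The signature names and byte patterns are hypothetical placeholders, not real intrusion-detection rules, and a single-byte XOR stands in (as an assumption) for real encryption; production systems use far richer matching.

```python
# Hypothetical signature database: name -> byte pattern (illustrative only).
SIGNATURES: dict[str, bytes] = {
    "demo-shell-probe": b"/bin/sh -i",
    "demo-path-traversal": b"../../etc/passwd",
}


def match_signatures(payload: bytes, signatures: dict[str, bytes]) -> list[str]:
    """Return the names of all signatures whose byte pattern occurs in the payload."""
    return [name for name, pattern in signatures.items() if pattern in payload]


def toy_encrypt(payload: bytes, key: int = 0x5A) -> bytes:
    """Single-byte XOR: a stand-in for real encryption, used only for illustration."""
    return bytes(b ^ key for b in payload)


attack = b"GET /../../etc/passwd HTTP/1.1"
print(match_signatures(b"GET /index.html HTTP/1.1", SIGNATURES))  # []
print(match_signatures(attack, SIGNATURES))               # ['demo-path-traversal']
print(match_signatures(toy_encrypt(attack), SIGNATURES))  # [] (pattern is hidden)
```

The same request is caught in clear text but missed once its bytes are transformed; a zero-day attack is missed in the same way, because no pattern for it exists in the database yet.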
Known network behavior anomaly detection programs (computer programs) are configured to: (A) track critical network characteristics (in real time); and (B) generate an alarm when a strange event or trend is detected in the network characteristics (attributes) that may indicate the presence of a threat to the network. Examples of such characteristics include traffic volume, bandwidth use, protocol use, etc. Known network behavior anomaly detection programs are also configured to monitor the behavior of individual network subscribers. For a network behavior anomaly detection program to be optimally effective, a baseline of normal network or user behavior may be established over a period of time. Once certain parameters have been defined as normal, any departure from one or more of those parameters (attributes) is flagged as anomalous.
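One simple way to realize "establish a baseline, then flag departures" is a per-attribute mean and standard deviation with a z-score cut-off. The sketch below assumes that rule, the hypothetical attribute names `traffic_volume_bytes` and `distinct_dst_ports`, and an illustrative cut-off of three standard deviations; none of these is prescribed by the text.

```python
from statistics import mean, pstdev


def build_baseline(history):
    """Learn a (mean, standard deviation) baseline per attribute from a training window."""
    return {attr: (mean(vals), pstdev(vals)) for attr, vals in history.items()}


def flag_departures(sample, baseline, z_limit=3.0):
    """Return the attributes whose current value departs from the baseline by more
    than z_limit standard deviations (any change counts if the baseline never varied)."""
    flagged = []
    for attr, value in sample.items():
        mu, sigma = baseline[attr]
        if (sigma == 0 and value != mu) or (sigma > 0 and abs(value - mu) / sigma > z_limit):
            flagged.append(attr)
    return flagged


# Hypothetical training window of five observations per attribute.
history = {
    "traffic_volume_bytes": [1000, 1100, 950, 1050, 1000],
    "distinct_dst_ports": [12, 10, 11, 13, 12],
}
baseline = build_baseline(history)
print(flag_departures({"traffic_volume_bytes": 1040, "distinct_dst_ports": 250}, baseline))
# ['distinct_dst_ports']
```

A z-score is only one possible departure test; the baseline could equally be a percentile band or a seasonal model.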
Known network behavior anomaly detection programs may be used in addition to conventional firewalls and malware-detection applications. Some vendors have begun to recognize this fact by including network behavior anomaly detection programs as integral parts of their network security packages. Network behavior anomaly detection technology and/or techniques (methods) are applied in a number of network and security monitoring domains, including: (A) log analysis; (B) packet inspection systems; (C) flow monitoring systems; and/or (D) route analytics.
For instance, network behavior anomaly detection may be used to detect cyber-attacks directed at a network. Cyber-attack detection is primarily performed using signature-based approaches, in which an attack is identified based on a known signature of that particular attack. Such techniques include firewall log processing, simple network management protocol (SNMP) based tools, deep packet inspection (DPI), and security information and event management (SIEM) platforms. A complementary approach to detecting attacks is to analyze network traffic behavior and identify anomalies in it. Advantages of the anomaly-detection-based approach include the ability to detect attacks carried in encrypted traffic and the ability to detect zero-day attacks.
The following is a listing of published references that disclose cyber-attack detection methods and/or cyber-attack detection systems:
Published reference number [1]: LAKHINA, A., CROVELLA, M., DIOT, C., CHARACTERIZATION OF NETWORK-WIDE ANOMALIES IN TRAFFIC FLOWS, Proceedings of the 4th ACM SIGCOMM conference on Internet measurement, 2004, Page(s): 201-206.
Published reference number [2]: LAKHINA, A., PAPAGIANNAKI, K., CROVELLA, M., DIOT, C., E. D., TAFT, N., STRUCTURAL ANALYSIS OF NETWORK TRAFFIC FLOWS, Proceedings of the joint international conference on Measurement and modeling of computer systems, 2004, Page(s): 61-72.
Published reference number [3]: LAKHINA, A., CROVELLA, M., DIOT, C., MINING ANOMALIES USING TRAFFIC FEATURE DISTRIBUTIONS, Proceedings of the ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, 2005, Page(s): 217-228.
Published reference number [4]: RINGBERG, H., REXFORD, J., SOULE, A., DIOT, C., SENSITIVITY OF PCA FOR TRAFFIC ANOMALY DETECTION, Proceedings of SIGMETRICS 2007, 2007.
Published reference number [5]: BRAUCKHOFF, D., SALAMATIAN, K., MAY, M., APPLYING PCA FOR TRAFFIC ANOMALY DETECTION: PROBLEMS AND SOLUTIONS, Proc. IEEE Infocom, Rio de Janeiro, April 2009.

Anomaly Detection Using Principal Component Analysis
An anomaly in network traffic (network data flow) is defined as any network activity or phenomenon that causes the network traffic pattern to deviate from the normal (expected) behavior of a network. Under this definition, anomalies include network traffic outages, flash crowds, misconfigurations, vendor implementation bugs, cyber-attacks, network worms, malware, etc. A network anomaly does not necessarily represent a security threat to the network.
An anomaly detection algorithm is an algorithm configured to detect and diagnose anomalies in a network, so that a network administrator (user) may attempt to fix the problem as quickly as possible, given the urgency of a potential threat to the network.
The principal component analysis (PCA) operation is a known method (operation) for detecting network anomalies; reference is made to published references [1] through [5].
The PCA operation relies on the dimensionality-reduction property of the PCA method, and has been shown to be effective in finding and diagnosing anomalies in large networks, where the dimensionality of the traffic data is relatively high; reference is made to published references [1], [2], and [3].
In anomaly detection, the input matrix is an [m×n] matrix in which each of the [m] rows corresponds to a time-bin and each of the [n] columns corresponds to a feature (for example, a byte count or an entropy value).
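As an illustration of this layout, the sketch below bins hypothetical flow records `(timestamp_s, byte_count, dst_port)` into fixed-width time-bins and emits one row per bin with three assumed features: byte count, flow count, and the Shannon entropy of destination ports. The record format and the feature choice are assumptions for illustration; the text only fixes rows as time-bins and columns as features.

```python
import math
from collections import Counter


def port_entropy(ports):
    """Shannon entropy, in bits, of the destination ports seen in one time-bin."""
    total = len(ports)
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total) for c in Counter(ports).values())


def build_traffic_matrix(flows, bin_seconds, num_bins):
    """Return an [m x n] matrix as a list of rows: one row per time-bin,
    columns = [byte count, flow count, destination-port entropy]."""
    byte_counts = [0.0] * num_bins
    flow_counts = [0.0] * num_bins
    ports = [[] for _ in range(num_bins)]
    for timestamp, nbytes, dst_port in flows:
        b = int(timestamp // bin_seconds)
        if 0 <= b < num_bins:  # ignore records outside the observation window
            byte_counts[b] += nbytes
            flow_counts[b] += 1
            ports[b].append(dst_port)
    return [[byte_counts[i], flow_counts[i], port_entropy(ports[i])]
            for i in range(num_bins)]


# Hypothetical flows: bin 0 sends two flows to port 80; bin 1 touches four ports.
flows = [(1, 500, 80), (5, 1500, 80), (65, 400, 22), (70, 400, 23), (75, 400, 24), (80, 400, 25)]
for row in build_traffic_matrix(flows, bin_seconds=60, num_bins=2):
    print(row)  # [2000.0, 2.0, 0.0] then [1600.0, 4.0, 2.0]
```

Even in this tiny example the byte-count column is three orders of magnitude larger than the entropy column.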
The various steps (operations) for the PCA operation are as follows (these operations are not depicted since they are known to persons of skill in the art):
Operation [1] includes directing a server (not depicted) to generate (create) a zero-mean traffic matrix (with mean zero for all the columns) from the [m×n] input network traffic matrix. Operational control is passed over to operation [2].
Operation [2] includes directing the server to generate (create) a covariance matrix of the zero-mean traffic matrix that was generated in operation [1]. Operational control is passed over to operation [3].
Operation [3] includes directing the server to calculate the eigenvalues and eigenvectors of the covariance matrix that was generated in operation [2]. Operational control is passed over to operation [4].
Operation [4] includes directing the server to sort the eigenvalues in descending order, select the [k] largest eigenvalues, and take the corresponding eigenvectors as the principal components. Operational control is passed over to operation [5].
Operation [5] includes directing the server to create a matrix [P] by placing the [k] principal components (eigenvectors) side by side as columns, giving an [n×k] matrix. Operational control is passed over to operation [6].
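Operation [6] includes directing the server to map each time-bin of the input matrix (after the mean removal of operation [1]), taken as an [n×1] column vector [x], into the anomalous (residual) subspace by using the following formula:

$$\tilde{x} = (I - PP^{T})\,x$$

where [I] is the [n×n] identity matrix and [P^T] is the transpose of [P]. Applying this formula to every time-bin results in an anomalous space mapping matrix. Once the above formula has been executed, operational control is passed over to operation [7].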
Operation [7] includes directing the server to calculate the anomaly score for each time-bin by computing the square prediction error (SPE) score using the following formula:

$$\mathrm{SPE} = \lVert \tilde{x} \rVert^{2}$$

Once the SPE score has been calculated, operational control is passed over to operation [8].
Operation [8] includes directing the server to: (A) compare the SPE score with a threshold; (B) detect an anomaly in the time-bin for the case where the SPE score is larger than the threshold; and (C) mark the time-bin as normal for the case where the SPE score is less than (or equal to) the threshold. Then, operational control may be passed over to operation [1], if so desired.
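The eight operations can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the method of any particular product: the eigen-decomposition of operation [3] uses a hand-written cyclic Jacobi routine (`jacobi_eigen`, a hypothetical helper chosen so the sketch needs only the standard library; in practice a linear-algebra library would be used), and `k` and `threshold` are supplied by the caller, because the operations above do not say how to choose them.

```python
import math


def zero_mean(X):
    """Operation [1]: subtract each column's mean so every column averages to zero."""
    m, n = len(X), len(X[0])
    means = [sum(row[j] for row in X) / m for j in range(n)]
    return [[row[j] - means[j] for j in range(n)] for row in X]


def covariance(Y):
    """Operation [2]: n x n sample covariance of a zero-mean m x n matrix."""
    m, n = len(Y), len(Y[0])
    return [[sum(Y[t][i] * Y[t][j] for t in range(m)) / (m - 1)
             for j in range(n)] for i in range(n)]


def jacobi_eigen(A, max_sweeps=100, rel_tol=1e-24):
    """Operation [3]: eigenvalues/eigenvectors of a symmetric matrix via cyclic
    Jacobi rotations. Eigenvector i is column i of the returned matrix V."""
    n = len(A)
    A = [row[:] for row in A]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        total = sum(A[i][j] ** 2 for i in range(n) for j in range(n))
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= rel_tol * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for r in range(n):  # A <- A J (columns p and q)
                    arp, arq = A[r][p], A[r][q]
                    A[r][p], A[r][q] = c * arp - s * arq, s * arp + c * arq
                for r in range(n):  # A <- J^T A (rows p and q)
                    apr, aqr = A[p][r], A[q][r]
                    A[p][r], A[q][r] = c * apr - s * aqr, s * apr + c * aqr
                for r in range(n):  # V <- V J accumulates the eigenvectors
                    vrp, vrq = V[r][p], V[r][q]
                    V[r][p], V[r][q] = c * vrp - s * vrq, s * vrp + c * vrq
    return [A[i][i] for i in range(n)], V


def pca_spe_detect(X, k, threshold):
    """Operations [1]-[8]: return (SPE score per time-bin, anomaly flag per time-bin)."""
    Y = zero_mean(X)
    n = len(Y[0])
    eigvals, V = jacobi_eigen(covariance(Y))
    # Operation [4]: indices of the k largest eigenvalues.
    top = sorted(range(n), key=lambda i: eigvals[i], reverse=True)[:k]
    # Operation [5]: P is n x k, one principal component per column.
    P = [[V[r][c] for c in top] for r in range(n)]
    scores = []
    for y in Y:
        # Operation [6]: x~ = (I - P P^T) y, computed as y - P (P^T y).
        coeffs = [sum(P[r][c] * y[r] for r in range(n)) for c in range(k)]
        resid = [y[r] - sum(P[r][c] * coeffs[c] for c in range(k)) for r in range(n)]
        scores.append(sum(v * v for v in resid))  # Operation [7]: SPE = ||x~||^2
    flags = [s > threshold for s in scores]         # Operation [8]
    return scores, flags


# Hypothetical toy input: 8 time-bins x 3 features. Features 1 and 2 move together;
# bin 4 departs from the usual pattern in feature 3.
X = [[1, 1, 0], [2, 2, 0], [3, 3, 0], [4, 4, 0], [5, 5, 3], [6, 6, 0], [7, 7, 0], [8, 8, 0]]
scores, flags = pca_spe_detect(X, k=1, threshold=1.0)
print([round(s, 2) for s in scores])
print(flags)  # only bin 4 is flagged
```

Because [P] has orthonormal columns, the residual is computed as `y - P (P^T y)`, which avoids forming the [n×n] projection matrix explicitly.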
There are two main issues with applying the PCA operation to anomaly detection, as described below.
The first issue is a sensitivity problem. The PCA operation may be sensitive to the number [k] of eigenvectors that determine the normal subspace and the anomalous subspace. It is difficult to pick a value for [k] in operation [4] such that the normal and anomalous subspaces are separated.
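The effect of [k] can be shown on a two-feature toy case, where the eigenvectors of the 2×2 covariance matrix have a closed form. The data values and helper names below are hypothetical, and the 2-D restriction is an assumption made to keep the sketch short.

```python
import math


def centered_2d_basis(rows):
    """Center 2-feature rows and return them with the covariance eigenvectors,
    largest eigenvalue first (closed form for a symmetric 2x2 matrix)."""
    m = len(rows)
    mx = sum(x for x, _ in rows) / m
    my = sum(y for _, y in rows) / m
    centered = [(x - mx, y - my) for x, y in rows]
    a = sum(x * x for x, _ in centered) / (m - 1)
    d = sum(y * y for _, y in centered) / (m - 1)
    b = sum(x * y for x, y in centered) / (m - 1)
    if b == 0:
        v1 = (1.0, 0.0) if a >= d else (0.0, 1.0)
    else:
        lam1 = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * b)
        norm = math.hypot(lam1 - d, b)
        v1 = ((lam1 - d) / norm, b / norm)
    v2 = (-v1[1], v1[0])  # the other eigenvector is orthogonal in 2-D
    return centered, [v1, v2]


def spe_scores(rows, k):
    """SPE of each row after removing its projection onto the top-k components."""
    centered, basis = centered_2d_basis(rows)
    scores = []
    for y in centered:
        rx, ry = y
        for vx, vy in basis[:k]:
            c = y[0] * vx + y[1] * vy
            rx, ry = rx - c * vx, ry - c * vy
        scores.append(rx * rx + ry * ry)
    return scores


# Hypothetical bins: (kilobytes, packets). Bins 0-4 follow packets = 2 x kilobytes;
# bin 5 breaks that relationship without being unusually large.
rows = [(10, 20), (20, 40), (30, 60), (40, 80), (50, 100), (30, 40)]
for k in (0, 1, 2):
    scores = spe_scores(rows, k)
    print(k, scores.index(max(scores)), [round(s, 1) for s in scores])
```

With `k=0` the highest score goes to bin 4, which is merely large; with `k=1` bin 5 stands out; with `k=2` every residual is essentially zero and nothing can be detected.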
The second issue arises from putting all of the features in one matrix even though the features differ in nature and scale. For example, one column may hold the [byte count] attribute, a volume feature with relatively large values, while another column holds the [entropy] attribute, whose values are relatively small. In this case, if the PCA operation is applied to the data matrix, information about the [entropy] attribute may be lost, because its values may be very small in comparison to those of the [byte count] attribute. The [entropy] data may effectively disappear from the analysis, since it may not be large enough to play a meaningful role in either the normal or the anomalous subspace. As a result, most anomalies that are revealed mainly by entropy data, such as port scans, may be missed.
One solution offered by the prior art is to scale all of the feature data by some value. However, the PCA operation is also sensitive to the scaling of the data values; in other words, different scaling factors may produce different outcomes.
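A small experiment illustrates both points. The sketch below computes the leading principal component of hypothetical (byte count, entropy) time-bins under three scalings: raw values, bytes expressed in megabytes, and z-scoring (standardization is one common scaling choice, assumed here for illustration). The 2-D closed-form eigenvector is again an assumption made for brevity.

```python
import math
from statistics import mean, pstdev


def leading_component(rows):
    """Unit eigenvector for the largest eigenvalue of the 2x2 sample covariance
    of (feature_1, feature_2) rows, via the closed form for symmetric 2x2 matrices."""
    m = len(rows)
    mx = sum(x for x, _ in rows) / m
    my = sum(y for _, y in rows) / m
    a = sum((x - mx) ** 2 for x, _ in rows) / (m - 1)
    d = sum((y - my) ** 2 for _, y in rows) / (m - 1)
    b = sum((x - mx) * (y - my) for x, y in rows) / (m - 1)
    if b == 0:
        return (1.0, 0.0) if a >= d else (0.0, 1.0)
    lam1 = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * b)
    norm = math.hypot(lam1 - d, b)
    return ((lam1 - d) / norm, b / norm)


def rescale(rows, fx, fy):
    """Multiply each column by a fixed scaling factor."""
    return [(x * fx, y * fy) for x, y in rows]


def standardize(rows):
    """Z-score each column (subtract its mean, divide by its standard deviation)."""
    xs, ys = [x for x, _ in rows], [y for _, y in rows]
    mx, sx, my, sy = mean(xs), pstdev(xs), mean(ys), pstdev(ys)
    return [((x - mx) / sx, (y - my) / sy) for x, y in rows]


# Hypothetical time-bins: (byte count, destination-port entropy in bits).
rows = [(1.0e6, 2.1), (1.2e6, 2.4), (0.9e6, 1.9), (1.1e6, 2.2), (1.05e6, 2.0)]
for label, data in [("raw bytes", rows),
                    ("bytes in MB", rescale(rows, 1e-6, 1.0)),
                    ("z-scored", standardize(rows))]:
    vx, vy = leading_component(data)
    print(f"{label:12s} loadings: bytes={vx:.4f}, entropy={vy:.4f}")
```

On raw values the entropy loading is near zero; expressing bytes in megabytes shifts the dominance to entropy; z-scoring gives the two columns equal weight. Three reasonable scalings yield three different principal components.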
Moreover, at the end of the PCA operation, the SPE (square prediction error) score of each time-bin must be compared with a threshold. Selecting this threshold is likewise problematic and strongly influences the outcome of the PCA operation: a low threshold may result in a very high false positive rate, and a high threshold may result in a high false negative rate. In general, the dependence of the PCA operation on these parameters, and its sensitivity to them, is a challenge for anomaly detection using the PCA operation.
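The trade-off can be seen by counting errors at a few thresholds. The scores and ground-truth labels below are made-up values for illustration only; the `>` comparison mirrors operation [8].

```python
def error_counts(scores, is_anomaly, threshold):
    """Return (false positives, false negatives) when bins with score > threshold are flagged."""
    fp = sum(1 for s, a in zip(scores, is_anomaly) if s > threshold and not a)
    fn = sum(1 for s, a in zip(scores, is_anomaly) if s <= threshold and a)
    return fp, fn


# Hypothetical SPE scores for ten time-bins, with made-up ground truth.
scores = [0.2, 0.5, 0.9, 1.4, 2.5, 3.1, 0.3, 4.0, 1.1, 0.7]
is_anomaly = [False, False, False, True, False, True, False, True, False, False]
for threshold in (0.4, 1.2, 3.5):
    fp, fn = error_counts(scores, is_anomaly, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

With these values the lowest threshold yields five false positives and no false negatives, while the highest yields no false positives and two false negatives.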
The main parameters that affect the performance of the PCA operation are: (A) selecting the threshold value that separates the normal subspace from the anomalous subspace (that is, the choice of [k] in operation [4]); and (B) selecting the SPE threshold value for the detection of anomalies.
In view of the foregoing, it will be appreciated that there exists a need to mitigate, at least in part, problems associated with the detection of anomalies in a network.