In general, computer and network security products focus on the prevention of various possible types of attack on the security of the network and the computing devices within it. Firewalls are generally used to prevent unauthorised traffic from entering the internal, local, or otherwise bounded network or networks of a company, a particular group of users, an individual user, or another such entity; authentication mechanisms may be used to prevent unauthorised persons from logging on to a company's (or other entity's) computers; and encryption may be used to prevent unauthorised persons from reading files stored on, or sent to or from, those computers. Such products cannot be relied upon to work perfectly, however, and because security ‘bugs’ may exist in other software or hardware, complete network security also requires monitoring and detection, and appropriate response in the event of a breach.
An effective monitoring, detection and response system for a network may be provided internally (e.g. by a system administrator of a company network, or by a user of an individual network). It may include firewalls, secure servers and routers, dedicated intrusion detection systems, and other security products, all of which may provide audit and other information about their status, about possible security-related issues, and about other characteristics or events. This information may be assessed by internal system administrators or in an automated or semi-automated manner. Whatever “internal” security provisions an entity uses, these may be augmented by making use of externally-provided network security services. Externally-provided network security services should not generally be used by a customer (e.g. a company, an individual, etc.) as a complete substitute for the provisions used by the customer or its own system administrators; however, individuals and system administrators normally do not have the time or ability to read through large amounts of constantly updated audit information, looking for attacks on their systems. They also may not have the time to continuously monitor the activities of ‘hackers’ (or other such parties), looking out for new tactics, tools and trends in what such parties may be using. Also, they may not have the time to become experts on every kind of intrusion or attack and to maintain that expertise.
A monitoring, detection and response system that employs human intelligence, uses trained personnel, and takes advantage of network security intelligence and other knowledge databases can provide network users and system administrators with the advice and coaching they may need, when they need it, to help them repel or otherwise respond to attacks and maintain network integrity and ‘uptime’. While completely automatic defenses may be used, and may work against some attacks (particularly automated attacks), they may be at a disadvantage against an intelligent attack, against which a specialist intelligent monitoring, detection and response capability may be needed. Such a specialist capability may be provided by a dedicated external service provision entity (although it will be noted that a corresponding service could also be provided by a similarly dedicated service provision entity within what could still be regarded as a customer's network). In any case, for such a dedicated service to be provided in respect of a network, the network (generally by way of one or more computing devices within it) generally needs to provide information about possible network security incidents or ‘events’ (such as possible intrusion of unauthorised traffic, possible attempts by unauthorised executable software that has successfully entered the network to take action against the interests of the network or its users, possible attempts by unauthorised parties to breach or otherwise overcome authentication mechanisms, etc.).
In order to allow such a capability to be provided (whether externally or internally), computing devices in the network and/or software applications running thereon may be arranged to send “status messages” to an entity providing such a capability. These may be filtered or otherwise controlled such that only those that may relate to potential network security issues are sent, in which case they may be regarded as “network security messages”. Such messages can also be provided by dedicated sensors or probes set up to monitor devices, exchanges of data, and other such interactions between devices in the network. Such messages may provide information about the devices affected (e.g. their identity, their location in the network, software applications that they have been using, etc.) and/or the software applications affected, in a form that enables them to be processed manually (i.e. by a human analyst), automatically (i.e. by a computer processor), or by a combination of the two. Such messages generally provide information in the form of data fields of several defined types, in order to allow for efficient processing. This may be necessary because the computing devices in a large network, such as individual computers, firewalls, servers, routers, specific intrusion detection systems, etc., can generate millions of lines of security-related or other status or audit information each day. This may contain information indicative of ongoing network attacks or intrusions, but such information may go unnoticed amid a large amount of other less critical or non-security-related audit information.
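By way of illustration only, a status message carrying data fields of defined types, together with a filter that passes on only those messages that may relate to security issues, might be sketched as follows. The field names, the category values and the set SECURITY_CATEGORIES are hypothetical assumptions and are not taken from any particular product.

```python
from dataclasses import dataclass

# Hypothetical categories treated as possibly security-related.
SECURITY_CATEGORIES = {"auth_failure", "firewall_deny", "ids_alert"}


@dataclass(frozen=True)
class StatusMessage:
    """A status message with a few typed data fields (illustrative schema)."""
    device_id: str    # identity of the reporting device
    location: str     # e.g. network segment of the device
    application: str  # software application involved
    category: str     # kind of event being reported
    text: str         # free-form detail


def security_messages(messages):
    """Keep only messages whose category may relate to a security issue."""
    return [m for m in messages if m.category in SECURITY_CATEGORIES]


msgs = [
    StatusMessage("fw-01", "dmz", "iptables", "firewall_deny", "DROP tcp 10.0.0.5:445"),
    StatusMessage("srv-02", "lan", "cron", "job_done", "backup finished"),
    StatusMessage("srv-03", "lan", "sshd", "auth_failure", "bad password for root"),
]
print([m.device_id for m in security_messages(msgs)])  # ['fw-01', 'srv-03']
```

Using typed fields rather than free text is what allows a simple membership test like this to do the filtering; free-form messages would first need to be parsed.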
In view of this, it is known to provide and use Managed Security Monitoring (MSM) services, which can assist a customer by receiving “network security messages” and/or more general status messages from the customer's network (and from the devices of that network), filtering and analysing the information therein effectively in order to detect such attacks or intrusions, and, if required, suggesting or providing an appropriate response.
Once a possible attack or intrusion (i.e. a network security “incident” or “event”) is detected, its characteristics and particulars may then be examined and analysed by trained security analysts continuously monitoring the customer's network to further understand the incident and eliminate false positives. In analysing the incident, security analysts can draw upon information and knowledge from a variety of sources, including but not limited to security intelligence databases containing information about the characteristics of various hacker techniques and tools and known vulnerabilities in various operating systems and commercial software products and hardware devices. If necessary, security analysts can escalate the handling of the incident according to a variety of possible escalation procedures to stop the attack and shut down the vulnerability before the attacker does any damage. In effect, an MSM service can act as an additional defensive shield for a customer's network.
U.S. Pat. No. 7,895,641 and corresponding International application WO 01/71499 (“Schneier”/“Counterpane”) relate to a known technique for dynamic network intrusion monitoring, detection and response. According to this, a probe attached to a customer's network collects status data and other audit information from monitored components of the network, looking for footprints or evidence of unauthorised intrusions or attacks. The probe filters and analyses the collected data to identify potentially security-related events occurring in the network. Identified events are reported to a human analyst for problem resolution. The analyst has access to a variety of databases to aid in problem resolution, and may follow an escalation procedure in the event he or she is unable to resolve the problem. Various customer personnel can be alerted in a variety of ways depending on the nature of the problem and the status of its resolution. Feedback from problem resolution efforts can be used to update the knowledge base available to analysts for future attacks, and may be used to update the filtering and analysis capabilities of the probe and other systems.
Aspects of an existing product, which will be referred to as the “Counterpane product”, for providing managed security monitoring (MSM) services essentially in accordance with the technique described in U.S. Pat. No. 7,895,641 will briefly be described, and will be referred to again later in more detail. The Counterpane product uses sensors to monitor customers' data networks for security threats. Data from the sensors in the form of messages are received by a module referred to in the Counterpane product as the “Sentry”. This contains a filtering subsystem which classifies the messages as “positive messages” (messages that need to be monitored, as they appear to relate to security issues that may require action to be taken), “negative messages” (messages that may be discarded, because they do not relate to security issues), and “residue messages” (messages that are not possible to parse).
Messages classed as “negative” are discarded.
Messages classed as “positive” are passed to a module referred to as “SOCRATES”, an acronym which stands for “Secure Operations Center Responsive Analyst Technical Expertise System”. This collects and formats messages into “problem tickets” (each of which represents a discrete security-related event or incident of possible intrusive activity on a customer's network), associates with each such ticket information useful for problem investigation, resolution and response, presents such tickets to security analysts for handling, and generally serves as a repository of useful information and procedures.
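The three-way classification performed by such a filtering subsystem might, purely as an illustration, be sketched as follows. The regular-expression filters, the sample message formats and the function names classify and route are hypothetical; they are not taken from the Counterpane product or from U.S. Pat. No. 7,895,641.

```python
import re

# Hypothetical filters: a match marks a message positive or negative.
POSITIVE_FILTERS = [
    re.compile(r"^sshd\[\d+\]: Failed password for (?P<user>\S+) from (?P<src>\S+)"),
    re.compile(r"^kernel: DROP .*SRC=(?P<src>\S+)"),
]
NEGATIVE_FILTERS = [
    re.compile(r"^CRON\[\d+\]: \(\S+\) CMD"),
    re.compile(r"^sshd\[\d+\]: Accepted publickey for \S+"),
]


def classify(message):
    """Return 'positive', 'negative' or 'residue' for one raw message."""
    if any(f.match(message) for f in POSITIVE_FILTERS):
        return "positive"  # passed on for analysis (e.g. ticketing)
    if any(f.match(message) for f in NEGATIVE_FILTERS):
        return "negative"  # discarded
    return "residue"       # no filter can parse it


def route(messages):
    """Sort raw messages into positive, negative and residue buckets."""
    buckets = {"positive": [], "negative": [], "residue": []}
    for m in messages:
        buckets[classify(m)].append(m)
    return buckets


buckets = route([
    "sshd[311]: Failed password for root from 203.0.113.9 port 2222 ssh2",
    "CRON[77]: (root) CMD (run-parts /etc/cron.hourly)",
    "fwd: pkt rejected 203.0.113.9",
])
print({k: len(v) for k, v in buckets.items()})
# {'positive': 1, 'negative': 1, 'residue': 1}
```

The third message falls through every filter and ends up as residue, which corresponds to the situation discussed next, where a message format is unknown to the filter set.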
The filtering subsystem may be unable to parse the residue messages because it does not have enough filters, or does not have enough information to parse the messages. This may also occur if there has been a change or an error in the format of the security messages. With the Counterpane product (and as described in U.S. Pat. No. 7,895,641), these residue messages are passed to a human analyst for checking and analysis. As this is done manually (i.e. by human analysts) and takes time and human effort, there may be a significant delay before the residue messages have been analysed. Further, human error may affect the analysis, and/or some residue messages may even be left unanalysed because of a lack of manpower.
Residue messages that have been manually analysed may thus get manually classified (correctly or incorrectly) as “positive messages” or “negative messages”. Further to this, messages that have been classified (manually or automatically) as “positive messages” or “negative messages” may get manually analysed in a separate operation, and it is possible that this may result in a human decision to update the filter engine.
Techniques such as the above thus depend on significant manual input from human analysts both in analysing residue messages and in reviewing messages classed as positive.
A known technique in data analysis is “clustering”. Clustering algorithms generally aim to divide a set of objects into groups (clusters), where objects in each cluster are similar to each other (and as dissimilar as possible to objects from other clusters). Objects that do not fit well to any of the clusters detected by the algorithm may be considered as “outliers”, or to form a special cluster of outliers.
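The general idea of clustering with outliers can be illustrated with a simple single-pass “leader” algorithm applied to text lines. This is a generic technique chosen for illustration and is not taken from any of the works cited here; the Jaccard similarity measure, the 0.5 threshold and the treatment of singleton clusters as outliers are all assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 if both are empty)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def leader_cluster(lines, threshold=0.5):
    """Single-pass 'leader' clustering of text lines by token overlap.

    Each line joins the first cluster whose leader (first member) is at
    least `threshold`-similar to it; otherwise it starts a new cluster.
    Clusters left with a single member are reported as outliers.
    """
    clusters = []  # list of (leader_tokens, members)
    for line in lines:
        tokens = set(line.split())
        for leader, members in clusters:
            if jaccard(tokens, leader) >= threshold:
                members.append(line)
                break
        else:  # no sufficiently similar leader found
            clusters.append((tokens, [line]))
    groups = [members for _, members in clusters if len(members) > 1]
    outliers = [members[0] for _, members in clusters if len(members) == 1]
    return groups, outliers


groups, outliers = leader_cluster([
    "login failed for user alice",
    "login failed for user bob",
    "disk usage at 91 percent",
])
print(groups)    # [['login failed for user alice', 'login failed for user bob']]
print(outliers)  # ['disk usage at 91 percent']
```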
A paper by Risto Vaarandi entitled: “A Data Clustering Algorithm for Mining Patterns From Event Logs” (Proceedings of the 2003 IEEE Workshop on IP Operations and Management, ISBN: 0-7803-8199-8) relates to a problem whereby event logs such as those used in system and network management contain vast amounts of data that can easily overwhelm a human. Noting that mining patterns from event logs is an important system management task, the paper presents a clustering algorithm for log file data sets which can help in the detection of frequent patterns from log files, in the building of log file profiles, and in the identification of anomalous log file lines. The technique outlined considers standard “syslog” formats that contain free-form text, and suggests using clustering on the full set of event logs in order to try to form clusters that would cover “outliers”. It will be understood that clustering is thus used in what can be regarded as a single processing stage on full sets of event logs, with the aim of preventing any of them from being left outside the clusters.
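The three-step scheme in the paper (finding frequent words, forming cluster candidates from them, then keeping sufficiently supported candidates as clusters and treating the remaining lines as outliers) can be sketched roughly as follows. This is a simplified reading for illustration only: whitespace tokenisation, an absolute support count, and the hypothetical helpers cluster_log_lines and describe are assumptions, and the published algorithm includes details not shown here.

```python
from collections import Counter


def cluster_log_lines(lines, support):
    """Simplified three-step log clustering in the spirit of Vaarandi (2003).

    Step 1: find frequent (position, word) pairs, i.e. pairs occurring in
            at least `support` lines.
    Step 2: map each line to a cluster candidate made of its frequent pairs.
    Step 3: keep candidates backed by at least `support` lines as clusters;
            every other line is reported as an outlier.
    """
    tokenised = [line.split() for line in lines]

    # Step 1: positions are unique within a line, so each pair counts once.
    pair_counts = Counter(pair for words in tokenised for pair in enumerate(words))
    frequent = {pair for pair, n in pair_counts.items() if n >= support}

    # Step 2: group lines by the tuple of frequent pairs they contain.
    candidates = {}
    for line, words in zip(lines, tokenised):
        key = tuple(pair for pair in enumerate(words) if pair in frequent)
        if key:  # lines with no frequent words form no candidate
            candidates.setdefault(key, []).append(line)

    # Step 3: sufficiently supported candidates become clusters.
    clusters = {k: v for k, v in candidates.items() if len(v) >= support}
    clustered = {line for members in clusters.values() for line in members}
    outliers = [line for line in lines if line not in clustered]
    return clusters, outliers


def describe(key):
    """Render a cluster as a pattern, with '*' at non-frequent positions."""
    words = dict(key)
    return " ".join(words.get(i, "*") for i in range(max(words) + 1))


logs = [
    "user alice logged in",
    "user bob logged in",
    "user carol logged in",
    "disk full on /var",
]
clusters, outliers = cluster_log_lines(logs, support=2)
print([describe(k) for k in clusters])  # ['user * logged in']
print(outliers)                          # ['disk full on /var']
```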
A paper by Feng Xuewei, Wang Dongxia, Zeng Jiemei, Ma Guoqing, and Li Jin, entitled: “Analyzing and Correlating Security Events Using State Machine” (2010 10th IEEE International Conference on Computer and Information Technology (CIT 2010)) relates to attack scenario reconstruction, clustering analysis, causal analysis, use of an attack scenario tree, and use of a correlating state machine. Recognising that it may be unfeasible for a security manager to analyse security events manually, the paper proposes the use of an attack scenario reconstruction technique based on a state machine, by which the processes of attackers can be replicated and more comprehensive attack scenario description information can be generated. The paper appears to suggest using clustering in conjunction with existing knowledge in order to alert an analyst about possible attacks, based on types of attack that have happened previously.
Referring to other (cited) prior art, a paper entitled “Mining Alarm Clusters to Improve Alarm Handling Efficiency” by K. Julisch et al (http://www.acsac.org/2001/papers/115.pdf) discusses a problem wherein intrusion detection systems overload their human operators by triggering thousands of alarms per day, and presents the results of some research indicating that alarms should be managed by identifying and resolving their root causes. It discusses alarm clustering as a method that supports the discovery of root causes.
A paper entitled “An efficient algorithm for clustering intrusion alert” by J. D. Adelina et al (http://www.iatitorq/volumes/Vol37No2/11Vol37No2.pdf) discusses intrusion detection systems, noting the problems of relying on alert supervision. It proposes using “Meta alerts”, which are generated for appropriate clusters and which form generalisations of alerts, as a way of identifying origins of alerts. A hybrid clustering algorithm is proposed which is applied to the data set. Redundant data are filtered in order to reduce the rate of false positives.
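As a loose illustration of a meta-alert that forms a generalisation of a cluster of alerts, the sketch below keeps the attribute values shared by all alerts in a cluster and replaces differing ones with a wildcard. This is not the hybrid algorithm proposed in the paper; the field names, the "ANY" wildcard and the function name meta_alert are hypothetical, and some published schemes generalise along hierarchies (e.g. from an IP address to a subnet) rather than directly to a wildcard.

```python
def meta_alert(alerts, fields):
    """Summarise a cluster of alerts as one generalised meta-alert.

    A field keeps its value when all alerts in the cluster agree on it and
    is generalised to the wildcard "ANY" otherwise.
    """
    meta = {}
    for field in fields:
        values = {alert[field] for alert in alerts}
        meta[field] = values.pop() if len(values) == 1 else "ANY"
    meta["count"] = len(alerts)  # how many alerts the meta-alert stands for
    return meta


cluster = [
    {"signature": "SQL injection", "src": "198.51.100.7", "dst": "10.0.0.10"},
    {"signature": "SQL injection", "src": "198.51.100.7", "dst": "10.0.0.11"},
]
print(meta_alert(cluster, ["signature", "src", "dst"]))
# {'signature': 'SQL injection', 'src': '198.51.100.7', 'dst': 'ANY', 'count': 2}
```

A single meta-alert of this kind points an analyst at a likely common origin (here, one source attacking several hosts) instead of presenting each alert separately.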
US patent application US2004/015719 (“Lee et al”) relates to network security protection, and to integrated security systems in which individual security agents are actively inter-related. In particular, it proposes a security system comprising a firewall for interconnecting and controlling access between external and internal networks, a plurality of security agents for monitoring a data flow and system calls over the internal network, an “intelligent” security engine (ISE) for analysing alert messages, traffic information and event information from the security agents to decide if there is an attack and to generate a signature through a learning process, and a security policy manager for managing and applying a security policy to each of the security agents based on a decision of the ISE.
US patent application US2005060562 (“Bhattacharya et al”) relates to techniques for displaying network security incidents.
U.S. Pat. No. 8,176,527 (“Njemanze et al”) relates to a “correlation engine” or “rules engine” with support for time-based rules. The rules engine receives security events generated by network devices, which are aggregated and provided to the rules engine at specific times associated with time-based rules. The security events are cross-correlated with the time-based rules; and one or more first stage meta-events are reported.
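A time-based correlation rule of the general kind referred to above might, for example, raise a meta-event when a threshold number of matching events from one source arrive within a time window. The sketch below is a hypothetical rule for illustration only and does not reproduce the rules engine of U.S. Pat. No. 8,176,527; the event tuple layout, the sliding-window semantics, the function name threshold_rule and the reset after firing are all assumptions.

```python
from collections import defaultdict, deque


def threshold_rule(events, event_type, count, window):
    """Emit a meta-event when `count` events of `event_type` from one
    source occur within `window` seconds.

    `events` is an iterable of (timestamp, source, event_type) tuples,
    assumed to be in timestamp order.
    """
    recent = defaultdict(deque)  # source -> timestamps inside the window
    meta_events = []
    for ts, src, etype in events:
        if etype != event_type:
            continue
        q = recent[src]
        q.append(ts)
        while q and ts - q[0] > window:  # drop events older than the window
            q.popleft()
        if len(q) >= count:
            meta_events.append((ts, src, f"{count}x {event_type} in {window}s"))
            q.clear()  # reset so that one burst yields one meta-event
    return meta_events


events = [
    (0, "203.0.113.5", "login_failure"),
    (10, "203.0.113.5", "login_failure"),
    (20, "203.0.113.5", "login_failure"),
    (100, "198.51.100.2", "login_failure"),
    (500, "198.51.100.2", "login_failure"),
]
print(threshold_rule(events, "login_failure", count=3, window=60))
# [(20, '203.0.113.5', '3x login_failure in 60s')]
```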
The present inventors have identified a need to provide further assistance in relation to the analysis of messages such as security messages that are sent for processing by a managed security monitoring system, in particular in relation to those that do not match an existing knowledge base. In relation to parts of the processing that are currently performed manually, the fact that such parts are time-consuming when performed by human analysts may mean that they are done incorrectly, done too slowly to be of use in real-time or near real-time processing, or not done at all. While automation is often seen as a solution for tasks that are time-consuming when performed by human analysts, it is not generally possible to automate parts of a process that rely on decision-making that is not based on existing rules.