Technical Field
This disclosure relates generally to information security on network-connected appliances.
Background of the Related Art
Today's networks are larger and more complex than ever before, and protecting them against malicious activity is a never-ending task. Organizations seeking to safeguard their intellectual property, protect their customer identities, avoid business disruptions, and the like, need to do more than just monitor logs and network flow data; indeed, many organizations create millions, or even billions, of events per day, and distilling that data down to a short list of priority offenses can be daunting. To address this problem, so-called Security Information and Event Management (SIEM) systems and methods have been developed to collect, normalize and correlate available network data. One security intelligence product of this type is IBM® QRadar SIEM, which provides a set of platform technologies that automatically discover network log source devices and inspect network flow data to find and classify valid hosts and servers (assets) on the network, tracking the applications, protocols, services and ports they use. The product collects, stores and analyzes this data, and it performs real-time event correlation for use in threat detection and in compliance reporting and auditing. Using this platform, billions of events and flows can therefore be reduced and prioritized into a handful of actionable offenses, according to their business impact.
Systems such as described above have the capability of providing forensic analysis of user “identity” data captured from the network flow data. Identity information represents an identification of a person and his or her activity on the network. On-line identifiers, such as email addresses, Skype addresses, MAC addresses, chat IDs, social media IDs, Twitter IDs, and many others, are used to identify entities or people. In a platform such as described, known entities or persons that are found in the network traffic and documents are automatically tagged, and such data can then be exposed to forensic analysis when the platform is used to investigate an incident. With this known approach, however, the same identity may appear in the network flow data multiple times and originate from many different sources. These multiple appearances typically are represented as a frequency distribution. From a data storage perspective, the frequency distribution data typically is stored in a relational database. Because each appearance of the same identity in the network flow may require its own data record in the database, the data storage requirements for managing large data sets can become burdensome. Further, a database storage schema of this type makes it difficult for the system to relate disparate relationships, i.e., those not directly connected to one another in terms of the relational data, especially for identities that have large relation level differences.
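By way of illustration only, the storage pattern described above can be sketched as follows. The sketch is a minimal example under stated assumptions and does not represent the schema of any particular product: the table name `appearance`, its columns, the sample identities, and the helper function names are all hypothetical. It shows one data record per appearance of an identity, a frequency distribution derived by counting those records, and the additional self-join that is required for each further level of relationship between identities.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one row per identity appearance.

    Hypothetical schema: each row records an identity, the source in which
    it appeared, and a peer identity it was observed communicating with.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE appearance (identity TEXT, source TEXT, peer TEXT)"
    )
    rows = [
        # The same identity appears three times, so it needs three records.
        ("alice@example.com", "flow-1", "bob_chat"),
        ("alice@example.com", "flow-2", "bob_chat"),
        ("alice@example.com", "doc-7", "bob_chat"),
        ("bob_chat", "flow-3", "carol_sm"),
        ("carol_sm", "flow-4", "dave_mac"),
    ]
    conn.executemany("INSERT INTO appearance VALUES (?, ?, ?)", rows)
    return conn


def frequency_distribution(conn):
    """Count how many appearance records exist for each identity."""
    cur = conn.execute(
        "SELECT identity, COUNT(*) FROM appearance GROUP BY identity"
    )
    return dict(cur.fetchall())


def identities_two_levels_away(conn, identity):
    """Find identities related through one intermediate identity.

    Each additional relation level requires another self-join, which is
    why indirectly connected identities are awkward to relate.
    """
    cur = conn.execute(
        "SELECT DISTINCT b.peer "
        "FROM appearance AS a "
        "JOIN appearance AS b ON a.peer = b.identity "
        "WHERE a.identity = ?",
        (identity,),
    )
    return sorted(row[0] for row in cur)


if __name__ == "__main__":
    conn = build_demo_db()
    print(frequency_distribution(conn))
    print(identities_two_levels_away(conn, "alice@example.com"))
```

In this sketch, reaching an identity three relation levels away would require a third copy of the table in the join, and so on for each further level, while the record count grows with every repeated appearance of the same identity.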
Thus, there is a need for a new paradigm for managing identity data of the type collected by security intelligence technologies, one that provides better data storage and management as well as more efficient analysis and presentation of that data.