Data breaches are pervasive and highly publicized. Well-known examples include massive customer data losses at Home Depot, Target, Neiman Marcus, and Equifax. Most data breaches result from poor security posture, employee negligence, or software defects. In general, a data breach cannot be completely ruled out, because deployed software products may contain previously unknown defects. The constant danger of a potential data breach, and of its monetary, legal, and business consequences, is a driving force behind the data protection efforts carried out by businesses, which makes estimating data protection costs an extremely important task.
In a 2014 study on brand reputation by Experian and the Ponemon Institute, which surveyed 700 consumers, data breaches were reported as the most damaging occurrence to brand reputation, exceeding environmental disasters and poor customer service. With the ever-growing volume of cyber-attacks on organizations, security analysts require effective visual interfaces and interaction techniques to detect security breaches and, equally importantly, to efficiently share threat information.
In particular, security analysts at large organizations require effective systems, interfaces, and techniques for conducting data security intelligence, a key area at the intersection of big data and cybersecurity analytics. Identification of data protection scenarios is currently a manual process of applying a number of “what-if” scenarios to the enterprise data. This process is time- and labor-intensive and error-prone, and it does not guarantee an optimal result.
To support large organizations that manage thousands to tens of thousands of databases, Hadoop, and cloud applications in their environment, security intelligence applications, such as Informatica's Secure@Source, allow information security teams to discover sensitive data across disparate data stores, define hierarchies, and provide logical organization (e.g., classification policies, data store groups, departments, locations, etc.) for measuring the risk associated with the sensitive data discovered.
However, given the large amount of data in distributed databases and the variety of data and policies that govern each data store, data security analysts face the technical challenge of being unable to measure or quantify which sensitive data is most in need of security protection; which sensitive data poses the greatest risks and liabilities in terms of overall impact, financial impact, and reputational impact in the event of a data breach; and what level of protection and which schemes would be most effective in improving enterprise security and reducing the overall impact of a data breach. For example, data stored in a first store of a network database may have completely different data fields, data attributes, and governing regulations than data stored in a second store of the network database. This problem grows exponentially in network databases with hundreds or thousands of data stores and data types.
Consequently, improvements are needed in systems for data breach simulation and impact analysis in distributed network databases.