Technical Field
The subject matter described herein relates to visualizing data to facilitate identification and protection of sensitive data.
Description of Related Art
Modern day computer networks store numerous types of data, including sensitive data. Sensitive data contains information that could cause harm to individuals and businesses if compromised. Example types of sensitive data include health care records, financial data, and personal identification information. Because the consequences of exposing sensitive data are severe, network administrators apply one or more protection policies to the sensitive data as an additional layer of security beyond a database's standard protections.
Identifying sensitive data can be challenging for several reasons. Within a computer network, databases store countless data records, which are continually modified, added, and deleted. Frequent scans may be used to ensure that the locations of sensitive data are known. However, frequent scans are not practical if each scan is computationally intensive, as a scan of every individual database record would be. Computationally intensive scans therefore limit how often scans can run and, in turn, how accurate the network administrator's knowledge of sensitive data can be. Additionally, not all sensitive data records are of equal importance. A network administrator may not have time to examine every database in a network, and when databases are examined without prioritization, the administrator may miss critical databases that present a high overall level of risk.
Furthermore, multiple databases may access sensitive data records. Protecting all copies of a data record may not be practical if the network administrator cannot directly apply protection policies to a database. For example, if an external database controlled by another entity accesses a database containing sensitive data, the network administrator cannot instruct the external database to apply the protection policy.
A “risk” score is a metric commonly used in the security industry to define the risk associated with a component of a data set and to identify its level of vulnerability and impact. The risk score is typically expressed as a value between 0 and 1, with 1 representing the highest risk. The risk score is typically calculated from several risk factors, for example, the number of sensitive fields or the level of data protection applied to the data set.
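One common way to keep such a score between 0 and 1 can be written as follows. This is a minimal sketch, not the method described here: it assumes each risk factor has already been normalized to [0, 1] and that the factors are combined linearly with non-negative weights summing to 1. The symbols R (risk score), f_i (normalized factor), w_i (factor weight), and n (number of factors) are introduced only for illustration.

```latex
R = \sum_{i=1}^{n} w_i f_i, \qquad
f_i \in [0, 1], \quad w_i \ge 0, \quad \sum_{i=1}^{n} w_i = 1
\;\;\Longrightarrow\;\; 0 \le R \le 1
```

Because R is then a convex combination of values in [0, 1], it cannot leave that interval. Other combining functions are possible and would need their own normalization.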
When a risk score is computed from a group of risk factors, each risk factor typically contributes to the score according to a weight, and the weight given to each factor reflects a particular perspective on that factor's importance. A given risk score therefore represents one specific perception of the importance of the risk factors. However, the importance of each factor is not the same for all stakeholders. A risk evaluation based on a single risk score may therefore not adequately express actual risk, because it reflects only the operational concerns of the entity that set the risk factor weights. It would thus be beneficial to score risk so that the score represents multiple risk assessment types, multiple policies such as PCI or PII, or a risk perception amalgam across multiple assessment types. An assessment type represents a particular framework or operational concern for evaluating a security threat. For example, in an enterprise the assessment types can include liability, reputation, business interruption, compliance, and customer loss.
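To illustrate how the same risk factors can yield different scores depending on the assessment type, the sketch below scores one data set under several hypothetical weightings and then combines the results. It assumes a weighted linear combination of factors normalized to [0, 1], and it uses a plain or importance-weighted mean as the amalgam. The factor names, weight values, and the functions `weighted_score`, `per_assessment_scores`, and `amalgam` are illustrative assumptions, not taken from the source.

```python
from typing import Dict, Optional

# Hypothetical risk factors for one data set, each normalized to [0, 1].
# Factor names and values are illustrative assumptions.
factors: Dict[str, float] = {
    "sensitive_field_ratio": 0.8,  # share of fields holding sensitive data
    "unprotected_ratio": 0.5,      # share of sensitive fields without protection
    "exposure": 0.2,               # degree of access by external systems
}

# Hypothetical weights per assessment type; each weight set sums to 1,
# and each reflects a different stakeholder's view of factor importance.
assessment_weights: Dict[str, Dict[str, float]] = {
    "compliance": {"sensitive_field_ratio": 0.6, "unprotected_ratio": 0.3, "exposure": 0.1},
    "reputation": {"sensitive_field_ratio": 0.2, "unprotected_ratio": 0.3, "exposure": 0.5},
    "business_interruption": {"sensitive_field_ratio": 0.1, "unprotected_ratio": 0.7, "exposure": 0.2},
}


def weighted_score(factors: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of factors; stays in [0, 1] when factors are in [0, 1] and weights sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * factors[name] for name in weights)


def per_assessment_scores(
    factors: Dict[str, float], assessment_weights: Dict[str, Dict[str, float]]
) -> Dict[str, float]:
    """Score the same factors once per assessment type."""
    return {atype: weighted_score(factors, w) for atype, w in assessment_weights.items()}


def amalgam(scores: Dict[str, float], importance: Optional[Dict[str, float]] = None) -> float:
    """Combine per-assessment scores into one value (an assumed choice: a plain or weighted mean)."""
    if importance is None:
        return sum(scores.values()) / len(scores)
    total = sum(importance.values())
    return sum(importance[atype] * s for atype, s in scores.items()) / total


if __name__ == "__main__":
    scores = per_assessment_scores(factors, assessment_weights)
    for atype, score in scores.items():
        print(f"{atype}: {score:.2f}")
    print(f"amalgam: {amalgam(scores):.2f}")
```

With these made-up numbers, the compliance weighting produces a noticeably higher score than the reputation weighting for the identical factors, which is the single-perspective limitation described above. Keeping the per-assessment scores alongside the amalgam lets each stakeholder see its own view as well as the combined one.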
Similarly, data quality scoring presents the same kind of variation across assessment types and the same challenges. For example, the data quality required for trend analytics differs from the data quality required for fraud detection, because each places a different emphasis on data quality.