A modern organization typically maintains a data storage system to store and deliver sensitive information concerning various significant business aspects of the organization. Sensitive information may include data on customers (or patients), contracts, deliveries, supplies, employees, manufacturing, or the like. In addition, sensitive information may include intellectual property (IP) of an organization such as software code developed by employees of the organization, documents describing inventions conceived by employees of the organization, etc.
Organizations take lot of efforts to install DLP components, especially on important machines where confidential data is getting generated, but they may not be able to protect each computer in the enterprise, due to reasons like large number of different platforms or operating systems (OS), machine outages, quick and dynamic provisioning of virtual machines, no clear and individual accounting for test and lab machines. DLP technologies apply configurable rules to identify objects, such as files, that contain sensitive data and should not be found outside of a particular enterprise or specific set of host computers or storage devices. Even when these technologies are deployed, it is possible for sensitive objects to ‘leak’. Occasionally, leakage is deliberate and malicious, but often it is accidental too. For example, in today's global marketplace environment, a user of a computing system transmits data, knowingly or unknowingly, to a growing number of entities outside a computer network of an organization or enterprise. Previously, the number of entities were very limited, and within a very safe environment. For example, each person in an enterprise would just have a single desktop computer, and a limited number of software applications installed on the computer with predictable behavior. More recently, communications between entities may be complex and difficult for a human to monitor.
Due to global business presence, many organizations have content classification polices supporting various languages. Current DLP solutions make use of all these policies irrespective of the language of content being scanned, which is quite inefficient. Ideally, content needs to be scanned only by using content specific policies. For example, there is no need to scan the data content with specific English-based polices when the data content is in Japanese.