The invention relates to computing systems, and more particularly to mining of policy data source descriptions.
An information system of an enterprise may store large amounts of data carrying information about different aspects of different enterprise matters, e.g., products. In different phases of such a matter, specific data concerning a specific process step of the enterprise's workflow might be necessary for decision making or in litigation. For example, data might be due for disposal in order to free memory, or all information related to a specific product might have to be prepared in a deliverable form in case of litigation. To enable, e.g., rigorous compliance, defensible disposal or ediscovery (also known as discovery of electronically stored information, ESI), a policy has to be applied to the data of interest.
In order to achieve these goals, a list of policy data sources (PDS) is necessary. A policy data source is a collection of business information objects such that a company policy can be applied to that collection. A PDS can be a physical repository, such as a file share or a database, but it can also be an organizational collection, e.g., all presentations from the marketing department. Generally, the task of obtaining a comprehensive list of policy data sources of an organization is rather tedious for the following reasons: i) The extreme heterogeneity of PDS in terms of content, structure, application dependence, storage, media, ownership, access rights, and organizational relevance, among others. ii) The need for the PDS to be absolutely discrete, i.e., pairwise disjoint in the mathematical sense, since, if the same piece of information is covered by two different PDS, this may cause a policy conflict which cannot be resolved except through human intervention. iii) The order of magnitude involved, since the number of information objects that policies may need to be applied to can be estimated to be on the order of 10^6 objects per employee.
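The discreteness requirement in item ii) can be illustrated with a minimal sketch. It assumes, purely for illustration, that each PDS is represented as a set of information object identifiers; the function names `find_overlaps` and `is_discrete`, the source names, and the object identifiers are hypothetical and do not come from the text above.

```python
from itertools import combinations


def find_overlaps(sources):
    """Return {(name_a, name_b): shared_ids} for every overlapping pair.

    `sources` maps a hypothetical PDS name to a set of object identifiers.
    """
    overlaps = {}
    for name_a, name_b in combinations(sorted(sources), 2):
        shared = sources[name_a] & sources[name_b]
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


def is_discrete(sources):
    """True if no information object belongs to more than one PDS."""
    return not find_overlaps(sources)


# Illustrative data only: "doc-2" is covered by two sources.
sources = {
    "marketing-presentations": {"doc-1", "doc-2"},
    "xyz-presales": {"doc-2", "doc-3"},
    "hr-records": {"doc-4"},
}

conflicts = find_overlaps(sources)
print(is_discrete(sources))
print(conflicts)
```

Each entry in `conflicts` corresponds to a potential policy conflict of the kind described in item ii). This naive check compares every pair of sources, so its cost grows quadratically with the number of sources; it is meant to show the property being checked, not a method that copes with the magnitudes named in item iii).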
An example of a policy data source could be “all pre-sales information on product XYZ” as defined within a company designing, manufacturing, marketing and selling product XYZ. Evidently, this particular policy data source, which we may designate as XYZ-PDS, will include material from several company divisions. The need for defining the data source XYZ-PDS arises immediately when, due to a design failure of product XYZ, legal control and possibly information change requests need to be executed over some parts of the XYZ-PDS.
At present, two solutions to the task of PDS list creation are known, both of which are applied in companies having or working on an Information Technology (IT) system for the automatic application of policies:
Manual collection of policy data sources using office tools, meta-data obtained from crawling IT repositories, and interviews with employees responsible for collections. Sometimes, email-based tools are used for conducting these interviews.
Data-warehouse-style querying of indexed meta-data collections obtained from crawling IT repositories, where the results of such queries are persisted by database means, e.g., as views or materialized query tables.
Both solutions are referred to in an “Information Governance Benchmark Report in Global 1000 Companies”, issued by the CGOC and available on the Internet at www.cgoc.com. The aforementioned solutions do not scale to a magnitude of billions of information objects and tens of thousands of data sources, and cannot guarantee, with reasonable effort, the discreteness of a PDS. A further difficulty concerning the use of a policy data source is caused by the aforementioned heterogeneities, and consists in selecting the appropriate criteria for grouping and sorting the data.
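The discreteness problem of the second, warehouse-style solution can be sketched as follows. The example uses an in-memory SQLite database as a stand-in for a meta-data index; the table `object_metadata`, its columns, the two views, and all rows are assumptions made for illustration and are not taken from any of the systems mentioned. SQLite offers plain views but no materialized query tables, so only the view variant is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical meta-data index as produced by crawling IT repositories.
    CREATE TABLE object_metadata (
        object_id  TEXT PRIMARY KEY,
        department TEXT,
        file_type  TEXT,
        product    TEXT
    );
    INSERT INTO object_metadata VALUES
        ('doc-1', 'marketing',   'presentation', 'XYZ'),
        ('doc-2', 'marketing',   'presentation', NULL),
        ('doc-3', 'engineering', 'spreadsheet',  'XYZ'),
        ('doc-4', 'hr',          'document',     NULL);

    -- Two query results persisted as views, each defined independently.
    CREATE VIEW marketing_presentations AS
        SELECT object_id FROM object_metadata
        WHERE department = 'marketing' AND file_type = 'presentation';

    CREATE VIEW xyz_information AS
        SELECT object_id FROM object_metadata
        WHERE product = 'XYZ';
""")

# Objects that fall into both views, i.e. where the views are not discrete.
shared = [row[0] for row in conn.execute(
    "SELECT object_id FROM marketing_presentations "
    "INTERSECT SELECT object_id FROM xyz_information "
    "ORDER BY object_id"
)]
print(shared)
```

Because each view is defined by its own criteria, nothing prevents an object such as `doc-1` from matching both. Detecting such overlaps requires comparing the views against each other, which is the kind of effort the passage above describes as unreasonable at scale.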
An effective method of PDS list creation is necessary for a company to have the relevant data immediately at hand, e.g., in the case of an ediscovery or in order to define appropriate policies with respect to, e.g., data retention, legal hold or defensible disposal.
A Policy Data Source and/or a Policy Target, both being business objects, may be a target of information lifecycle governance. One challenge is how to define appropriate targets for policies on an enterprise scale.
Various prior art software systems provide tools to manage policies. They do not address, however, the question of how to obtain these policies in an effective way given an IT infrastructure and which IT objects belong to a given policy target group.
Other prior art software systems for information lifecycle governance are able to collect meta-data about various IT objects stored on the enterprise IT system and build an index, and to offer datamarts (“infosets”) on the information objects. These infosets are not Policy Targets, however, as they are not discrete in all instances. This approach does not scale well enough to be applied in big enterprises. A further limitation of these software systems is that they expect the criteria for defining infosets to be known and/or given in advance.