1. Field of the Invention
The present invention relates to a computer system, and deals more particularly with a method, system, and computer program product for enforcing data policy using style sheet processing.
2. Description of the Related Art
"Data policy", as used herein, refers to the procedures and rules used to control access to stored data. Prior to the advent of distributed network computing, data policy was something left to the data source to enforce, and often was limited to a simple access control check based on the identity of the requester. (As used herein, "data source" refers to an application program executing at an application server from which the data is available, servicing user requests for the stored data.) With the move toward highly distributed networks of applications, devices, and users, this simplistic model for access control is no longer acceptable. This is due to the fact that as applications have become more decentralized, it often becomes unclear what exactly may be the source of particular data. For example, it may be possible that the data was originally obtained by gathering portions thereof from a variety of disparate sources. In this case, what appears to be the data source may be simply the data gatherer instead of the data creator. Furthermore, this data may go through some form of transformation, and because of this, the perceived source of the transformed data may be the data transformer instead of the true data source. In a similar fashion, the true target for the data may be unclear as the data flows through intermediate points (such as gateways) in the network. Because of these factors and the complexity they add, the need to enforce usage policies using more sophisticated techniques than simple access control has become critical.
To illustrate this problem, suppose a target user "Sam" requests some data such as "contact information for Smith" from a data source such as an employee directory. In this scenario, Sam's request is sent from his client machine to a server executing an application program which responds to requests for information from the data source (i.e. this employee directory). This application program enforces data policy to decide what, if any, contact information Sam should see about Smith. In the existing art, pertinent factors might be whether Sam has provided a valid password; whether Sam works within the company for which the employee directory is maintained; whether Sam works in a particular department of this company (such as the human resources department) that gives Sam broader access to Smith's information; etc. In this example, if Sam provides a valid password and is an employee not working in human resources, then one type of filtering process may be applied to Smith's information (filtering out all personal and salary data, for example) before the result is delivered to Sam; if Sam works in the human resources department, then a different filter (or perhaps no filter) is applied to Smith's information. Techniques for controlling access in this manner are well known.
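By way of illustration only, the conventional identity-based filtering described above might be sketched as follows. The field names, the `Requester` type, and the choice to release nothing to a requester who fails the password or employment check are assumptions made for this sketch and are not taken from any particular directory product.

```python
from dataclasses import dataclass

# Hypothetical field names; the text above speaks only of
# "personal and salary data".
RESTRICTED_FIELDS = {"home_address", "salary"}


@dataclass
class Requester:
    name: str
    password_ok: bool
    is_employee: bool
    department: str = ""


def filter_record(record: dict, requester: Requester) -> dict:
    """Return the portion of `record` that this requester may see."""
    if not (requester.password_ok and requester.is_employee):
        # Assumption for this sketch: such requesters receive nothing.
        return {}
    if requester.department == "human resources":
        # Broader access: no filter is applied.
        return dict(record)
    # Ordinary employee: filter out personal and salary data.
    return {k: v for k, v in record.items() if k not in RESTRICTED_FIELDS}


smith = {"name": "Smith", "office_phone": "555-0100",
         "home_address": "12 Elm St", "salary": 50000}
sam = Requester("Sam", password_ok=True, is_employee=True, department="sales")
print(filter_record(smith, sam))
```

The decision in this sketch depends only on who the requester is, which is the limitation the following paragraphs address.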
A distributed network computing environment, however, introduces the need for having more sophisticated access policies in place. For example, it becomes necessary to view Sam in light of additional factors such as the device he is using and the location of that device. That is, while the access policy in place may permit Sam to see one set of information regarding Smith on his office computer attached to a secure local area network (LAN), it may be inappropriate for him to see some of the details of this same set of information on his cellular phone screen in a public airport. In this case, the target context (e.g. Sam is using a cell phone, and is connected using a cellular network) may be needed for correct policy enforcement. Pertinent factors in a target context include the user's identification, device type, network connection type, and any application-specific limitations of the application being executed. This target context may not be available to the server application handling requests for information from the employee directory (the true source of the data), and thus the server application is unable to enforce data policy correctly based on the requester's target context. For example, if the data for Smith is voluminous and the server application is unable to detect (as is highly likely) that Sam is using a cell phone with a relatively expensive wireless network connection, then this large amount of data will be transmitted to Sam in an expensive, time-consuming transmission, even though he will likely give up trying to view it because of the inherent display limitations of his end-user device.
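As a non-limiting sketch, a target context and a policy that depends on it might be represented as shown below. The `TargetContext` type, the device and network labels, the sensitive field name, and the two-field limit for cellular connections are all hypothetical; application-specific limitations are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetContext:
    user_id: str
    device_type: str    # hypothetical labels, e.g. "desktop", "cell_phone"
    network_type: str   # hypothetical labels, e.g. "secure_lan", "cellular"


# Hypothetical values chosen only for this sketch.
SENSITIVE_FIELDS = {"home_phone"}
MAX_FIELDS_ON_CELLULAR = 2


def apply_context_policy(record: dict, ctx: TargetContext) -> dict:
    """Filter `record` according to where and how it will be viewed."""
    if ctx.device_type == "desktop" and ctx.network_type == "secure_lan":
        return dict(record)  # office computer on a secure LAN: full record
    # Elsewhere, withhold details unsuitable for a public setting.
    visible = {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
    if ctx.network_type == "cellular":
        # Avoid an expensive, time-consuming transmission of bulky data.
        visible = dict(list(visible.items())[:MAX_FIELDS_ON_CELLULAR])
    return visible


record = {"name": "Smith", "office_phone": "555-0100",
          "home_phone": "555-0199", "office": "B2"}
office = TargetContext("sam", "desktop", "secure_lan")
airport = TargetContext("sam", "cell_phone", "cellular")
print(apply_context_policy(record, office))
print(apply_context_policy(record, airport))
```

The same user receives two different results, which a check based on identity alone cannot produce.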
While sufficient target context information for enforcing data policy is typically not available to a server application, in today's distributed environments this target context is often known to at least some portion of the distributed network such as the gateway into a wireless or wired network and other intermediaries (such as transcoding proxies or transcoding Web servers) in a complex delivery chain between the client and the server. Modifying these intermediaries to forward the target context to the server applications so that the server applications can enforce the data policy is not a viable solution in a distributed networking environment, however, as will now be discussed.
To further illustrate the problems of enforcing data policy in a distributed environment, suppose Sam is not an employee of the company to which the employee directory pertains, but is merely an Internet Web user accessing this directory through the company's Web site. Data policies based upon classifications of users, such as employee vs. non-employee, are common. While it may be appropriate to provide Sam with an external telephone number or e-mail address of an employee to facilitate communications, other information stored in the directory (such as the employee's department title or office location) may be inappropriate for providing to non-employees. Or, it may be desirable to restrict the volume of data provided to non-employees, for example to prevent advertisers from sending electronic mass mailings to the employees (by obtaining large numbers of e-mail addresses) or to prevent employment agencies from extracting a large portion of the company's stored phone book information. The true data source of this company's employee directory information is likely to be multiple data sources, that is, a collection of geographically dispersed directory servers in various divisions of the company, each having only a subset of the complete company-wide employee directory. For requests from users such as Sam who are interacting with the company's Web site, the Web application servicing these requests is then merely an information gatherer. Moreover, it is possible that these distributed directory servers may have different implementations whereby different information is stored for employees; they may use different formats for the data that is stored; and they may have different restrictions regarding the use of the data they contain (i.e. different data policies).
For example, the sales division might allow external users such as Sam to see an employee's job title in order to facilitate customer service, but the manufacturing division may not permit this information to be seen for its employees.
Many other scenarios can be envisaged in which sophisticated data policies which account for many types of variable factors are necessary. For example, a company which supplies products may have multiple pricing structures, whereby other companies buying these products have various discounts applied based on their purchasing volume. In this situation, the data policy must use the relevant factors to apply the correct price for each purchaser's order.
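A volume-based pricing policy of this kind might, purely as an example, be expressed as a lookup over discount tiers. The tier thresholds and discount rates below are invented for illustration and do not come from the present description.

```python
# Hypothetical tiers: (minimum purchasing volume, discount rate),
# listed from the highest threshold to the lowest.
DISCOUNT_TIERS = [(10_000, 0.10), (1_000, 0.05), (0, 0.0)]


def discounted_price(list_price: float, purchasing_volume: int) -> float:
    """Apply the discount of the first tier the purchaser qualifies for."""
    for min_volume, rate in DISCOUNT_TIERS:
        if purchasing_volume >= min_volume:
            return round(list_price * (1 - rate), 2)
    raise ValueError("purchasing_volume must be non-negative")


print(discounted_price(200.0, 5_000))
```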
When an application server servicing user requests has to gather data from diverse sources distributed throughout the network, and then assemble a single response to a request, the problems of enforcing data policy are compounded. The various (true) sources of the data may not have enough information to perform some types of policy enforcement. For example, if a data policy states that at most ten e-mail addresses or ten telephone numbers may be retrieved (in order to protect a company's resources, as in the scenario described above), each individual server has no way of knowing how many addresses or numbers it may release when it has no knowledge of how many addresses or numbers other servers are releasing.
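The difficulty may be seen in a short sketch in which the ten-address limit is enforced only after the per-server results have been combined. The server names, the addresses, and the function name `gather_and_cap` are hypothetical.

```python
MAX_EMAIL_ADDRESSES = 10  # the limit used in the scenario above


def gather_and_cap(responses_by_server: dict,
                   limit: int = MAX_EMAIL_ADDRESSES) -> list:
    """Merge per-server results, then enforce the limit on the whole set."""
    combined = []
    for server in sorted(responses_by_server):  # deterministic order
        combined.extend(responses_by_server[server])
    # Only here, where every server's contribution is visible,
    # can the overall limit be applied.
    return combined[:limit]


responses = {
    "sales": [f"s{i}@example.com" for i in range(8)],
    "mfg": [f"m{i}@example.com" for i in range(8)],
}
released = gather_and_cap(responses)
print(len(released))
```

Each server in this sketch returns eight addresses, which is within the limit when considered alone; only the component holding the combined sixteen can tell that the limit has been exceeded.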
Based on these factors, it would be advantageous to be able to apply data policy at an intermediate point in the delivery chain. As stated above, certain intermediaries have access to relevant target context. Examples of this type of intermediary are Web servers, transcoders, proxies, and gateways. In addition, because any necessary data gathering from multiple sources has already occurred before the data reaches the intermediary, an intermediary has the complete set of information to which the policy should be applied. However, in the current art, data policy is typically represented in application programming code that is specific to each type of data. It would be a monumental task to distribute this application-specific code to intermediaries, and execute and maintain the code in this dispersed fashion.
Accordingly, what is needed is a technique with which data policy can be efficiently enforced in a complex distributed network computing environment, incorporating many complex factors such as those described above.
An object of the present invention is to provide a technique for enforcing data policy efficiently in a complex distributed networking environment.
Another object of the present invention is to provide this technique whereby data policy is enforced at an intermediate point in the delivery chain from a server application to a client.
Yet another object of the present invention is to provide this technique by applying style sheets to documents encoded in tag languages such as the Extensible Markup Language.
A further object of the present invention is to provide this technique whereby a different data policy may be applied to each different tagged data item to provide maximum flexibility.
Still another object of the present invention is to provide this technique in a backward-compatible manner, such that existing style sheets continue to function properly.
Other objects and advantages of the present invention will be set forth in part in the description and in the drawings which follow and, in part, will be obvious from the description or may be learned by practice of the invention.
To achieve the foregoing objects, and in accordance with the purpose of the invention as broadly described herein, the present invention provides a method, system, and computer program product for use in a computing environment having a connection to a network, for efficiently enforcing data policy using style sheets. This technique comprises: providing an input document representing a response to a user request; providing a Document Type Definition (DTD) corresponding to the input document, wherein the DTD has been augmented with one or more references to one or more stored policy enforcement objects, wherein each of the stored policy enforcement objects enforces a data policy for an element of the input document; and executing an instrumented style sheet processor, wherein this execution further comprises: loading the augmented DTD; resolving each of the one or more references in the loaded DTD; instantiating the policy enforcement objects associated with the resolved references; and executing selected ones of the instantiated policy enforcement objects during application of one or more style sheets to the input document, wherein a result of this execution is an output document reflecting the execution.
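The sequence of steps recited above (loading the augmented DTD, resolving its references, instantiating the policy enforcement objects, and executing them while a style sheet is applied) may be sketched as follows. This is a simplified model under stated assumptions rather than an implementation: the augmented DTD is represented as a plain mapping from element names to policy references instead of being parsed from DTD syntax, the style sheet is reduced to an ordered list of element names to emit, and `POLICY_STORE`, the reference strings, and the class names are hypothetical.

```python
import xml.etree.ElementTree as ET


class PassThrough:
    def enforce(self, value, context):
        return value


class EmployeesOnly:
    def enforce(self, value, context):
        return value if context.get("is_employee") else None


# Stand-in for the stored policy enforcement objects (hypothetical names).
POLICY_STORE = {"policy:pass": PassThrough,
                "policy:employees-only": EmployeesOnly}

# Stand-in for an augmented DTD: element name -> policy reference.
AUGMENTED_DTD = {"name": "policy:pass", "email": "policy:pass",
                 "title": "policy:employees-only"}


def run_instrumented_processor(input_xml, augmented_dtd, stylesheet, context):
    # Load the DTD, resolve each reference, and instantiate its policy object.
    policies = {elem: POLICY_STORE[ref]() for elem, ref in augmented_dtd.items()}
    source = ET.fromstring(input_xml)
    output = ET.Element(source.tag)
    # Apply the (simplified) style sheet, executing the policy for each element.
    for elem_name in stylesheet:
        node = source.find(elem_name)
        if node is None:
            continue
        policy = policies.get(elem_name)
        value = node.text if policy is None else policy.enforce(node.text, context)
        if value is not None:
            ET.SubElement(output, elem_name).text = value
    return ET.tostring(output, encoding="unicode")


doc = ("<employee><name>Smith</name><email>smith@example.com</email>"
       "<title>Engineer</title></employee>")
print(run_instrumented_processor(doc, AUGMENTED_DTD,
                                 ["name", "email", "title"],
                                 {"is_employee": False}))
```

Because the policy objects are looked up per element, a different policy can be attached to each tagged data item without changing the style sheet itself.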
Executing the instrumented processor may further comprise generating an output DTD corresponding to the output document.
The input document as well as the output document may be specified in an Extensible Markup Language (XML) notation.
The stored policy enforcement objects may further comprise code for overriding a method for evaluating the element of the input document, and executing selected ones of the instantiated policy enforcement objects may further comprise executing an overridden method.
The style sheets may be specified in an Extensible Stylesheet Language (XSL) notation. The method may be a value-of method of the XSL notation, and overriding the value-of method may be done by subclassing this value-of method. The overridden method may return an input value of the element of the input document or a changed version of the input value, or may return a null value.
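The three possible results of an overridden method (the input value, a changed version of it, or a null value) might be illustrated as follows. The `ValueOf` base class here is only a stand-in for a processor's value-of evaluation and is not the interface of any actual XSL processor; the subclass names and the masking rule are likewise hypothetical.

```python
class ValueOf:
    """Stand-in for a style sheet processor's value-of evaluation."""

    def evaluate(self, element_text):
        return element_text  # default behavior: the input value, unchanged


class MaskedPhoneValueOf(ValueOf):
    """Overrides evaluation to return a changed version of the input value."""

    def evaluate(self, element_text):
        value = super().evaluate(element_text)
        return "***-" + value[-4:]


class SuppressedValueOf(ValueOf):
    """Overrides evaluation to return a null value."""

    def evaluate(self, element_text):
        return None


for evaluator in (ValueOf(), MaskedPhoneValueOf(), SuppressedValueOf()):
    print(type(evaluator).__name__, evaluator.evaluate("555-0100"))
```

Since an element without an overriding subclass falls back to the base behavior, style sheets written without any policy in mind would continue to evaluate as before in this model.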
Executing selected ones of the instantiated policy enforcement objects may further comprise considering a target context of a user making said user request, or may further comprise determining whether an output DTD element in the output DTD will be generated for the element of the input document (where this determination then may further comprise considering a target context of a user making said user request). The determination may further comprise suppressing the output DTD element in the output DTD when the output DTD element is not to be generated.
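Suppression of an output DTD element based on target context might be sketched as shown below. The function names, the context key `is_employee`, and the use of a `(#PCDATA)` content model for every element are assumptions made for this sketch.

```python
def generate_output_dtd(element_names, should_generate, context):
    """Emit an element declaration only for elements the policy allows."""
    lines = []
    for name in element_names:
        if should_generate(name, context):
            lines.append(f"<!ELEMENT {name} (#PCDATA)>")
        # Otherwise the output DTD element is suppressed.
    return "\n".join(lines)


def title_for_employees_only(name, context):
    # Hypothetical policy: "title" is declared only for employee requesters.
    return name != "title" or context.get("is_employee", False)


print(generate_output_dtd(["name", "title"], title_for_employees_only,
                          {"is_employee": False}))
```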