Various communication paradigms exist where the entities using the paradigms have agreed upon the constructs of the communication languages. Some examples include EDI (electronic data interchange), HTTP, HTML, and the like. Some entities add additional constructs to the language over time. Standards bodies meet to decide whether the additional constructs should be adopted into the language versions promoted by the standards bodies. In this way, the languages evolve.
Despite this ability to evolve, general communication languages fail to meet the needs of entities that need to grant specific rights and/or check the grant of those rights. The languages used need to be flexible enough to grant rights in various circumstances yet maintain the robustness of acceptance from the standards bodies. Authorization policy languages have been developed to meet this need for the granting and sharing of rights.
FIG. 1A illustrates a conventional mechanism for granting rights to access a resource using an authorization policy language. A trusted issuer 100 issues a license 102 to a principal 104. License 102 allows principal 104 to use a resource 106. Resource 106 may be a digital work in the form of an image, an audio or video file, or an e-book. The license 102 identifies the principal 104, the resource 106, the right granted and any conditions.
One of the compelling reasons why authorization policy languages are needed is to provide issuing entities with the ability to routinely grant rights to consumers in a consistent manner. As each grant is effectively unique, the issuer needs to be able to consistently check and grant access to customers having accepted the issued rights. Various rights languages (for example, XrML (extensible rights markup language)) permit flexibility in the construction of the actual grant (for example, the order of the information in the grant from one issuer to another may differ or the internal format of the document may differ). This ability of one issuer to unknowingly differ from the construct provided by another issuer for a similar grant creates uncertainty for entities that need to check the grants when presented.
For example, if one purchases a right from music source “A” to listen to music from any of music sources “A,” “B,” or “C,” one would expect that the purchaser would be able to enjoy the purchased right (here, listening to music from any of sources A, B, or C). However, if the structure of A's grant of rights is not identical to the expected structure of a grant from either music source B or C, one runs the risk that B or C may not respect the grant from A (namely, listening to music from A, B, or C) purchased by the purchaser. Here, B and C would need to compare the grant from A against a grant they expect to permit access to their music. In highly structured environments (for example, where each of A, B, and C shares the same infrastructure for granting rights), this risk is minimized. However, in dynamically definable languages (where the language may evolve, grow, and otherwise be extended over time), such as the extensible markup language (XML) and the extensible rights markup language (XrML), it becomes more likely that a first issuer provides a grant that affects a second issuer, and that the second issuer needs to compare the XML or XrML information for equality (that is, to see whether the two sets of information logically represent the same semantic information).
An example of the degree of exactness required follows. Here, two elements are represented: <foo xmlns=“http://afirstsite.org/ns” someAttribute=“someValue”/> and <pre:foo xmlns:pre=“http://afirstsite.org/ns”/>.
These two elements should or should not compare as equal depending on whether the schema for the namespace indicated provides a default for the ‘someAttribute’ attribute of the ‘foo’ element with contents ‘someValue’.
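The dependence on schema defaults can be made concrete with a minimal Python sketch. It is an illustration only: the `SCHEMA_DEFAULTS` table is a hypothetical stand-in for information that would come from a real schema, and the helper names `effective_attributes` and `elements_equal` are assumptions introduced here, not part of any standard. The two elements are the ones shown above.

```python
import xml.etree.ElementTree as ET

NS = "http://afirstsite.org/ns"

FIRST = '<foo xmlns="http://afirstsite.org/ns" someAttribute="someValue"/>'
SECOND = '<pre:foo xmlns:pre="http://afirstsite.org/ns"/>'

# Hypothetical stand-in for schema knowledge:
# (expanded element name, attribute name) -> default value.
SCHEMA_DEFAULTS = {("{%s}foo" % NS, "someAttribute"): "someValue"}


def effective_attributes(elem, defaults):
    """Return the element's attributes with any applicable defaults filled in."""
    attrs = dict(elem.attrib)
    for (tag, name), value in defaults.items():
        if tag == elem.tag:
            attrs.setdefault(name, value)
    return attrs


def elements_equal(a, b, defaults):
    """Compare expanded names and effective attributes (children are ignored)."""
    return (a.tag == b.tag
            and effective_attributes(a, defaults) == effective_attributes(b, defaults))


first = ET.fromstring(FIRST)
second = ET.fromstring(SECOND)

# ElementTree expands prefixes, so both tags are "{http://afirstsite.org/ns}foo".
with_default = elements_equal(first, second, SCHEMA_DEFAULTS)
without_default = elements_equal(first, second, {})
print(with_default, without_default)
```

With the default known, the two elements compare as equal; with no schema information, they do not, even though the prefix difference alone is harmless.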
If the XML schemas associated with the XML data are known to or compiled into the application (for example, the applications at B and C) attempting the comparison, then the comparison should be a straightforward one, as all the requisite and important semantics may be assumed to be known to the application (here, the applications at B and C).
However, if the XML schemas are not well integrated into the application (for example, shipped with the application), but are provided at a later time (for example, during runtime as part of the context in which the comparison is to be carried out or defined in an application extension), then the task of comparing the two sets of information is more difficult. One reason for the difficulty is that the priority or significance of the information in each set needs to be dynamically determined.
A process by which one can attempt to compare the two sets of information makes use of a significant sub-process which may be referred to as canonicalization. The canonicalization sub-process regularizes the representation of each set of information to which it is applied (or ‘canonicalizes’ it). Two sets of information can then be compared by canonicalizing each, then simply comparing the resultant representations to see if they are bit-for-bit identical.
This is illustrated in FIG. 1B. It is desired that two sets of information, 110 and 120, be compared for equality. An appropriate canonicalization algorithm is chosen, and said algorithm applied to each, resulting in, respectively, representations 111 and 121. These representations are then directly compared to see whether they are identical. In this example, they are; thus, information sets 110 and 120 are considered equal.
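The canonicalize-then-compare pattern can be sketched in Python as follows. The sketch assumes the standard library's `xml.etree.ElementTree.canonicalize` (a C14N 2.0 implementation, available from Python 3.8) as the chosen algorithm. It serves only as an example of the general pattern: it does not account for schema-supplied defaults, and the sample documents and the helper name `canonical_bytes` are invented for illustration.

```python
import xml.etree.ElementTree as ET


def canonical_bytes(xml_text: str) -> bytes:
    """Canonicalize a document and return its UTF-8 bytes for exact comparison."""
    return ET.canonicalize(xml_text).encode("utf-8")


# Two documents that differ only in attribute order, quoting style,
# spacing inside the start tag, and empty-element form.
info_110 = '<doc b="2" a="1"><item/></doc>'
info_120 = "<doc a='1'   b='2'><item></item></doc>"
# A document that differs in an attribute value.
info_other = '<doc a="1" b="3"><item/></doc>'

rep_111 = canonical_bytes(info_110)
rep_121 = canonical_bytes(info_120)

raw_identical = info_110 == info_120
canonical_identical = rep_111 == rep_121
other_identical = rep_111 == canonical_bytes(info_other)
print(raw_identical, canonical_identical, other_identical)
```

The raw texts differ, yet their canonical representations are byte-for-byte identical, so the two information sets are treated as equal; the third document remains distinct after canonicalization.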
In general there are a variety of possible algorithms for carrying out the canonicalization process, some of them existing as prior art in the literature. However, not all canonicalization algorithms are equally suitable for use as a sub-process of XML equality comparison, for some fail to consider some important aspects or properties of the sets of information as being appropriate for regularization. Indeed, none of the prior art algorithms are fully suitable, as each fails to consider one or more crucial aspects of the information sets. A need thus exists for use in XML equality comparison of a canonicalization algorithm which fully and completely regularizes exactly the appropriate set of aspects or properties of the information sets as are considered semantically significant according to the relevant XML standards specifications.
However, even when such a suitable canonicalization algorithm exists, a need further exists for an efficient implementation of XML equality comparison. Executing canonicalization algorithms tends to be expensive, and should be avoided unless absolutely necessary. Thus, a need also exists for a second algorithm which, when it can do so quickly, carries out XML equality comparison without resorting to the use of canonicalization (for example: if two information sets are already identical, they are necessarily equal, and no canonicalization is needed) yet when it cannot do so quickly yields its task to the full canonicalize-and-compare approach.
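A minimal Python sketch of such a two-stage comparison follows. It is one possible arrangement under stated assumptions, not a definitive implementation: the only quick check shown is the identical-input case mentioned above, the function name `xml_equal` and the counting wrapper `counting_c14n` are hypothetical, and the standard library's `xml.etree.ElementTree.canonicalize` stands in for whatever canonicalization algorithm is actually chosen.

```python
import xml.etree.ElementTree as ET


def xml_equal(a: str, b: str, canonicalize) -> bool:
    """Compare two XML texts, canonicalizing only when the quick check fails."""
    # Fast path: identical inputs are necessarily equal.
    if a == b:
        return True
    # Slow path: fall back to the full canonicalize-and-compare approach.
    return canonicalize(a) == canonicalize(b)


calls = []  # records each input that actually reached the canonicalizer


def counting_c14n(xml_text: str) -> str:
    """Stand-in canonicalizer that also counts how often it is invoked."""
    calls.append(xml_text)
    return ET.canonicalize(xml_text)


same = '<a x="1" y="2"/>'
fast_result = xml_equal(same, same, counting_c14n)
calls_after_fast = len(calls)

slow_result = xml_equal('<a y="2" x="1"/>', '<a x="1" y="2"></a>', counting_c14n)
calls_after_slow = len(calls)

unequal_result = xml_equal('<a x="1"/>', '<a x="2"/>', counting_c14n)
print(fast_result, calls_after_fast, slow_result, calls_after_slow, unequal_result)
```

Passing the canonicalizer as a parameter keeps the quick check independent of the particular canonicalization algorithm, so a more suitable one can be substituted without changing the comparison logic.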