This invention relates to processing a directed graph of objects.
Data is often represented as a directed graph of objects. For instance, many formats for documents represent the data in the document as objects that are related to each other by a directed graph. Examples of document formats implemented as directed graphs include the Portable Document Format ("PDF") developed by Adobe Systems Incorporated of San Jose, Calif. ("Adobe") and the document object model specification ("DOM") promulgated by the World Wide Web Consortium ("W3C") for representing hypertext markup language documents. The documents may be displayed by a document reader, such as Adobe Acrobat® Reader™, available from Adobe, which displays information contained within the directed graph of objects as prose.
It is sometimes necessary to process the objects in a directed graph. For example, in order to translate a PDF document, it may be necessary to translate objects in a directed graph that represent a document.
The invention relates to a method of identifying the type of an object that has a set of properties. The type of object is associated with a representation of a document as a directed graph of objects. In one general aspect of the invention, the method includes: receiving a first set of properties characteristic of a type of object, the first set of properties being divided into a first subset of sufficient properties and a first subset of additional properties; and determining whether the object is of the first type based only on whether the properties in the first subset of sufficient properties are matched by the properties of the object.
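By way of illustration only, the first aspect may be sketched in Python as follows. The `TypeSpec` class, the `is_of_type` function, and the PDF-like property names are hypothetical and do not appear in the description. The sketch assumes the simplest reading of "matched", namely that every sufficient property is present on the object; the additional properties are never consulted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeSpec:
    """Hypothetical description of one object type."""
    name: str
    sufficient: frozenset                # properties that decide the type
    additional: frozenset = frozenset()  # characteristic, but not decisive


def is_of_type(obj: dict, spec: TypeSpec) -> bool:
    # Only the sufficient subset is checked; spec.additional is ignored.
    return all(prop in obj for prop in spec.sufficient)


# Illustrative specification; the property names are assumptions.
page_spec = TypeSpec(
    name="Page",
    sufficient=frozenset({"Type", "Contents"}),
    additional=frozenset({"MediaBox", "Rotate"}),
)

# An object lacking every additional property is still identified as a page.
damaged_page = {"Type": "Page", "Contents": "stream-7"}
is_damaged_page_a_page = is_of_type(damaged_page, page_spec)
```

Keeping the additional properties in the specification, even though they do not affect the decision, leaves them available for other uses such as ranking candidate types.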
By identifying the type of the object based only on a subset of the properties that are designated as sufficient, the method can identify objects which may have some of their properties set incorrectly or may not have all of the properties associated with an object type.
In a second general aspect of the invention, a computer program product for identifying the type of an object according to the first aspect of the invention is tangibly stored on a computer-readable medium.
Embodiments of the invention may include one or more of the following features. Values associated with at least some of the properties in the first subset of sufficient properties are received, and the determination of whether the object is of the first type is further based on whether the values in the first subset match the values of the corresponding properties of the object. The first subset of sufficient properties includes a bonus group of properties. The properties in the bonus group are compared with the properties of the object. If at least one property in the bonus group is matched by the properties of the object, then the object is identified as the first type. By basing the identification on only one matching property, the method can identify the type of an object that may have incorrect values for certain properties or may have certain properties missing.
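The bonus group may be illustrated with a short Python sketch. The function name `matches_bonus_group`, the `ANY` sentinel, and the example property names are assumptions made for illustration; the description does not prescribe any particular data layout.

```python
ANY = object()  # hypothetical sentinel: the property must exist, any value is accepted


def matches_bonus_group(obj: dict, bonus_group: dict) -> bool:
    """Return True if at least one bonus property is matched by the object."""
    for name, expected in bonus_group.items():
        if name not in obj:
            continue
        if expected is ANY or obj[name] == expected:
            return True  # a single match is enough
    return False


# Illustrative bonus group for a "Font" type (names are assumptions).
font_bonus = {"Type": "Font", "BaseFont": ANY}

# The Type value is wrong, but BaseFont is present, so the object still matches.
mislabelled_font = {"Type": "Fnt", "BaseFont": "Helvetica"}
font_identified = matches_bonus_group(mislabelled_font, font_bonus)
```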
The first subset of sufficient properties also includes a lumped group of properties. The values of the properties in the lumped group are compared with properties of the object. If the value of each property in the lumped group is matched by the properties of the object then the object is identified as the first type. The lumped group allows the person designing the set of properties to require that multiple properties of the object be correctly set for the object to be identified as the first type.
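In contrast to a group in which any single match suffices, a lumped group requires every member to match. A minimal Python sketch follows; the name `matches_lumped_group` and the example values are hypothetical, and treating an empty group as a non-match is an assumption of the sketch.

```python
def matches_lumped_group(obj: dict, lumped_group: dict) -> bool:
    """Return True only if every property in the group has the expected value."""
    if not lumped_group:
        return False  # assumption: an empty group identifies nothing
    return all(
        name in obj and obj[name] == expected
        for name, expected in lumped_group.items()
    )


# Illustrative lumped group: both values must be set correctly.
image_lumped = {"Type": "XObject", "Subtype": "Image"}

complete = {"Type": "XObject", "Subtype": "Image", "Width": 640}
partial = {"Type": "XObject", "Width": 640}  # Subtype missing

complete_matches = matches_lumped_group(complete, image_lumped)
partial_matches = matches_lumped_group(partial, image_lumped)
```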
The first subset of sufficient properties includes a property of the parent of the object, such as an object type of the parent. Basing the identification on the parent type of the object, for example, makes the identification dependent on the position of the object within a directed graph of objects.
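The dependence on graph position can be sketched in Python as below. The `Node` class, the `is_link_annotation` function, and the "Page" and "Link" names are hypothetical; the sketch assumes the type of the parent has already been inferred and stored on the parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical graph object holding its properties and a parent link."""
    props: dict
    parent: Optional["Node"] = None
    inferred_type: Optional[str] = None


def is_link_annotation(node: Node) -> bool:
    # The parent's type acts as one of the sufficient properties.
    parent_is_page = node.parent is not None and node.parent.inferred_type == "Page"
    return parent_is_page and node.props.get("Subtype") == "Link"


page = Node(props={"Type": "Page"}, inferred_type="Page")
catalog = Node(props={"Type": "Catalog"}, inferred_type="Catalog")

# Identical properties, different positions in the graph.
under_page = Node(props={"Subtype": "Link"}, parent=page)
under_catalog = Node(props={"Subtype": "Link"}, parent=catalog)

result_under_page = is_link_annotation(under_page)
result_under_catalog = is_link_annotation(under_catalog)
```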
A second set of properties characteristic of a second type of object is received. The second set of properties is divided into a second subset of sufficient properties and a second subset of additional properties. It is determined whether the object is of the second type based only on whether the properties in the second subset of sufficient properties are matched by the properties of the object. The properties in the first set of properties are compared with the properties of the object to determine how many of the properties in the first set are matched by the object. Then the properties in the second set of properties are compared with the properties of the object to determine how many of the properties in the second set are matched by the object. It is determined whether the first type or the second type is a preferred type of the object based on whether the object matches more of the properties in the first set of properties than in the second set of properties. Thus, the method selects the type that better matches the object.
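The selection of a preferred type may be sketched as follows. The function names and property names are hypothetical. The sketch assumes the object has already satisfied the sufficient properties of both types, compares property names only, and resolves a tie in favor of the second type, which is a choice the description leaves open.

```python
def count_matches(obj: dict, property_set: frozenset) -> int:
    """Count how many properties of the full set appear on the object."""
    return sum(1 for name in property_set if name in obj)


def preferred_type(obj: dict, first: tuple, second: tuple) -> str:
    """Each candidate is a (type_name, full_property_set) pair."""
    first_name, first_props = first
    second_name, second_props = second
    # Assumption: the first type wins only on a strictly higher count.
    if count_matches(obj, first_props) > count_matches(obj, second_props):
        return first_name
    return second_name


# Full sets (sufficient plus additional); the names are illustrative.
image_type = ("Image", frozenset({"Subtype", "Width", "Height", "ColorSpace"}))
form_type = ("Form", frozenset({"Subtype", "BBox", "Matrix"}))

ambiguous = {"Subtype": "Image", "Width": 640, "Height": 480}
chosen = preferred_type(ambiguous, image_type, form_type)
```

Here the object matches three properties of the first set and one of the second, so the first type is preferred.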
A third general aspect of the invention relates to a computer program product that provides a framework for processing an electronic document. The product includes instructions that cause a programmable processor to perform the following operations. The processor receives an input identifying an electronic document having content organized as a directed graph of objects, with each object having a set of properties, each of which is assigned a value. The processor receives associating information associating one or more requested object types with a first computer program code module or agent. The processor traverses the directed graph, visiting objects in the electronic document. The processor identifies the type of each visited object to determine whether the visited object is of one of the requested object types. If the visited object is of one of the requested types, the processor invokes the first agent to process the visited object. If there is more than one path to an object in the directed graph, an object may be visited more than once in a traversal of the document and the instructions cause the processor to determine whether or not to invoke the agent on more than one of the visits based on the associating information.
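A minimal Python sketch of such a framework is given below. The `traverse` function, its parameters, and the graph layout are hypothetical. The sketch treats every arrival at an object as a visit, expands the children of an object only on its first visit so that cyclic graphs terminate, and reduces the associating information to a set of requested types plus a single `invoke_on_revisit` flag; all of these are assumptions.

```python
def traverse(root_id, graph, identify, requested_types, agent, invoke_on_revisit):
    """Depth-first traversal of a graph given as id -> (props, child_ids)."""
    expanded = set()
    stack = [root_id]
    while stack:
        obj_id = stack.pop()
        props, children = graph[obj_id]
        first_visit = obj_id not in expanded
        if identify(props) in requested_types and (first_visit or invoke_on_revisit):
            agent(obj_id, props)
        if first_visit:
            expanded.add(obj_id)
            stack.extend(reversed(children))  # keep left-to-right order


# Two pages share one font, so the font is reachable by two paths.
graph = {
    "root": ({"Type": "Catalog"}, ["p1", "p2"]),
    "p1": ({"Type": "Page"}, ["font"]),
    "p2": ({"Type": "Page"}, ["font"]),
    "font": ({"Type": "Font"}, []),
}


def identify(props):
    return props.get("Type")  # stand-in for the type inference described above


every_visit, first_only = [], []
traverse("root", graph, identify, {"Font"},
         lambda obj_id, props: every_visit.append(obj_id), invoke_on_revisit=True)
traverse("root", graph, identify, {"Font"},
         lambda obj_id, props: first_only.append(obj_id), invoke_on_revisit=False)
```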
Embodiments of the third aspect of the invention may include one or more of the following features. The instructions cause the processor to identify the type of each visited object according to the method of the first aspect of the invention. The processor records an identity and a type of each visited object, and the associating information directs the processor not to invoke the agent on a visit to an object whose identity has already been recorded. Alternatively, the associating information may direct the processor to invoke the agent on a visit to an object whose identity has already been recorded only if the previously recorded type of the object is different from the identified type of the object. By revisiting an object whose inferred type has changed, the framework may be used to detect potentially dangerous objects that are embedded in a document and masquerade as the previously recorded type.
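The two revisit policies may be sketched in Python as follows. The function `should_invoke`, the policy names `"once"` and `"on_type_change"`, and the example types are hypothetical. Updating the recorded type on every visit is an assumption of the sketch.

```python
def should_invoke(recorded: dict, obj_id, identified_type: str, policy: str) -> bool:
    """Decide whether the agent runs on this visit; `recorded` maps identity -> type."""
    first_visit = obj_id not in recorded
    previous_type = recorded.get(obj_id)
    recorded[obj_id] = identified_type  # assumption: keep the latest type
    if first_visit:
        return True
    if policy == "once":
        return False
    if policy == "on_type_change":
        return previous_type != identified_type
    raise ValueError(f"unknown policy: {policy}")


recorded = {}
decisions = [
    should_invoke(recorded, 12, "Image", "on_type_change"),       # first visit
    should_invoke(recorded, 12, "Image", "on_type_change"),       # same type again
    should_invoke(recorded, 12, "JavaScript", "on_type_change"),  # type changed
]
```

Under the `"on_type_change"` policy, the third visit invokes the agent because the object reached along another path no longer looks like the type recorded earlier.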
The instructions further cause the processor to operate as follows. The processor receives information associating one or more object types with a second computer program code module or agent. The processor determines whether a visited object is associated with the second agent based on the identified type of the visited object. If the visited object is of the requested type and is also associated with the second agent, the processor invokes the second agent in the same visit to the object in which the first agent is invoked.
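Invoking several agents in a single visit can be sketched as a dispatch table keyed by object type. The `Dispatcher` class and the agent names below are hypothetical, and invoking agents in registration order is an assumption of the sketch.

```python
from collections import defaultdict


class Dispatcher:
    """Hypothetical registry mapping object types to the agents that requested them."""

    def __init__(self):
        self._agents = defaultdict(list)

    def register(self, agent, object_types):
        for object_type in object_types:
            self._agents[object_type].append(agent)

    def visit(self, obj_id, obj_type, props) -> int:
        """Invoke every agent associated with obj_type during this one visit."""
        agents = self._agents.get(obj_type, [])
        for agent in agents:
            agent(obj_id, props)
        return len(agents)


calls = []
dispatcher = Dispatcher()
dispatcher.register(lambda i, p: calls.append(("translator", i)), ["Text"])
dispatcher.register(lambda i, p: calls.append(("scrubber", i)), ["Text", "JavaScript"])

invoked_on_text = dispatcher.visit(7, "Text", {"Type": "Text"})
invoked_on_image = dispatcher.visit(8, "Image", {"Type": "Image"})
```

Because both agents run while the object is in hand, the object does not have to be located and identified a second time.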
The invention can be implemented to achieve one or more of the following advantages. The method can identify the type of objects that may not have all the required properties set. The method identifies potentially harmful objects that masquerade as one type but are actually of a second, different type. It also allows the agents to revisit objects that have multiple classification types, thereby allowing agents to process all the objects that they are associated with. By calling different agents on the same visit to an object, the framework reduces the resources required to process the document. Thus the invention can be used to rid documents of potentially harmful objects and to correct errors in the objects contained in the document in an efficient way.
The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.