A document generally has an associated structure that defines the layout and visual characteristics of its content items. The structure can be explicitly defined and recorded using a machine-readable language, such as a markup language. Alternatively, the structure may be implicit or only partially explicit; that is, none, or only a portion, of the structure of the document's content is explicitly defined and recorded. The different visual cues present in a document (such as spatial intervals and positions, and contrasts in font family, size, and weight) combine to form the document's visual hierarchy. This hierarchy is essential to the reader, since it allows scanning and comprehension; in contrast, this information is often ignored by machine processing.
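By way of illustration only, the following minimal sketch shows how a visual hierarchy might be approximated from visual cues when no structure has been recorded. The `TextItem` type, the `prominence` score, and its weighting constants are hypothetical and chosen purely for the example; they are not taken from any described method.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    """A content item described only by visual cues (hypothetical type)."""
    text: str
    font_size: float    # in points
    bold: bool
    space_above: float  # vertical whitespace before the item, in points


def prominence(item: TextItem) -> float:
    # Illustrative weighting only: larger, bolder, more widely spaced
    # items are treated as more prominent. The constants are assumptions.
    return item.font_size + (2.0 if item.bold else 0.0) + 0.1 * item.space_above


def rank_levels(items):
    """Assign level 1 to the most prominent items, 2 to the next, and so on."""
    scores = sorted({prominence(i) for i in items}, reverse=True)
    level_of = {score: n for n, score in enumerate(scores, start=1)}
    return [(level_of[prominence(i)], i.text) for i in items]


items = [
    TextItem("Annual Report", 24.0, True, 30.0),
    TextItem("Overview", 14.0, True, 18.0),
    TextItem("Revenue grew during the year.", 10.0, False, 6.0),
    TextItem("Outlook", 14.0, True, 18.0),
]

for level, text in rank_levels(items):
    print(level, text)
```

In this sketch, items that share the same cues (here "Overview" and "Outlook") receive the same level, which mirrors how a reader would treat them as peers. A fixed weighting of this kind is fragile across differently designed documents, which is part of why implicit structure is hard to process automatically.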
A document may be repurposed; that is, the layout and/or certain characteristics of the content items constituting the document may be altered, for example to provide a different look, or to tailor the document to a specific audience or a particular use. Automatic repurposing is relatively straightforward when the content items that the document comprises have a well-defined and explicitly recorded structure: in such cases, repurposing can proceed from the recorded structure and the document content without intervention. However, for documents in which the structure of the content items is undefined or only partially defined, automatic repurposing is a more difficult task, and one that has not been addressed. It is not generally possible for a computer-implemented system to correctly and consistently determine the elements that make up the document and the way in which they should be repurposed, or to maintain between the original and repurposed elements a visual significance or hierarchy that correctly reflects the relative importance associated with those elements in the document.