Many image processing applications involve performing complex operations on multi-page documents. Examples include color correction, color palette design, proofing, and image enhancement. Deriving the parameters of these imaging operations often involves analyzing the document content. When the documents are large, this analysis can incur substantial cost in computer processing, storage, memory, or user operations.
In such cases, a large multi-page document must be properly characterized before complex image processing operations are performed on it. This often entails selecting a small subset of representative pages from the multi-page document according to a set of features of interest. The complex imaging operations to be performed on the entire multi-page document are then optimized based on the content of the selected pages. The optimized imaging operations are then applied to the multi-page document.
The task of selecting the subset of representative pages is often done manually by a human operator, who examines the large multi-page document before one or more complex imaging operations are performed and selects representative pages from which to derive and optimize those operations. The time devoted to this process can be significant in document reproduction operations where the multi-page document is relatively large. In many cases, the time and cost associated with this task can be prohibitive. If the subset of selected pages is not representative of the entire multi-page document, the optimized imaging operations may not be accurate and the finished results may be of insufficient quality. In that case, the entire selection process may have to begin again after the imaging operation has already been performed on the large multi-page document. Such rework can be prohibitive in imaging operations requiring more automated workflows.
Some techniques found in the art interpret document pages as nodes of a graph and then apply graph partitioning techniques. Methods for graph partitioning are usually suited to semantic, high-level features, where it may be difficult to establish a correspondence between features from different pages. Other techniques use clustering to group documents on a network or in a database. However, such techniques are directed primarily to the textual content of the document and not to other aspects or characteristics of the information associated with the document, particularly printing characteristics.
What is needed is a method for automatically selecting, from a large multi-page document, a small number of representative pages that accurately characterize the overall document, so that subsequent image processing operations can be properly optimized.